Batch loading a gt3x file

This document describes how the read.gt3x package can be used to read the newer .gt3x format files (firmware version above 2.5.0) in batches. For a general introduction about the read.gt3x package please see the Getting started vignette.

Purpose of batch-loading

By batch-loading we mean the loading of a specific time segment from a file. Batch-loading becomes relevant when you face memory problems for your machine (computer). For example, if you want to process a 7 day with 100 Hertz recording on a 8GB memory machine and apply a memory hungry algorithm to it. Processing the data in batches is then the solution.

How to use batch-loading?

In line with the Getting started vignette we start by loading the read.gt3x library and obtain the long example file:

library(read.gt3x)
gt3xfile <- gt3x_datapath(1) # this downloads a long recording

To read a batch from this file we need to add two arguments to the read.gt3x() function call:

  • batch_begin: First second in time relative to start of raw non-imputed recording to include in this batch
  • batch_end: Last second in time relative to start of raw non-imputed recording to include in this batch

The following example demonstrates how to use this functionality to iteratively read 24 hour batches from the recording we just downloaded.

  N = 24 * 3600 # batch size in seconds (24 hours per day x 3600 seconds per hour)
  i = 1
  while (i > 0) {
    try(expr = {
      batch_data = read.gt3x(gt3xfile,
                             batch_begin = 1 + ((i - 1) * N),
                             batch_end = i * N)
    }, silent = TRUE)
    if (exists("batch_data")) {
      cat("\nbatch #", i, "loaded with", nrow(batch_data),"rows")
      # ... <= Insert your memory hungry algorithm
      # ... <= Do not forget to store the output of your memory hungry algorithm somewhere
      
      rm(batch_data) # <= remove the batch_data object to free up memory before loading the next batch
    } else {
      break() # end while loop when no more batches can be extracted
    }
    i = i + 1
  }

Some notes on expected behaviour

When argument batch_begin or batch_end are used argument imputeZeroes is currently automatically set to FALSE. This means that you as user will be responsible for imputing potential time gaps if idle sleep mode was turned on.

batch_begin batch_end behaviour
not provided not provided default behaviour, read.gt3x attempts to read the content of the entire file in memory
not provided second (in time) in the recording and higher than batch_begin batch is read from beginning of the recording because batch_begin is not specified and defaults to the beginning all the way to batch_end
second (in time) in the recording second in the recording and higher than batch_begin batch is read from batch_begin to batch_end
second in the recording second in the recording and less than batch_begin no output and error message “batch_begin is higher than batch_end, please check input arguments”
second in the recording value higher than the number of seconds in the recording batch is read from batch_begin until the end of the recording
negative value second in the recording no output and error message “batch_begin is higher than batch_end, please check input arguments” this is because the code expects a positive value and treats it as an unsigned 32 bit integer.

Where in the code can I find the batch-loading implementation?

The batch-loading functionality is implemented in a C++ function called parseGT3X, which is used by function read.gt3x. Argument batch_begin and batch_end are forwarded by read.gt3x to parseGT3X. In this way the documentation for function read.gt3x is applicable to all read.gt3x users, while this vignette describes the added functionality only relevant to modern .gt3x data owners interested in batch-loading the data.