This document describes how the read.gt3x package can be used to read the newer .gt3x format files (firmware version above 2.5.0) in batches. For a general introduction about the read.gt3x package please see the Getting started vignette.
By batch-loading we mean the loading of a specific time segment from a file. Batch-loading becomes relevant when you face memory problems for your machine (computer). For example, if you want to process a 7 day with 100 Hertz recording on a 8GB memory machine and apply a memory hungry algorithm to it. Processing the data in batches is then the solution.
In line with the Getting started vignette we start by loading the read.gt3x library and obtain the long example file:
To read a batch from this file we need to add two arguments to the
read.gt3x()
function call:
batch_begin
: First second in time relative to start of
raw non-imputed recording to include in this batchbatch_end
: Last second in time relative to start of raw
non-imputed recording to include in this batchThe following example demonstrates how to use this functionality to iteratively read 24 hour batches from the recording we just downloaded.
N = 24 * 3600 # batch size in seconds (24 hours per day x 3600 seconds per hour)
i = 1
while (i > 0) {
try(expr = {
batch_data = read.gt3x(gt3xfile,
batch_begin = 1 + ((i - 1) * N),
batch_end = i * N)
}, silent = TRUE)
if (exists("batch_data")) {
cat("\nbatch #", i, "loaded with", nrow(batch_data),"rows")
# ... <= Insert your memory hungry algorithm
# ... <= Do not forget to store the output of your memory hungry algorithm somewhere
rm(batch_data) # <= remove the batch_data object to free up memory before loading the next batch
} else {
break() # end while loop when no more batches can be extracted
}
i = i + 1
}
When argument batch_begin or batch_end are used argument imputeZeroes is currently automatically set to FALSE. This means that you as user will be responsible for imputing potential time gaps if idle sleep mode was turned on.
batch_begin | batch_end | behaviour |
---|---|---|
not provided | not provided | default behaviour, read.gt3x attempts to read the content of the entire file in memory |
not provided | second (in time) in the recording and higher than batch_begin | batch is read from beginning of the recording because batch_begin is not specified and defaults to the beginning all the way to batch_end |
second (in time) in the recording | second in the recording and higher than batch_begin | batch is read from batch_begin to batch_end |
second in the recording | second in the recording and less than batch_begin | no output and error message “batch_begin is higher than batch_end, please check input arguments” |
second in the recording | value higher than the number of seconds in the recording | batch is read from batch_begin until the end of the recording |
negative value | second in the recording | no output and error message “batch_begin is higher than batch_end, please check input arguments” this is because the code expects a positive value and treats it as an unsigned 32 bit integer. |
The batch-loading functionality is implemented in a C++ function called parseGT3X, which is used by function read.gt3x. Argument batch_begin and batch_end are forwarded by read.gt3x to parseGT3X. In this way the documentation for function read.gt3x is applicable to all read.gt3x users, while this vignette describes the added functionality only relevant to modern .gt3x data owners interested in batch-loading the data.