datautils
Functions for loading and slicing data
-
heartpy.datautils.get_data(filename, delim=',', column_name='None', encoding=None, ignore_extension=False)
load data from file
Function to load data from a .CSV or .MAT file into a numpy array. The file can be read from local disk or from a URL.
Parameters: - filename (string) – absolute or relative path to the file to read
- delim (string) – the delimiter used if a CSV file is passed. default : ‘,’
- column_name (string) – for CSV files with a header: the column that contains the data. For matlab files: the table name that contains the data. default : ‘None’
- ignore_extension (bool) – if True, the extension is not tested. Use this, for example, for files whose extension is not .csv or .txt but whose data is formatted as CSV. default : False
Returns: out – array containing the data from the requested column of the specified file
Return type: 1-d numpy array
Examples
As an example, let’s load two example data files included in the package. We use pkg_resources here for automated testing purposes only; you don’t need it when using the function.
>>> from pkg_resources import resource_filename
>>> filepath = resource_filename(__name__, 'data/data.csv')
So, assuming your file lives at ‘filepath’, you open it as such:
>>> get_data(filepath)
array([530., 518., 506., ..., 492., 493., 494.])
Files with multiple columns can be opened by specifying the ‘column_name’ where the data resides:
>>> filepath = resource_filename(__name__, 'data/data2.csv')
Again you don’t need the above. It is there for automated testing.
>>> get_data(filepath, column_name='timer')
array([0.00000000e+00, 8.54790319e+00, 1.70958064e+01, ...,
       1.28192904e+05, 1.28201452e+05, 1.28210000e+05])
You can open matlab files in much the same way by specifying the column where the data lives:
>>> filepath = resource_filename(__name__, 'data/data2.mat')
Again you don’t need the above. It is there for automated testing. Open the matlab file by specifying the column name as well:
>>> get_data(filepath, column_name='hr')
array([515., 514., 514., ..., 492., 494., 496.])
You can open any csv formatted text file, no matter the extension, if you set ignore_extension to True:
>>> filepath = resource_filename(__name__, 'data/data.log')
>>> get_data(filepath, ignore_extension = True)
array([530., 518., 506., ..., 492., 493., 494.])
You can specify column names in the same way when using ignore_extension:
>>> filepath = resource_filename(__name__, 'data/data2.log')
>>> data = get_data(filepath, column_name = 'hr', ignore_extension = True)
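To make the role of column_name and delim concrete, here is a minimal sketch of selecting one column from delimited text by its header name. It uses only the standard library, and read_column is a hypothetical helper written for illustration; it is not HeartPy's implementation, which may parse files differently.

```python
import csv
import io


def read_column(text, column_name=None, delim=","):
    """Return one column of delimited text as a list of floats.

    Hypothetical helper for illustration only, not part of HeartPy.
    If column_name is None, the text is assumed to have no header
    and a single column of values.
    """
    rows = [row for row in csv.reader(io.StringIO(text), delimiter=delim) if row]
    if column_name is None:
        return [float(row[0]) for row in rows]
    # The first row is the header; find the position of the requested column.
    header = [name.strip() for name in rows[0]]
    idx = header.index(column_name)
    return [float(row[idx]) for row in rows[1:]]


sample = "timer,hr\n0.0,530\n8.5,518\n17.1,506\n"
hr = read_column(sample, column_name="hr")
timer = read_column(sample, column_name="timer")
```

The same header lookup applies regardless of file extension, which is why ignore_extension and column_name can be combined.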
-
heartpy.datautils.get_samplerate_mstimer(timerdata)
determine sample rate based on ms timer
Function to determine sample rate of data from ms-based timer list or array.
Parameters: timerdata (1d numpy array or list) – sequence containing values of a timer, in ms
Returns: out – the sample rate as determined from the timer sequence provided
Return type: float
Examples
First we load a provided example dataset:
>>> data, timer = load_exampledata(example = 1)
Since it’s a timer that counts milliseconds, we use this function. Let’s also round to three decimals:
>>> round(get_samplerate_mstimer(timer), 3)
116.996
Of course, if another time unit is used, converting it to ms-based should be trivial.
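As a rough illustration of the idea, the sketch below estimates a sample rate from the mean interval between timer values. The function name samplerate_from_ms_timer is hypothetical, and HeartPy's own computation may differ slightly (for example in whether it counts samples or intervals).

```python
def samplerate_from_ms_timer(timer_ms):
    """Estimate the sample rate in Hz from a millisecond timer sequence.

    Hypothetical sketch, not HeartPy's implementation.
    """
    if len(timer_ms) < 2:
        raise ValueError("need at least two timer values")
    elapsed_ms = timer_ms[-1] - timer_ms[0]
    if elapsed_ms <= 0:
        raise ValueError("timer values must increase")
    # Mean time between consecutive samples, in milliseconds.
    mean_interval_ms = elapsed_ms / (len(timer_ms) - 1)
    return 1000.0 / mean_interval_ms


timer = [i * 10.0 for i in range(500)]  # one sample every 10 ms
rate = samplerate_from_ms_timer(timer)  # about 100 Hz

# A timer in seconds can be converted to ms before calling the function.
timer_s = [i * 0.5 for i in range(10)]
rate_s = samplerate_from_ms_timer([t * 1000.0 for t in timer_s])  # about 2 Hz
```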
-
heartpy.datautils.get_samplerate_datetime(datetimedata, timeformat='%H:%M:%S.%f')
determine sample rate based on datetime
Function to determine sample rate of data from datetime-based timer list or array.
Parameters: - datetimedata (1-d numpy array or list) – sequence containing datetime strings
- timeformat (string) – the format of the datetime strings in datetimedata. default : ‘%H:%M:%S.%f’ (24-hour based time including fractional seconds, e.g. 21:43:12.569)
Returns: out – the sample rate as determined from the timer sequence provided
Return type: float
Examples
We load the data like before:
>>> data, timer = load_exampledata(example = 2)
>>> timer[0]
'2016-11-24 13:58:58.081000'
Note that we need to specify the timeformat used so that datetime understands what it’s working with:
>>> round(get_samplerate_datetime(timer, timeformat = '%Y-%m-%d %H:%M:%S.%f'), 3)
100.42
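The timeformat string follows the standard datetime.strptime format codes. The sketch below shows how such strings can be parsed and turned into a sample rate estimate. The helper samplerate_from_datetimes is hypothetical, and dividing the number of intervals by the elapsed time is an assumption rather than HeartPy's confirmed method.

```python
from datetime import datetime


def samplerate_from_datetimes(stamps, timeformat="%H:%M:%S.%f"):
    """Estimate the sample rate in Hz from a sequence of datetime strings.

    Hypothetical sketch, not HeartPy's implementation.
    """
    if len(stamps) < 2:
        raise ValueError("need at least two timestamps")
    first = datetime.strptime(stamps[0], timeformat)
    last = datetime.strptime(stamps[-1], timeformat)
    elapsed_s = (last - first).total_seconds()
    if elapsed_s <= 0:
        raise ValueError("timestamps must increase")
    # Number of intervals divided by the elapsed time in seconds.
    return (len(stamps) - 1) / elapsed_s


stamps = [
    "2016-11-24 13:58:58.000000",
    "2016-11-24 13:58:58.010000",
    "2016-11-24 13:58:58.020000",
    "2016-11-24 13:58:58.030000",
]
rate = samplerate_from_datetimes(stamps, "%Y-%m-%d %H:%M:%S.%f")  # about 100 Hz
```

If the format string does not match the data, strptime raises a ValueError, which is why the format has to be passed explicitly for strings that include a date.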
-
heartpy.datautils.rolling_mean(data, windowsize, sample_rate)
calculates rolling mean
Function to calculate the rolling mean (also: moving average) over the passed data.
Parameters: - data (1-dimensional numpy array or list) – sequence containing data over which rolling mean is to be computed
- windowsize (int or float) – the window size to use, in seconds. The window length in samples is calculated as windowsize * sample_rate
- sample_rate (int or float) – the sample rate of the data set
Returns: out – sequence containing computed rolling mean
Return type: 1-d numpy array
Examples
>>> data, _ = load_exampledata(example = 1)
>>> rmean = rolling_mean(data, windowsize=0.75, sample_rate=100)
>>> rmean[100:110]
array([514.49333333, 514.49333333, 514.49333333, 514.46666667,
       514.45333333, 514.45333333, 514.45333333, 514.45333333,
       514.48      , 514.52      ])
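To show how a window given in seconds maps onto samples, here is a plain-Python moving average sketch. The name rolling_mean_sketch is hypothetical, and the edge handling (repeating the first and last full-window mean so the output keeps the input length) is an assumption; HeartPy may pad differently.

```python
def rolling_mean_sketch(data, windowsize, sample_rate):
    """Moving average with a window given in seconds.

    Hypothetical sketch, not HeartPy's implementation.
    """
    window = int(windowsize * sample_rate)  # window length in samples
    if window < 1 or window > len(data):
        raise ValueError("window must cover between 1 and len(data) samples")
    # Means of all full windows, computed with a running sum.
    total = sum(data[:window])
    valid = [total / window]
    for i in range(window, len(data)):
        total += data[i] - data[i - window]
        valid.append(total / window)
    # Assumed edge handling: repeat the outermost means to restore the length.
    missing = len(data) - len(valid)
    left = missing // 2
    right = missing - left
    return [valid[0]] * left + valid + [valid[-1]] * right


signal = [1, 2, 3, 4, 5]
smoothed = rolling_mean_sketch(signal, windowsize=3, sample_rate=1)
```

With windowsize=0.75 and sample_rate=100, as in the example above, the window would span 75 samples.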
-
heartpy.datautils.outliers_iqr_method(hrvalues)
removes outliers
Function that removes outliers based on the interquartile range method and replaces them with the median. See: https://en.wikipedia.org/wiki/Interquartile_range
Parameters: hrvalues (1-d numpy array or list) – sequence of values from which outliers need to be identified
Returns: out – [0] cleaned sequence with identified outliers replaced by the median [1] list of indices that have been replaced in the original array or list
Return type: tuple
Examples
>>> x = [2, 4, 3, 4, 6, 7, 35, 2, 3, 4]
>>> outliers_iqr_method(x)
([2, 4, 3, 4, 6, 7, 4.0, 2, 3, 4], [6])
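For readers unfamiliar with the interquartile range method, the following sketch flags values outside the conventional fences Q1 - 1.5 * IQR and Q3 + 1.5 * IQR and replaces them with the median. The function name, the 1.5 multiplier, and the quartile interpolation method are assumptions made for illustration; they are not taken from HeartPy's source.

```python
import statistics


def replace_outliers_iqr(values, k=1.5):
    """Replace values outside the IQR fences with the median.

    Hypothetical sketch, not HeartPy's implementation.
    Returns (cleaned list, indices that were replaced).
    """
    # 'inclusive' uses linear interpolation between the data's min and max.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    median = statistics.median(values)
    cleaned, replaced = [], []
    for i, value in enumerate(values):
        if value < lower or value > upper:
            cleaned.append(median)
            replaced.append(i)
        else:
            cleaned.append(value)
    return cleaned, replaced


x = [2, 4, 3, 4, 6, 7, 35, 2, 3, 4]
cleaned, replaced = replace_outliers_iqr(x)
```

For this input the quartiles are 3.0 and 5.5, so the fences are -0.75 and 9.25 and only the value 35 at index 6 falls outside them.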
-
heartpy.datautils.outliers_modified_z(hrvalues)
removes outliers
Function that removes outliers based on the modified Z-score metric and replaces them with the median.
Parameters: hrvalues (1-d numpy array or list) – sequence of values from which outliers need to be identified
Returns: out – [0] cleaned sequence with identified outliers replaced by the median [1] list of indices that have been replaced in the original array or list
Return type: tuple
Examples
>>> x = [2, 4, 3, 4, 6, 7, 35, 2, 3, 4]
>>> outliers_modified_z(x)
([2, 4, 3, 4, 6, 7, 4.0, 2, 3, 4], [6])
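The modified Z-score is commonly defined as 0.6745 * (x - median) / MAD, with values whose absolute score exceeds 3.5 treated as outliers. The sketch below applies that common definition. The function name, the constant, and the threshold are assumptions for illustration and may not match the values HeartPy uses.

```python
import statistics


def replace_outliers_modified_z(values, threshold=3.5):
    """Replace values with a large modified Z-score by the median.

    Hypothetical sketch, not HeartPy's implementation.
    Returns (cleaned list, indices that were replaced).
    """
    median = statistics.median(values)
    # Median absolute deviation from the median.
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        # Scores are undefined when MAD is zero; leave the data untouched.
        return list(values), []
    cleaned, replaced = [], []
    for i, value in enumerate(values):
        score = 0.6745 * (value - median) / mad
        if abs(score) > threshold:
            cleaned.append(median)
            replaced.append(i)
        else:
            cleaned.append(value)
    return cleaned, replaced


x = [2, 4, 3, 4, 6, 7, 35, 2, 3, 4]
cleaned, replaced = replace_outliers_modified_z(x)
```

Because both the center and the spread are medians, a single extreme value such as 35 barely shifts the scores of the remaining points.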
-
heartpy.datautils.MAD(data)
computes median absolute deviation
Function that computes the median absolute deviation of a data slice. See: https://en.wikipedia.org/wiki/Median_absolute_deviation
Parameters: data (1-dimensional numpy array or list) – sequence containing data over which to compute the MAD
Returns: out – the Median Absolute Deviation as computed
Return type: float
Examples
>>> x = [2, 4, 3, 4, 6, 7, 35, 2, 3, 4]
>>> MAD(x)
1.5
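The result 1.5 can be reproduced by hand: take the median of the data, compute each value's absolute distance from it, and take the median of those distances. A standard-library sketch of those steps follows; median_absolute_deviation is a hypothetical name used for illustration, not HeartPy's function.

```python
import statistics


def median_absolute_deviation(data):
    """Median of the absolute deviations from the median.

    Hypothetical sketch, not HeartPy's implementation.
    """
    center = statistics.median(data)
    deviations = [abs(value - center) for value in data]
    return statistics.median(deviations)


x = [2, 4, 3, 4, 6, 7, 35, 2, 3, 4]
center = statistics.median(x)             # 4.0
deviations = [abs(v - center) for v in x]
mad = median_absolute_deviation(x)        # 1.5
```

Sorted, the deviations are 0, 0, 0, 1, 1, 2, 2, 2, 3, 31, so the median lies halfway between 1 and 2.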
-
heartpy.datautils.load_exampledata(example=0)
loads example data
Function to load one of the example datasets included in HeartPy and used in the documentation.
Parameters: example (int (0, 1, 2)) – selects which of the three example data files used in the docs to load. Available (see github repo for source of files):
0 : data.csv
1 : data2.csv
2 : data3.csv
default : 0
Returns: out – contains the data and timer column. If no timer data is available, such as in example 0, an empty second array is returned.
Return type: tuple of two arrays
Examples
This function can load one of the three example data files provided with HeartPy. It returns both the data and a timer, if one is present.
For example:
>>> data, _ = load_exampledata(0)
>>> data[0:5]
array([530., 518., 506., 494., 483.])
And another example:
>>> data, timer = load_exampledata(1)
>>> [round(x, 2) for x in timer[0:5]]
[0.0, 8.55, 17.1, 25.64, 34.19]