datautils

Functions for loading and slicing data

heartpy.datautils.get_data(filename, delim=', ', column_name='None', encoding=None, ignore_extension=False)

load data from file

Function to load data from a .CSV or .MAT file into a numpy array. The file can be read from local disk or from a URL.

Parameters:
    filename (string) – absolute or relative path to the file object to read
    delim (string) – the delimiter used if a CSV file is passed. default : ‘,’
    column_name (string) – for CSV files with a header: specifies the column that contains the data. For matlab files it specifies the table name that contains the data. default : ‘None’
    ignore_extension (bool) – if True, the extension is not tested. Use this, for example, for files where the extension is not .csv or .txt but the data is formatted as if it is. default : False
Returns:	out – array containing the data from the requested column of the specified file
Return type:	1-d numpy array

Examples

As an example, let’s load two example data files included in the package. For this we use pkg_resources, which is there for automated testing purposes. You don’t need this when using the function.

>>> from pkg_resources import resource_filename
>>> filepath = resource_filename(__name__, 'data/data.csv')

So, assuming your file lives at ‘filepath’, you open it as such:

>>> get_data(filepath)
array([530., 518., 506., ..., 492., 493., 494.])

Files with multiple columns can be opened by specifying the ‘column_name’ where the data resides:

>>> filepath = resource_filename(__name__, 'data/data2.csv')

Again you don’t need the above. It is there for automated testing.

>>> get_data(filepath, column_name='timer')
array([0.00000000e+00, 8.54790319e+00, 1.70958064e+01, ...,
       1.28192904e+05, 1.28201452e+05, 1.28210000e+05])

You can open matlab files in much the same way by specifying the column where the data lives:

>>> filepath = resource_filename(__name__, 'data/data2.mat')

Again you don’t need the above. It is there for automated testing. Open matlab file by specifying the column name as well:

>>> get_data(filepath, column_name='hr')
array([515., 514., 514., ..., 492., 494., 496.])

You can open any csv formatted text file, no matter the extension, if you set ignore_extension to True:

>>> filepath = resource_filename(__name__, 'data/data.log')
>>> get_data(filepath, ignore_extension = True)
array([530., 518., 506., ..., 492., 493., 494.])

You can specify column names in the same way when using ignore_extension:

>>> filepath = resource_filename(__name__, 'data/data2.log')
>>> data = get_data(filepath, column_name = 'hr', ignore_extension = True)

heartpy.datautils.get_samplerate_mstimer(timerdata)

determine sample rate based on ms timer

Function to determine sample rate of data from ms-based timer list or array.

Parameters:	timerdata (1d numpy array or list) – sequence containing values of a timer, in ms
Returns:	out – the sample rate as determined from the timer sequence provided
Return type:	float

Examples

First we load a provided example dataset:

>>> data, timer = load_exampledata(example = 1)

Since it’s a timer that counts milliseconds, we use this function. Let’s also round to three decimals:

>>> round(get_samplerate_mstimer(timer), 3)
116.996

Of course, if another time unit is used, converting it to ms-based should be trivial.
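As an illustration of the idea, the sketch below estimates a sample rate from a millisecond timer and shows the unit conversion for a timer recorded in seconds. It is an assumption-based sketch, not the library's code: the function name samplerate_from_ms_timer is hypothetical, and dividing the number of samples by the elapsed time is one possible estimate (dividing the number of intervals, len - 1, gives a slightly different value on short recordings).

```python
def samplerate_from_ms_timer(timer_ms):
    """Hypothetical helper: estimate the sample rate (Hz) from ms timestamps."""
    if len(timer_ms) < 2:
        raise ValueError("need at least two timer values")
    elapsed_ms = timer_ms[-1] - timer_ms[0]
    if elapsed_ms <= 0:
        raise ValueError("timer values must increase")
    # Samples per millisecond, scaled to samples per second.
    return len(timer_ms) / elapsed_ms * 1000.0


# A timer recorded in seconds is converted to milliseconds first.
timer_s = [0.00, 0.01, 0.02, 0.03, 0.04]
timer_ms = [t * 1000.0 for t in timer_s]

rate = samplerate_from_ms_timer(timer_ms)
print(round(rate, 3))
```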

heartpy.datautils.get_samplerate_datetime(datetimedata, timeformat='%H:%M:%S.%f')

determine sample rate based on datetime

Function to determine sample rate of data from datetime-based timer list or array.

Parameters:
    datetimedata (1-d numpy array or list) – sequence containing datetime strings
    timeformat (string) – the format of the datetime strings in datetimedata. default : ‘%H:%M:%S.%f’ (24-hour based time including ms: e.g. 21:43:12.569)
Returns:	out – the sample rate as determined from the timer sequence provided
Return type:	float

Examples

We load the data like before

>>> data, timer = load_exampledata(example = 2)
>>> timer[0]
'2016-11-24 13:58:58.081000'

Note that we need to specify the timeformat used so that datetime understands what it’s working with:

>>> round(get_samplerate_datetime(timer, timeformat = '%Y-%m-%d %H:%M:%S.%f'), 3)
100.42
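The role of timeformat can be illustrated with a short standard-library sketch. It assumes the rate is estimated as the number of samples divided by the elapsed time between the first and last timestamp. The function name samplerate_from_datetimes and the synthetic timestamps are hypothetical, and the library's own implementation may differ.

```python
from datetime import datetime, timedelta


def samplerate_from_datetimes(stamps, timeformat="%H:%M:%S.%f"):
    """Hypothetical helper: estimate the sample rate (Hz) from datetime strings."""
    if len(stamps) < 2:
        raise ValueError("need at least two timestamps")
    # strptime needs the exact format, otherwise it raises ValueError.
    first = datetime.strptime(stamps[0], timeformat)
    last = datetime.strptime(stamps[-1], timeformat)
    elapsed_s = (last - first).total_seconds()
    if elapsed_s <= 0:
        raise ValueError("timestamps must increase")
    return len(stamps) / elapsed_s


# Synthetic timestamps spaced 10 ms apart (made-up data for illustration).
fmt = "%Y-%m-%d %H:%M:%S.%f"
start = datetime(2020, 1, 1, 12, 0, 0)
stamps = [(start + timedelta(milliseconds=10 * i)).strftime(fmt) for i in range(101)]

rate = samplerate_from_datetimes(stamps, timeformat=fmt)
print(round(rate, 3))
```

Passing a format that does not match the strings makes datetime.strptime raise a ValueError, which is why the format has to be given explicitly when it differs from the default.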

heartpy.datautils.rolling_mean(data, windowsize, sample_rate)

calculates rolling mean

Function to calculate the rolling mean (also: moving average) over the passed data.

Parameters:
    data (1-dimensional numpy array or list) – sequence containing data over which rolling mean is to be computed
    windowsize (int or float) – the window size to use, in seconds. The window length in samples is calculated as windowsize * sample_rate
    sample_rate (int or float) – the sample rate of the data set
Returns:	out – sequence containing computed rolling mean
Return type:	1-d numpy array

Examples

>>> data, _ = load_exampledata(example = 1)
>>> rmean = rolling_mean(data, windowsize=0.75, sample_rate=100)
>>> rmean[100:110]
array([514.49333333, 514.49333333, 514.49333333, 514.46666667,
       514.45333333, 514.45333333, 514.45333333, 514.45333333,
       514.48      , 514.52      ])
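With windowsize=0.75 and sample_rate=100, the window spans 0.75 * 100 = 75 samples. The sketch below shows the moving-average idea in plain Python. It is a simplified illustration under stated assumptions: the name rolling_mean_valid is hypothetical, and it only returns means for positions where the full window fits, so its output is shorter than the input. How the library treats the edges of the signal is not covered here.

```python
def rolling_mean_valid(data, windowsize, sample_rate):
    """Hypothetical helper: moving average over full windows only."""
    window = int(windowsize * sample_rate)  # window length in samples
    if window < 1 or window > len(data):
        raise ValueError("window must be between 1 and len(data) samples")
    running = sum(data[:window])
    means = [running / window]
    for i in range(window, len(data)):
        # Slide the window: add the incoming sample, drop the outgoing one.
        running += data[i] - data[i - window]
        means.append(running / window)
    return means


signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
# 0.75 s at 4 Hz gives a window of 3 samples.
means = rolling_mean_valid(signal, windowsize=0.75, sample_rate=4)
print(means)
```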

heartpy.datautils.outliers_iqr_method(hrvalues)

removes outliers

Function that identifies outliers based on the interquartile range method and replaces them with the median. See: https://en.wikipedia.org/wiki/Interquartile_range

Parameters:	hrvalues (1-d numpy array or list) – sequence of values, from which outliers need to be identified
Returns:	out – tuple containing:
    [0] cleaned sequence with identified outliers replaced by the median
    [1] list of indices that have been replaced in the original array or list
Return type:	tuple

Examples

>>> x = [2, 4, 3, 4, 6, 7, 35, 2, 3, 4]
>>> outliers_iqr_method(x)
([2, 4, 3, 4, 6, 7, 4.0, 2, 3, 4], [6])
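The interquartile range method can be sketched with the standard library as follows. This is an illustration, not the library's code: the name replace_outliers_iqr is hypothetical, and both the conventional 1.5 * IQR fences and the "inclusive" quartile interpolation are assumptions about details this page does not state.

```python
import statistics


def replace_outliers_iqr(values, factor=1.5):
    """Hypothetical helper: replace values outside the IQR fences by the median."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - factor * iqr, q3 + factor * iqr
    median = statistics.median(values)
    cleaned, replaced = [], []
    for index, value in enumerate(values):
        if value < lower or value > upper:
            cleaned.append(median)   # substitute the outlier
            replaced.append(index)   # remember where it was
        else:
            cleaned.append(value)
    return cleaned, replaced


x = [2, 4, 3, 4, 6, 7, 35, 2, 3, 4]
cleaned, replaced = replace_outliers_iqr(x)
print(cleaned, replaced)
```

For this input the quartiles are 3.0 and 5.5, so the fences lie at -0.75 and 9.25, and only the value 35 at index 6 falls outside them.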

heartpy.datautils.outliers_modified_z(hrvalues)

removes outliers

Function that identifies outliers based on the modified Z-score metric and replaces them with the median.

Parameters:	hrvalues (1-d numpy array or list) – sequence of values, from which outliers need to be identified
Returns:	out – tuple containing:
    [0] cleaned sequence with identified outliers replaced by the median
    [1] list of indices that have been replaced in the original array or list
Return type:	tuple

Examples

>>> x = [2, 4, 3, 4, 6, 7, 35, 2, 3, 4]
>>> outliers_modified_z(x)
([2, 4, 3, 4, 6, 7, 4.0, 2, 3, 4], [6])
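A minimal sketch of the modified Z-score approach is given below. It uses the commonly cited form 0.6745 * (x - median) / MAD with a cut-off of 3.5. Both constants and the function name replace_outliers_modified_z are assumptions for illustration, since this page does not state which values the library uses.

```python
import statistics


def replace_outliers_modified_z(values, threshold=3.5):
    """Hypothetical helper: replace values with a large modified Z-score."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        # No spread around the median: scores are undefined, change nothing.
        return list(values), []
    cleaned, replaced = [], []
    for index, value in enumerate(values):
        score = 0.6745 * (value - median) / mad
        if abs(score) > threshold:
            cleaned.append(median)
            replaced.append(index)
        else:
            cleaned.append(value)
    return cleaned, replaced


x = [2, 4, 3, 4, 6, 7, 35, 2, 3, 4]
cleaned, replaced = replace_outliers_modified_z(x)
print(cleaned, replaced)
```

Because the score is built on the median and the median absolute deviation rather than the mean and standard deviation, a single extreme value such as 35 has little influence on the scale used to judge it.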

heartpy.datautils.MAD(data)

computes median absolute deviation

Function that computes the median absolute deviation of a data slice. See: https://en.wikipedia.org/wiki/Median_absolute_deviation

Parameters:	data (1-dimensional numpy array or list) – sequence containing data over which to compute the MAD
Returns:	out – the Median Absolute Deviation as computed
Return type:	float

Examples

>>> x = [2, 4, 3, 4, 6, 7, 35, 2, 3, 4]
>>> MAD(x)
1.5
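The result of 1.5 can be reproduced by hand with the definition of the median absolute deviation: take the median, compute each value's absolute distance to it, and take the median of those distances. The sketch below follows these steps with the standard library; the function name median_absolute_deviation is hypothetical.

```python
import statistics


def median_absolute_deviation(values):
    """Hypothetical helper: median of absolute deviations from the median."""
    center = statistics.median(values)
    deviations = [abs(v - center) for v in values]
    return statistics.median(deviations)


x = [2, 4, 3, 4, 6, 7, 35, 2, 3, 4]
# median(x) is 4.0
# absolute deviations: [2, 0, 1, 0, 2, 3, 31, 2, 1, 0]
# sorted: [0, 0, 0, 1, 1, 2, 2, 2, 3, 31], middle pair is (1, 2)
mad = median_absolute_deviation(x)
print(mad)  # 1.5
```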

heartpy.datautils.load_exampledata(example=0)

loads example data

Function to load one of the example datasets included in HeartPy and used in the documentation.

Parameters:	example (int (0, 1, 2)) – selects which of the three example data files used in the docs is loaded (see github repo for source of files). default : 0
    0 : data.csv
    1 : data2.csv
    2 : data3.csv
Returns:	out – Contains the data and timer column. If no timer data is available, such as in example 0, an empty second array is returned.
Return type:	tuple of two arrays

Examples

This function can load one of the three example data files provided with HeartPy. It returns both the data and a timer, if one is present.

For example:

>>> data, _ = load_exampledata(0)
>>> data[0:5]
array([530., 518., 506., 494., 483.])

And another example:

>>> data, timer = load_exampledata(1)
>>> [round(x, 2) for x in timer[0:5]]
[0.0, 8.55, 17.1, 25.64, 34.19]