
Activity tracking using the phone and MATLAB

MATLAB + Android. Are you a physically active person? Do you keep track of the time you spend walking or running? Recent studies have shown that nearly 20% of adults use some technology to track their physical activity. Are you among that 20%, analyzing your daily activities to get to know yourself better?

This post shows how to pair an Android device with MATLAB and machine learning algorithms from the Statistics and Machine Learning Toolbox to track physical activity in real time.

Installation and Setup


We will need:

  1. MATLAB with the Statistics and Machine Learning Toolbox;
  2. an Android smartphone with the MATLAB Mobile app;
  3. the MATLAB Support Package for Android Sensors.

Communication between the smartphone and MATLAB is handled by the MATLAB Connector; the plug-in is available for download from MathWorks.
More information about collecting data from smartphone sensors is available in the MathWorks documentation.

Introduction


Watching someone, we find it easy to determine what that person is doing, even if we are seeing them for the first time: we recognize human activity by default. The brain compares the observed actions with thousands of previously seen ones and returns the best match. Similarly, a computer (or phone) recognizes only the actions it has been taught.

Using machine learning algorithms, you can teach a computer to recognize specific human activities, and improve this skill as new data becomes available. Such recognition, which includes splitting data into separate “classes,” is called classification. Another example of classification is the diagnosis of a patient based on the presence of certain symptoms.

The classification algorithm for such tasks is applied in two stages: training and detection. At the training stage, a model is built that sorts the training data into specific categories. At the detection stage, new data is assigned to one of the existing categories.

The application described here uses the phone's acceleration sensor (accelerometer) to determine the type of activity. A K-nearest neighbors (KNN) classifier was selected. It is a convenient algorithm for this task, since it detects motion quickly and is very accurate on low-dimensional data (a small set of features). Based on the majority vote of the K nearest neighbors in the training set, it determines the category to which a new data point belongs.
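The voting idea can be sketched in a few lines of MATLAB (the data here is made up purely for illustration and is not the article's training set):

```matlab
% Hypothetical 2-D training points and their class labels
trainX = [0 0; 0 1; 1 0; 5 5; 5 6; 6 5];
trainY = {'idle'; 'idle'; 'idle'; 'run'; 'run'; 'run'};
newX   = [5.2 5.1];   % new observation to classify
K      = 3;

% Distances from the new point to every training point; take the K nearest
d        = sqrt(sum((trainX - newX).^2, 2));
[~, idx] = sort(d);
nearest  = trainY(idx(1:K));

% Majority vote among the K neighbors decides the class
label = mode(categorical(nearest));   % majority class here: 'run'
```

The real classifier (built later with fitcknn) works the same way, only with six features instead of two coordinates.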

The process of recognition of movements occurred in three stages:

  1. data collection: acceleration values along three axes were collected from the accelerometer of the Android device;
  2. feature extraction: for each monitored type of activity, distinctive features were extracted from the accelerometer readings;
  3. activity classification: the features derived for the various activities were used to train the classifier, which was then applied to new accelerometer readings to determine what kind of movement occurred.

Data collection


Since detection is carried out by the classifier, it first had to be trained on a set of known data points. MATLAB Mobile, in combination with the MATLAB Support Package for Android Sensors, lets you collect the device's accelerometer data and send the measurements to a MATLAB session on a PC.

After establishing the connection with the Android device, I created an instance of the mobiledev object to record data from the device's sensors. Then the accelerometer was enabled, and MATLAB began collecting data.

 mobileSensor = mobiledev()                   % create the mobiledev object
 mobileSensor.AccelerationSensorEnabled = 1;  % enable the accelerometer
 mobileSensor.start;                          % start sending data


After the last command, I remained still for 10 seconds. Then I got up and walked for the next 70 seconds, went to the stairs, walked downstairs, and ran for about 60 seconds. I then walked for another 70 seconds, climbed the stairs, and returned to the office. Finally, I sat down and did not move. I retrieved the logged three-axis data from the accelerometer and visualized it.

 [accel, time] = accellog(mobileSensor);  % acquire data from the logs
 plot(time, accel);                       % plot the data

The graph below contains data for all the actions: standing still, walking, running, and climbing up and down the stairs. As you can see, the activities cannot be told apart simply by looking at the plot. It was therefore necessary to identify features that help recognize each action and distinguish it from the others.

[figure: accelerometer readings for all activities]

Feature Extraction


Although in the time domain the raw accelerometer readings look similar for every type of activity, they in fact contain unique characteristics that can be used to identify different kinds of movement. These can be, for example, the maximum value over all data points, or the number of data points outside a given range. Such characteristics are called features.

These features make classification possible, and many candidates can be considered. However, to extract them efficiently, we need to find a minimal set of features that distinguishes the different activities without being too resource-intensive.

Of all the possible features, the following six turned out to be the most suitable. They are labeled Feature_N:

  1. Feature_1: mean value of the data.
  2. Feature_2: square of the sum of the data magnitudes below the 25th percentile.
  3. Feature_3: sum of the squares of the data magnitudes below the 25th percentile.
  4. Feature_4: frequency of the dominant peak in the Y-axis data spectrum below 5 Hz.
  5. Feature_5: number of peaks in the Y-axis data spectrum below 5 Hz.
  6. Feature_6: integral of the Y-axis data spectrum from 0 to 5 Hz.

The data magnitude is the square root of the sum of squares of the acceleration readings along the Y and Z axes. The X-axis readings can be ignored because they differ little between activities, a consequence of the phone's position in a pocket. The six features were extracted from 5-second segments of the recording of each activity. A 5-second measurement window was chosen because it is long enough to capture consistent, stable feature values. The window length can be varied, bearing in mind that very long windows (about a minute or more) carry a high probability of error because the activity may change within the window.
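A sketch of this windowed magnitude computation (the sampling rate and variable names other than accel are illustrative assumptions; only Feature_1 from the list is shown):

```matlab
% accel is an N-by-3 matrix [x y z] returned by accellog
Fs  = 200;                                   % assumed (approximate) sample rate, Hz
mag = sqrt(accel(:,2).^2 + accel(:,3).^2);   % magnitude from the Y and Z axes only

winLen = 5 * Fs;                             % 5-second window
nWin   = floor(length(mag) / winLen);
for k = 1:nWin
    w = mag((k-1)*winLen + 1 : k*winLen);    % one 5-second segment
    Feature_1 = mean(w);                     % e.g. Feature_1: mean of the data
    % ... the remaining five features would be computed per window here
end
```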

Having computed the above features for each type of activity as part of the training procedure, the algorithm was given two inputs: the features and the corresponding response (that is, the activity to which those features belong). The feature variable is computed from the raw three-axis accelerometer measurements and holds the values of the six features listed above. For example, while running:

 feature = [30, 15, 7.6, 2.3, 5, 8];
 activity = 'running';

Here the feature variable has the form [Feature_1, Feature_2, Feature_3, Feature_4, Feature_5, Feature_6]. Each feature-activity pair represents one training data point. You can see how these training points are collected in the MATLAB script recordTrainingData in the attached archive. The raw accelerometer readings for each type of activity are stored in a MAT file. To avoid collecting data during a change of activity, input prompts were added between activities, which ensures that the source data used to extract the features of each activity is clean and consistent.

To extract features from the source data stored in each MAT file, use extractTrainingFeature from the downloaded archive. Note that the raw accelerometer data is sampled at roughly 200 Hz, but the actual rate may be significantly lower due to the way Android schedules sensor reads. In addition, the sampling rate may change during a measurement, making the data non-uniformly sampled.

To bring the data to a uniform sampling rate, a resampling step was added, which improved both feature extraction and the subsequent classification.
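The article does not show the resampling code; one simple way to resample onto a uniform grid, using interp1, might look like this (the target rate of 200 Hz is an assumption):

```matlab
% time and accel come from accellog; the timestamps may be non-uniform
Fs       = 200;                            % target uniform rate, Hz (assumed)
tUniform = (time(1) : 1/Fs : time(end))';  % evenly spaced time grid
accelU   = interp1(time, accel, tUniform, 'linear');  % resample each axis
```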

In the graph below, the red line with 'x' markers shows the non-uniformly sampled source data from the accelerometer's Y axis, and the blue line with 'o' markers shows the resampled data. Note that the first three features in the list above are computed in the time domain, while the rest are computed in the frequency domain.

[figure: raw vs. resampled Y-axis accelerometer data]

The following plots show that the features of each activity group together (for example, all the red dots correspond to features extracted while running). This clustering, with a separate cluster for each type of activity, is what makes it possible to accurately determine the activity for new accelerometer readings:

[figure: feature clusters for the different activities]

Activity classification


Typically, a learning algorithm requires many training points to build a reliable model, so more than a thousand data points were collected for each activity to train the classifier. To begin, the features are grouped into an array in the following order: walking, running, idling, climbing up, and climbing down:

 data = [featureWalk; featureRun; featureIdle; featureUp; featureDown]; 

In the code above, featureWalk is a 1000 × 6 array of the six features, computed from the raw accelerometer data collected while walking. Similarly, featureRun is a 1000 × 6 array of the six features computed from the raw accelerometer data collected while running, and so on.

When the features for all activity types were collected into a single data file, it turned out that the feature values for running are larger than those for walking or idling. Because of this, the features of the different activities are on different scales, which hurts the algorithm's ability to accurately classify new data (which may itself be scaled differently). It is therefore necessary to normalize the data so that each feature lies in the range from 0 to 1:

[figure: Feature_1 values before and after normalization]

The graph above shows the original values of Feature_1, computed for all activities, alongside the normalized values. As you can see, after normalization the values of Feature_1 lie within [0, 1]. The other five features were normalized in the same way.
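The normalization described above amounts to min-max scaling, which could be sketched as follows (assuming data is the feature array assembled earlier):

```matlab
% data is the M-by-6 feature array; scale each column (feature) to [0, 1]
mn = min(data);                       % per-feature minimum (1-by-6)
mx = max(data);                       % per-feature maximum (1-by-6)
dataNorm = (data - mn) ./ (mx - mn);  % every feature now lies in [0, 1]
```

Note that the same mn and mx must be reused to scale new accelerometer readings before classification, so that new data is on the same scale as the training data.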

After normalizing and preparing the data, we need to define how the machine learning algorithm should respond to the input data array. The inputs and their responses will then be used to train the algorithm to classify new data.
To build the output response vector, an integer is first assigned to each type of activity: -1, 0, 1, 2, 3 for going downstairs, idling, climbing upstairs, walking, and running, respectively. Since a response is needed for every set of input features, a column vector of these integers is created with the same length as the input array. To make the detected activity easy to read, the response vector is converted into a categorical array with the values 'Going downstairs', 'Idling', 'Climbing upstairs', 'Walking', 'Running', and 'Transition':

 Down = -1 * ones(length(featureDown), 1);
 Idle = zeros(length(featureIdle), 1);
 Up   = ones(length(featureUp), 1);
 Walk = 2 * ones(length(featureWalk), 1);
 Run  = 3 * ones(length(featureRun), 1);
 responseVector = [Walk; Run; Idle; Up; Down];  % building the output response vector
 valueset = [-1:3, -10];
 cateName = {'Going downstairs', 'Idling', 'Climbing upstairs', 'Walking', ...
             'Running', 'Transition'};
 response = categorical(responseVector, valueset, cateName);  % converting to a categorical array

With the response array generated, the KNN model was trained using the fitcknn function from the Statistics and Machine Learning Toolbox. After several trials, the 'NumNeighbors' property (K) was set to 30, as it provided the necessary performance and detection accuracy. Once the model has been obtained from the training data, it must be validated on new data coming from the smartphone.

 mdl = fitcknn(data, response);
 mdl.NumNeighbors = 30;

For this, the extractFeatures function was created to compute the six features required for classification. The computed features (stored in the newFeature variable) are then passed to the model to recognize the activity. The model is trained to distinguish five types of activity, which raises a natural question: what should it do when the detected activity is none of them?

 newFeature = [0.15, 0.28, 0.2, 0.35, 0.65, 0.7];  % features for the new activity
 result = predict(mdl, newFeature);                % predicting the activity


The detector will keep classifying activity in the current detection window and will assign small probabilities to each of the five categories. The same small probabilities can also arise when moving from one activity to another within a single detection window, which complicates classification. To avoid such misdetections, the predictor is implemented so that it outputs 'Transition' whenever the prediction probability is below 95%.

The same rule applies when moving from one classified activity to another, for example from 'Walking' to 'Running'. This is reasonable, since across consecutive windows the feature values during such a transition may be unstable.
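One way the 95% rule could be implemented is with the score output of predict, which for a KNN model holds the fraction of neighbors voting for each class (a sketch, not necessarily the author's exact code):

```matlab
% newFeature is a 1-by-6 normalized feature vector for the current window
[label, score] = predict(mdl, newFeature);  % score: voting fraction per class
if max(score) < 0.95
    label = 'Transition';                   % no class is confident enough
end
```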

Below is a graph of the raw data collected from the phone over one minute, together with the recognized actions. For convenience, the line of recognized activity is shown below the data and marked as:

x - Walking, * - Running, o - Idling, ˄ - Walking Upstairs, ˅ - Walking Downstairs, • - Transition:

[figure: raw accelerometer data and recognized activities]

Note that the machine learning algorithm is trained on raw accelerometer data for the various actions performed. If you train it on the ready-made data from the archive (see the link at the end of the article), detection on data collected from your own phone will most likely be inaccurate, even if you carry the phone, like the author, in the right front pocket of your trousers. This is due to differences in gait and the dependence of the measurements on a particular person's height and weight.

To build an activity detector that accurately identifies your actions, start by collecting several sets of accelerometer data from your phone for the activities you want to recognize. Then, for each data set, extract the six features listed above using extractTrainingFeatures. Finally, use the extracted features to train the machine learning algorithm. The resulting model can then recognize your new actions.

This post covers only one of the many possible uses of the resulting application. The same approach can be applied to any other recognition system, for example on a mobile robot. You can also add data from GPS, gyroscope, and magnetometer sensors to build a more capable tracker.

Sample Code

Source: https://habr.com/ru/post/307038/

