
Recognizing motion gestures on Android using TensorFlow




Introduction



Today there are many different ways to interact with a smartphone: the touch screen, hardware buttons, the fingerprint scanner, the camera (for example, for facial recognition), the D-pad, buttons on a headset, and so on. But what about motion gestures?



For example, quickly flicking the phone to the right or left while holding it in your hand can quite naturally express the intention to skip to the next or previous song in a playlist. Or you could quickly flip the phone upside down and back to refresh the application content. Implementing such an interaction looks promising and literally adds a new dimension to the UX. This article describes how to implement it using machine learning and the TensorFlow library for Android.



Description



Let's define the ultimate goal. I would like the smartphone to recognize fast movements left and right.


I would also like the implementation to be packaged as an Android library that can easily be integrated into any other application.



Gestures can be recorded on a smartphone using several sensors: the accelerometer, the gyroscope, the magnetometer, and others. A set of recorded gestures can then be used to train a machine learning algorithm for recognition.



A dedicated Android application will be developed for data recording. Preprocessing and training will be done on a PC in a Jupyter Notebook using Python and the TensorFlow library. Gesture recognition will then be implemented in a demonstration application using the training results. Finally, we will develop a ready-to-use Android gesture recognition library that can easily be integrated into other applications.



Our implementation plan:

1. Record gesture samples with a dedicated Android application.
2. Preprocess the data and train a neural network on a PC (Jupyter Notebook, Python, TensorFlow).
3. Export the trained model for use on Android.
4. Build a demo application that performs recognition in real time.
5. Package the recognition code as a reusable Android library.

Implementation



Data preparation



To begin with, let's decide which sensors, and what kind of data from them, can describe our gestures. It seems that both the accelerometer and the gyroscope must be considered to describe these gestures accurately.



The accelerometer obviously measures acceleration and, accordingly, movement:



[Image: accelerometer illustration]



The accelerometer has an interesting nuance: it measures not only the acceleration of the phone itself, but also the acceleration due to gravity, which is approximately 9.8 m/s². This means that the magnitude of the acceleration vector of a phone lying on a table equals 9.8. Such values cannot be used directly; the gravity component must first be subtracted from them. This is not an easy task, because it requires joint processing of the magnetometer and accelerometer data. Fortunately, Android has a special "linear acceleration" sensor that performs the necessary calculations and returns the corrected values.
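As a minimal sketch (the context variable and surrounding code are illustrative, not taken from the author's recording app), obtaining this sensor looks like this:

    // Obtain the "linear acceleration" sensor, which already has gravity removed.
    SensorManager sensorManager =
            (SensorManager) context.getSystemService(Context.SENSOR_SERVICE);
    Sensor linearAccel =
            sensorManager.getDefaultSensor(Sensor.TYPE_LINEAR_ACCELERATION);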



A gyroscope, on the other hand, measures rotation:



[Image: gyroscope illustration]



Let's try to determine which values will correlate with our gestures. Obviously, for the accelerometer (meaning the linear accelerometer), the X and Y values will describe the gestures to a large degree, while the accelerometer's Z value is unlikely to depend on our gestures.



As for the gyroscope, it seems that the gestures only slightly affect its Z axis, so to simplify the implementation I propose not to include it in the calculation. As a consequence, our gesture detector will recognize not only the phone moving in a hand, but also the phone being slid along a horizontal surface - for example, across a table. But this is not a big problem.



Thus, we need to develop an Android application that can record accelerometer data.



I developed such an application. Here is a screenshot of a recorded "right" gesture:



[Image: recorded accelerometer data for the "right" gesture]



As you can see, the X and Y axes react very strongly to the gesture. The Z axis also reacts, but, as we decided, it will not be included in the processing.



Here is the “left” gesture:



[Image: recorded accelerometer data for the "left" gesture]



Notice that the X values are almost the opposite of those from the previous gesture.



Another thing to mention is the sampling rate of the data: it reflects how often the data is updated and directly affects how many points fall within a given time interval.



Another thing to consider is the duration of the gestures. This value, like many others, has to be chosen empirically. I found that a gesture lasts no more than about 1 second, but to make the value more convenient for calculations I rounded the window up to 1.28 seconds.



The selected rate is 128 points per 1.28 seconds, which gives a sampling period of 10 milliseconds (1.28 s / 128). This value must be passed to the registerListener method.
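Here is a rough sketch of how the recording side might be wired up; the GestureRecorder class and its fields are illustrative assumptions, not the author's actual code. Note that registerListener expects the sampling period in microseconds:

    import android.hardware.Sensor;
    import android.hardware.SensorEvent;
    import android.hardware.SensorEventListener;
    import android.hardware.SensorManager;

    public class GestureRecorder implements SensorEventListener {
        private static final int WINDOW_SIZE = 128;           // 1.28 s at 100 Hz
        private static final int SAMPLING_PERIOD_US = 10_000; // 10 ms

        // Circular buffer of the last 128 samples, interleaved [x1, y1, x2, y2, ...]
        // (unwrap it before feeding the model).
        private final float[] window = new float[WINDOW_SIZE * 2];
        private int writePos = 0;

        public void start(SensorManager sensorManager, Sensor linearAccel) {
            // linearAccel is the TYPE_LINEAR_ACCELERATION sensor obtained earlier.
            sensorManager.registerListener(this, linearAccel, SAMPLING_PERIOD_US);
        }

        @Override
        public void onSensorChanged(SensorEvent event) {
            // Keep only X and Y; Z is ignored, as discussed above.
            window[writePos * 2] = event.values[0];
            window[writePos * 2 + 1] = event.values[1];
            writePos = (writePos + 1) % WINDOW_SIZE;
        }

        @Override
        public void onAccuracyChanged(Sensor sensor, int accuracy) {
            // Not used.
        }
    }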



The idea is to train the neural network to recognize such signals in the data stream from the accelerometer.



So, next we need to record a large number of gesture samples to files. Of course, samples of the same gesture (right or left) must be marked with the same label. It is difficult to say in advance how many samples are needed for training; this becomes clear from the training results.



By tapping somewhere on the graph, you select a sample - that is, a segment 128 points long:



[Image: a selected 128-point sample in the recording application]



Now the "Save" button becomes active. Clicking it saves the sample to the working directory, in a file named "{label}_{timestamp}.log". The working directory can be selected in the application menu.



Also note that after the current sample is saved, the next one is selected automatically, using a very simple algorithm: find the first entry whose absolute X value is greater than 3, then rewind 20 points.
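A minimal sketch of this selection heuristic (the method name and boundary handling are assumptions):

    // Find the start of the next sample: the first point whose |X| exceeds
    // the threshold, stepped back by a fixed number of points so that the
    // whole gesture fits into the 128-point window.
    static int findNextSampleStart(float[] xValues, int from) {
        final float THRESHOLD = 3f;
        final int REWIND = 20;
        for (int i = from; i < xValues.length; i++) {
            if (Math.abs(xValues[i]) > THRESHOLD) {
                return Math.max(0, i - REWIND);
            }
        }
        return -1; // no further gesture found
    }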



This automation allows us to save a lot of samples quickly. I recorded 500 samples per gesture. The saved data must then be copied to a PC for further processing. (Processing and training directly on the phone would be interesting, but TensorFlow for Android does not currently support training.)



In the pictures presented earlier, the data range is approximately ±6. However, if you swing the phone harder, it can reach ±10. It is better to normalize the data so that the range is about ±1, which is much better suited as input for a neural network. To do this, I simply divided all the data by a constant - 9 in my case.
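A minimal sketch of this normalization step, using the author's empirically chosen divisor of 9:

    // Scale raw accelerometer values into roughly the ±1 range.
    static void normalize(float[] data) {
        final float NORMALIZATION_FACTOR = 9f;
        for (int i = 0; i < data.length; i++) {
            data[i] /= NORMALIZATION_FACTOR;
        }
    }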



The next step before training is to filter the data in order to remove high-frequency oscillations. Such oscillations are not related to our gestures.



There are many ways to filter data. One of them is the Moving Average filter. Here is an example of how it works:



[Image: the signal before and after moving-average filtering]



Note that the maximum X values are now about half of the original ones. Since we will apply the same filtering in real time during recognition, this should not be a problem.
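Here is a sketch of a simple moving average filter; the window length parameter is left open because the author chose the actual value empirically:

    // Each output point is the mean of the last `window` input points
    // (or of all points seen so far, at the very beginning of the series).
    static float[] movingAverage(float[] input, int window) {
        float[] output = new float[input.length];
        float sum = 0f;
        for (int i = 0; i < input.length; i++) {
            sum += input[i];
            if (i >= window) {
                sum -= input[i - window];
            }
            output[i] = sum / Math.min(i + 1, window);
        }
        return output;
    }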



The final step to improve training is data augmentation. This process extends the original data set by applying some manipulations to it. In our case, I simply shifted the data left and right by a few points:



[Image: augmented samples shifted left and right by a few points]
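The author performed this augmentation in the Python notebook; purely to illustrate the idea (in Java, for consistency with the other sketches in this article), a shift with zero padding of the vacated points might look like the following. The zero padding is an assumption:

    // Create an additional training sample by shifting an interleaved
    // [x1, y1, x2, y2, ...] sample by `shift` points (positive = right),
    // filling the vacated positions with zeros.
    static float[] shiftSample(float[] sample, int shift) {
        float[] shifted = new float[sample.length];
        int points = sample.length / 2;
        for (int i = 0; i < points; i++) {
            int src = i - shift;
            if (src >= 0 && src < points) {
                shifted[i * 2] = sample[src * 2];
                shifted[i * 2 + 1] = sample[src * 2 + 1];
            }
        }
        return shifted;
    }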



Neural Network Design



Designing a neural network is not an easy task and requires some experience and intuition. On the other hand, neural networks are well studied for certain types of tasks, and you can simply adapt an existing network. Our task is very similar to image classification: the input can be viewed as an image 1 pixel high (and this is indeed how it is treated - the first operation converts the input two-dimensional data [128 columns x 2 channels] into three-dimensional data [1 row x 128 columns x 2 channels]).



Thus, the input of the neural network is an array [128, 2].



The output of the neural network is a vector whose length equals the number of labels. In our case, these are two floating-point numbers.



Below is a diagram of the neural network:



[Image: neural network diagram]



And here is the detailed graph obtained in TensorBoard:



[Image: detailed training graph in TensorBoard]



This diagram contains some auxiliary nodes that are necessary only for training. Later I will show a clean, optimized picture.



Training



Training will be conducted on a PC in a Jupyter Notebook environment using Python and the TensorFlow library. You can run the notebook in Conda using the following configuration file. Here are some of the training hyperparameters:



Optimizer: Adam
Number of epochs: 3
Learning rate: 0.0001


The data set is split into training and validation sets in a ratio of 7 to 3.



The quality of training can be monitored using the training and testing accuracy values. The training accuracy should approach, but not reach, 1. A value that is too low indicates poor, inaccurate recognition, while a value that is too high indicates overfitting of the model and may lead to artifacts during recognition - for example, non-zero recognition scores for data that contains no gestures. Good testing accuracy is evidence that the trained model can recognize data it has never seen before.



Training log:



('Epoch: ', 0, ' Training Loss: ', 0.054878365, ' Training Accuracy: ', 0.99829739)
('Epoch: ', 1, ' Training Loss: ', 0.0045060506, ' Training Accuracy: ', 0.99971622)
('Epoch: ', 2, ' Training Loss: ', 0.00088313385, ' Training Accuracy: ', 0.99981081)
('Testing Accuracy:', 0.99954832)


The TensorFlow graph and its associated data can be saved to files using the following methods:



saver = tf.train.Saver()

with tf.Session() as session:
    session.run(tf.global_variables_initializer())

    # save the graph
    tf.train.write_graph(session.graph_def, '.', 'session.pb', False)

    for epoch in range(training_epochs):
        # train ...
        saver.save(session, './session.ckpt')


The full code can be found here.



Neural Network Export



How to save the TensorFlow data was shown in the previous section. The graph is stored in the session.pb file, and the training data (weights, etc.) is saved in several "session.ckpt.*" files. These files can be quite large:



session.ckpt.data-00000-of-00001    3385232
session.ckpt.index                      895
session.ckpt.meta                     65920
session.pb                            47732


The graph and the training data can be frozen and converted into a single file suitable for use on a mobile device.



To freeze them, copy the tensorflow/python/tools/freeze_graph.py file to the script directory and run the following command:



python freeze_graph.py --input_graph=session.pb \
    --input_binary=True \
    --input_checkpoint=session.ckpt \
    --output_graph=frozen.pb \
    --output_node_names=labels_output


where output_graph is the output file and output_node_names is the name of the output node. This value is specified in the Python code.



The resulting file is smaller than the previous ones, but still large enough:



frozen.pb 1130835





Here is what this model looks like in TensorBoard:



[Image: the frozen model graph in TensorBoard]



To obtain such an image, copy the tensorflow/python/tools/import_pb_to_tensorboard.py file to the script directory and run:



 python import_pb_to_tensorboard.py --model_dir=frozen.pb --log_dir=tmp 


where frozen.pb is the model file.



Now run TensorBoard:



 tensorboard --logdir=tmp 


There are several ways to optimize the model for the mobile environment. To run the following commands, you need to build TensorFlow from source:



1. Remove unused nodes and perform general optimizations. Run:



bazel build tensorflow/tools/graph_transforms:transform_graph

bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
    --in_graph=mydata/frozen.pb \
    --out_graph=mydata/frozen_optimized.pb \
    --inputs='x_input' \
    --outputs='labels_output' \
    --transforms='strip_unused_nodes(type=float, shape="128,2")
        remove_nodes(op=Identity, op=CheckNumerics)
        round_weights(num_steps=256)
        fold_constants(ignore_errors=true)
        fold_batch_norms
        fold_old_batch_norms'


Result:



[Image: the optimized graph in TensorBoard]



2. Quantization (converting floating-point weights into an 8-bit integer format). Run:



bazel-bin/tensorflow/tools/graph_transforms/transform_graph \
    --in_graph=mydata/frozen_optimized.pb \
    --out_graph=mydata/frozen_optimized_quant.pb \
    --inputs='x_input' \
    --outputs='labels_output' \
    --transforms='quantize_weights strip_unused_nodes'


As a result, the output file is 287,129 bytes, compared to the original 3.5 MB. This file can be used with TensorFlow for Android.



Android Demo Application



To perform recognition in an Android application, you need to add the TensorFlow for Android library to the project. Add the library to the Gradle dependencies:



dependencies {
    implementation 'org.tensorflow:tensorflow-android:1.4.0'
}


Now you can access the TensorFlow API through the TensorFlowInferenceInterface class. First, put the "frozen_optimized_quant.pb" file in the "assets" directory of your application (i.e. "app/src/main/assets") and load it in code (for example, when the Activity starts; as usual, it is better to perform any I/O operations on a background thread):



inferenceInterface = new TensorFlowInferenceInterface(getAssets(),
        "file:///android_asset/frozen_optimized_quant.pb");


Notice how the model file is specified.



Finally, recognition can be performed:



float[] data = new float[128 * 2];
String[] labels = new String[]{"Right", "Left"};
float[] outputScores = new float[labels.length];

// populate data array with accelerometer data

inferenceInterface.feed("x_input", data, new long[] {1, 128, 2});
inferenceInterface.run(new String[]{"labels_output"});
inferenceInterface.fetch("labels_output", outputScores);


The data is fed to the input of our "black box" as a one-dimensional array containing the interleaved X and Y accelerometer data, that is, in the format [x1, y1, x2, y2, x3, y3, ..., x128, y128].
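A small sketch of this flattening step; xValues and yValues are assumed to already contain the filtered and normalized 128-point series:

    // Interleave the X and Y series into the flat array expected by the model.
    float[] data = new float[128 * 2];
    for (int i = 0; i < 128; i++) {
        data[i * 2] = xValues[i];
        data[i * 2 + 1] = yValues[i];
    }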



At the output, we get two floating-point numbers in the range 0...1, which express how well the input data matches the "right" or "left" gesture. Note that the sum of these values is 1. So, for example, if the input signal matches neither the left nor the right gesture, the output will be close to [0.5, 0.5]. For convenience, it is better to convert these values into independent 0...1 scores using some simple math.
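One possible way to do this (an assumption on my part, not necessarily the author's exact formula) is to map the 0.5...1 range onto 0...1:

    // Rescale a softmax score so that 0.5 (no match) maps to 0 and 1.0 maps to 1.
    static float toIndependentScore(float softmaxScore) {
        return Math.max(0f, 2f * softmaxScore - 1f);
    }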



In addition, do not forget to perform filtering and normalization of data before running recognition.



Here is a screenshot of the testing window of the demo application:



[Image: the testing window of the demo application]



Here, the red and green lines represent the preprocessed signal in real time, while the yellow and blue lines show the recognition scores for the "right" and "left" gestures, respectively. "Time" is the processing time of one sample; it is quite low, which allows recognition in real time (two milliseconds means processing could run at 500 Hz, while the accelerometer is set to update at 100 Hz).



As you can see, there are some nuances. First, there are non-zero recognition values even for an "empty" signal. Second, each gesture produces a long "true" recognition in the middle, with a value close to 1.0, and slight opposite recognitions at the edges.



It seems that additional processing is required to turn these raw scores into accurate, actual gesture detections.
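Purely as an illustration (this is an assumed scheme, not the internals of the author's library described in the next section), such post-processing could combine a score threshold with a short refractory period so that each gesture triggers only once:

    // Trigger a gesture only when its score exceeds a threshold, then ignore
    // further triggers for a short refractory period to avoid duplicates.
    class GestureDebouncer {
        private static final float THRESHOLD = 0.8f;    // assumed value
        private static final long REFRACTORY_MS = 500;  // assumed value
        private long lastTriggerTime;

        /** Returns the index of the detected gesture, or -1 if none. */
        int process(float[] scores, long nowMs) {
            if (nowMs - lastTriggerTime < REFRACTORY_MS) {
                return -1;
            }
            for (int i = 0; i < scores.length; i++) {
                if (scores[i] > THRESHOLD) {
                    lastTriggerTime = nowMs;
                    return i;
                }
            }
            return -1;
        }
    }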



Android library



I implemented recognition using TensorFlow, together with the additional output processing, as a separate Android library. The library and a demo application are available here.



To use the library in your application, add the library dependency to your Gradle file:



repositories {
    maven { url "https://dl.bintray.com/rii/maven/" }
}

dependencies {
    ...
    implementation 'uk.co.lemberg:motiondetectionlib:1.0.0'
}


create a MotionDetector listener:



private final MotionDetector.Listener gestureListener = new MotionDetector.Listener() {
    @Override
    public void onGestureRecognized(MotionDetector.GestureType gestureType) {
        Log.d(TAG, "Gesture detected: " + gestureType);
    }
};


and enable recognition:



MotionDetector motionDetector = new MotionDetector(context, gestureListener);
motionDetector.start();


Conclusion



We have gone through all the stages of developing and implementing motion gesture recognition in an Android application using the TensorFlow library: collecting and preprocessing data, designing and training a neural network, and developing a test application and a ready-to-use Android library. The described approach can be used for other recognition or classification tasks as well, and the resulting library can be integrated into any other Android application so that it can be controlled with motion gestures.



I hope you found this article useful; you can also watch the video review below.

If you have a project idea but do not know where to start, we are always here to help you.





P.S. I am also the author of the original English version of this article, which was published on blog.lemberg.co.uk, so I can answer technical questions.

Source: https://habr.com/ru/post/346766/


