
A DIY autopilot: sensor fusion from a phone and open training data

Hello. We continue to build a computer-vision autopilot out of GitHub repos and sticks (start here). Today we connect to the smartphone's motion sensors (accelerometer, gyroscope and GPS receiver) on Android, develop an uncomplicated sensor fusion, and finish off with the code for collecting a training dataset. Bonuses: an Android application that records all sensors synchronized with video, and more than an hour of labeled data as part of the import substitution of foreign contests. All the code is, as before, on GitHub.



A three-axis MEMS accelerometer and gyroscope; they will be extremely useful to us.


In a nutshell, let me remind you (for details, welcome to the previous post) that the ultimate goal is to train a system that can produce control actions for the car (the steering wheel angle and the desired speed or acceleration) from a video frame taken through the windshield. For training, we need video from drives where a human controls the car, together with time-synchronized information about the steering angle and speed. In modern cars this data can be read from the CAN bus, but, firstly, you need to buy and connect a special CAN adapter, and secondly, you have to deal with decoding the protocol and data format, which differ between manufacturers. Instead of all that trouble, we compute the control actions indirectly, from raw data from an ordinary smartphone that is not connected to anything else.


In the previous installment, we learned to determine the angular velocity of the car's turning in the horizontal plane from video frames alone, using a video-SLAM library. Unfortunately, in practice this library estimates the translational speed of the device very inaccurately. And without knowing the translational speed, we cannot even train the steering component of the autopilot on its own, since the angular velocity \omega depends on the combination of how far the steering wheel is turned (equivalently, the turning radius r) and the forward speed of the car v.



To get the forward speed, today we switch from video processing to the motion sensors that almost any smartphone has: a GPS receiver, an accelerometer and a gyroscope. We will look at the pros and cons of each, combine their information and obtain the instantaneous speed for each video frame. Moreover, we will also take the rotation information from the gyroscope, which makes it possible to drop the more capricious optical SLAM and the camera calibration it requires. As a result, the new data collection process boils down to three "button clicks": install the application on the phone, record a track during a drive, process it with one command on the computer.


As a result, we obtain a frame-by-frame annotation with the translational speed and the turn rate:



Available motion sensors


We will work with Android. Regardless of the platform, for those who know English there is an interesting video about how information from different sensors can complement one another and reduce errors. We are interested in the motion sensors (accelerometer and gyroscope) and the position sensors (GPS).


From the point of view of the API, everything is simple - we ask the user for permissions, subscribe for updates, and get a stream of events with a sensor reading and measurement time. Details can be found in the source .


Let's consider the pros and cons of each sensor.


GPS


GPS turned out to be 90% of the success in determining translational speed, at least in relatively open areas. GPS measurements arrive in the Location format:



A huge plus of GPS data is that the measurement errors are independent, that is, they do not accumulate over time: a new measurement is not affected by how many measurements came before it or how large their errors were. This is a very important property that the rest of the approach relies on.


The disadvantage is that measurements come rarely, about once per second, and there is little point in taking them much more often: the device's displacement over a shorter interval would be comparable to the measurement error anyway. As a result, firstly, GPS data reacts late to changes in speed, and secondly, it is harder to remove noise from it: to do that, you have to look at the measurements that follow in time, which blurs the timing even further. Here is an example of a speed plot from GPS:



Suspicious here are the first 25 seconds (constant acceleration and a probable delay of the GPS measurements) and the noise between 30 and 40 seconds. As a teaser, the same plot after processing the data from the accelerometer and gyroscope:



As you can see, there are improvements on both counts: we react to accelerations earlier, and the spike at 35 seconds is smoothed out.
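Since the final goal is a frame-by-frame annotation, the sparse (roughly 1 Hz) GPS speed has to be brought to the video frame timestamps one way or another. Below is a minimal GPS-only baseline sketch of that step, before any fusion; all array names here are illustrative assumptions rather than the published code:

```python
import numpy as np

def gps_speed_per_frame(gps_t, gps_speed, frame_t, smooth_window=3):
    """GPS-only baseline: resample sparse (~1 Hz) GPS speed to video frame times.

    gps_t, gps_speed -- GPS fix timestamps (s) and speeds (m/s)
    frame_t          -- video frame timestamps (s), roughly 30 Hz
    smooth_window    -- moving-average width in GPS fixes
    """
    # Averaging neighbouring fixes damps the noise, at the cost of smearing
    # the estimate in time even more (the trade-off described above).
    kernel = np.ones(smooth_window) / smooth_window
    smoothed = np.convolve(gps_speed, kernel, mode="same")
    # Linear interpolation between rare GPS fixes gives a value for every frame.
    return np.interp(frame_t, gps_t, smoothed)
```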


Inertial sensors: accelerometer and gyroscope


Inertial sensors measure the device's own motion: the accelerometer measures linear acceleration, the gyroscope measures angular velocity. The result of their measurements is an estimate of the change in the device's motion relative to the previous moment in time. Mathematically, this is reflected in the use of a device-fixed coordinate system:



The accelerometer provides linear accelerations along the three axes of the device, the gyroscope - the angular velocity around the same three axes.


The plus of inertial sensors is the possibility of very frequent measurements, at 400 Hz and above, which exceeds the video frame rate of ~30 Hz by an order of magnitude. The main disadvantage is the lack of a link to a fixed coordinate system. The sensors can only measure change relative to the previous position, so to compute results in a fixed coordinate system their readings have to be integrated over time, and the measurement errors accumulate: the longer the measurement period, the greater the total error at the next point in time.


As you can see, GPS and inertial sensors have mirror-opposite advantages and disadvantages, which means we need to combine the strengths of both sources, and that is what we do next.


Sensor fusion: combining the information


Given a recording of GPS and inertial sensor data, we need to estimate:


  1. The absolute translational speed with high temporal resolution.
  2. The angular velocity of rotation around the car's vertical axis (i.e., in the plane of the road) at each point in time.

We already computed the horizontal rotation in the previous post, but, as it turned out, the old way (through video analysis) is more complicated and capricious than working directly with the gyroscope, so it is better to change the approach.


Forward speed


It would seem, what could be simpler? From school physics we know, by definition,


\vec{v}(t_1) = \vec{v}(t_0) + \int_{t_0}^{t_1} \vec{a}(t)\,dt


But we need speed in a fixed coordinate system, and the coordinate system of the accelerometer rotates with the device. Denote by R the rotation matrix from the device’s coordinate system to the fixed one, then


\vec{v}(t_1) = \vec{v}(t_0) + \int_{t_0}^{t_1} R(t)\,\vec{a}_{D}(t)\,dt


where \vec{a}_{D} is the acceleration along the device's coordinate axes. In turn, the rotation matrix is obtained simply by integrating the rotations measured by the gyroscope (W is the derivative of the rotation matrix; it can be computed from the angular velocities around the three axes):


R(t_1) = R(t_0) + \int_{t_0}^{t_1} (I + W(t))\,dt
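To make the integration concrete, here is a naive dead-reckoning sketch of the two formulas above, written with the standard small-angle update R ← R(I + W·Δt); the function and variable names are illustrative and not taken from the repository:

```python
import numpy as np

def skew(w):
    """Skew-symmetric matrix W of the angular velocity w, so that W @ x == np.cross(w, x)."""
    wx, wy, wz = w
    return np.array([[0.0, -wz,  wy],
                     [ wz, 0.0, -wx],
                     [-wy,  wx, 0.0]])

def dead_reckon(t, gyro, accel, R0=np.eye(3), v0=np.zeros(3)):
    """Naively integrate gyroscope and accelerometer readings.

    t     -- timestamps of the inertial samples, shape (N,)
    gyro  -- angular velocities in device axes (rad/s), shape (N, 3)
    accel -- accelerations in device axes (m/s^2), shape (N, 3)
    Returns rotation matrices and velocities in the fixed frame for every sample.
    Without the corrections discussed below, this estimate drifts quickly.
    """
    R, v = R0.copy(), v0.copy()
    Rs, vs = [R.copy()], [v.copy()]
    for k in range(1, len(t)):
        dt = t[k] - t[k - 1]
        # Rotation update from the gyroscope: R(t1) ~ R(t0)(I + W dt).
        R = R @ (np.eye(3) + skew(gyro[k]) * dt)
        # Velocity update: v(t1) = v(t0) + R(t) a_D(t) dt, rotating the device
        # acceleration into the fixed coordinate system before integrating.
        v = v + R @ accel[k] * dt
        Rs.append(R.copy())
        vs.append(v.copy())
    return np.array(Rs), np.array(vs)
```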


Is that all? Not quite: this only works for a spherical accelerometer in a vacuum. Let's look at real data from a smartphone lying motionless on a table:


Raw accelerometer readings per axis


Observations:



The second experiment: record the total acceleration from the phone lying on the table first screen up and then screen down, and compare:


Overall measured acceleration up and down comparison


Observations:



Noise model and auto-calibration


So, to get the real acceleration in a fixed coordinate system, corrections to the "raw" accelerometer measurements are needed. Let's use a simple model:


\vec{v}(t_1) = \vec{v}(t_0) + \int_{t_0}^{t_1} \left( \vec{g} + R(t) \cdot \left( \vec{a}^{RAW}(t) + \vec{h} \right) \right) dt


where \vec{g} is the constant gravity vector in the fixed coordinate system, \vec{h} is a constant additive bias of the accelerometer in the device's axes, \vec{a}^{RAW}(t) are the raw accelerometer readings, and R(t) is the rotation matrix from the device's coordinate system to the fixed one.



The rotation matrix is obtained by integrating the angular velocities from the gyroscope, but the remaining parameters are unknown (including \vec{g}, because the initial orientation of the smartphone is unknown). We will find the calibration parameters from the GPS data. Recall that GPS measurements are fairly accurate, so the speed computed from the inertial sensors should be close to the speed according to GPS. We formalize this intuition as an optimization problem: for each interval between adjacent GPS measurements (about 1 second), we compute the distance traveled according to the inertial data and according to the GPS data, and take as the objective function the L2 discrepancy between them over the entire recording:


\min_{\vec{g}, \vec{h}, \vec{v}_0} \sum_{i=1}^{n} \left( \left\| \sum_{k=k_i}^{k_{i+1}} (\tau_k - \tau_{k-1}) \vec{v}_k^{IMU} \right\| - (t_i - t_{i-1}) v_i^{GPS} \right)^2


where t_i and v_i^{GPS} are the time and speed of the i-th GPS measurement, \tau_k are the timestamps of the inertial measurements, \vec{v}_k^{IMU} is the velocity estimated from the inertial data at time \tau_k, k_i is the index of the first inertial measurement inside the i-th GPS interval, and \vec{v}_0 is the unknown initial velocity.



Also denote \Delta\tau_j \equiv \tau_j - \tau_{j-1} and \Delta t_i \equiv t_i - t_{i-1}; substituting the corrections to the raw accelerometer measurements, we obtain the final form of the objective function:


\min_{\vec{g}, \vec{h}, \vec{v}_0} \sum_{i=1}^{n} \left( \left\| \sum_{k=k_i}^{k_{i+1}} \Delta\tau_k \left( \vec{v}_0 + (\tau_k - \tau_0)\vec{g} + \sum_{j=1}^{k} \Delta\tau_j R_j \cdot (\vec{a}_j^{RAW} + \vec{h}) \right) \right\| - v_i^{GPS} \Delta t_i \right)^2


It looks scary, but in fact it is a simple quadratic function: it is easy to take the derivatives analytically and optimize with any numerical method, for example L-BFGS.
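As an illustration of how such an objective can be handed to L-BFGS, here is a sketch using SciPy; the data layout and all names are my own assumptions for the example, not the author's implementation:

```python
import numpy as np
from scipy.optimize import minimize

def calibration_objective(params, tau, R, a_raw, gps_dt, gps_speed, gps_idx):
    """Objective from the formula above: per GPS interval, compare the distance
    integrated from inertial data with the distance according to GPS.

    params    -- concatenated calibration parameters [g (3), h (3), v0 (3)]
    tau       -- inertial timestamps, shape (N,)
    R         -- rotation matrices from gyroscope integration, shape (N, 3, 3)
    a_raw     -- raw accelerometer readings, shape (N, 3)
    gps_dt    -- GPS interval lengths t_i - t_{i-1}, shape (n,)
    gps_speed -- GPS speed v_i^GPS per interval, shape (n,)
    gps_idx   -- index of the first inertial sample of each interval, shape (n + 1,)
    """
    g, h, v0 = params[:3], params[3:6], params[6:9]
    dtau = np.diff(tau, prepend=tau[0])
    # v_k^IMU = v0 + (tau_k - tau_0) g + sum_j dtau_j R_j (a_j^RAW + h)
    rotated = np.einsum("kij,kj->ki", R, a_raw + h)
    v_imu = v0 + np.outer(tau - tau[0], g) + np.cumsum(dtau[:, None] * rotated, axis=0)
    loss = 0.0
    for i in range(len(gps_dt)):
        k0, k1 = gps_idx[i], gps_idx[i + 1]
        dist_imu = np.linalg.norm(np.sum(dtau[k0:k1, None] * v_imu[k0:k1], axis=0))
        loss += (dist_imu - gps_speed[i] * gps_dt[i]) ** 2
    return loss

# result = minimize(calibration_objective, np.zeros(9), method="L-BFGS-B",
#                   args=(tau, R, a_raw, gps_dt, gps_speed, gps_idx))
# g, h, v0 = result.x[:3], result.x[3:6], result.x[6:9]
```

Here SciPy would estimate the gradient numerically; supplying the analytic derivatives mentioned above would make the optimization faster.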


To start, let's check it on a short segment of about half a minute:


Calibration result on a short segment: speed from the calibrated inertial data vs. GPS


Here it is not very clear "by eye" which curve is more correct, but we can declare success at least in the sense that we managed to find correction parameters with which the speed estimates from the two different sources are very close. Now let's try the same approach on a longer recording, about 10 minutes:


Something went wrong


Here, of course, is a complete failure. It turns out that the simple correction model is insufficient over long time intervals, that is, there are important sources of distortion that it does not take into account. These could be:



Local auto-calibration with a sliding window


There are different ways to deal with the problems of the simple correction model for inertial sensors. Possible options:


  1. Improve accuracy by modeling the unaccounted-for sources of error (gyroscope drift, accumulation of white noise).
  2. Cheat.

We cheat. After all, in the end we are not interested in the calibration model itself but in the final speed values. And since the calibration copes on short segments, we can use the standard sliding-window technique:



sliding window
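A minimal sketch of this sliding-window blending; calibrate_segment below stands for the short-segment calibration above, and the window and step sizes are illustrative assumptions:

```python
import numpy as np

def speed_with_sliding_windows(tau, calibrate_segment, window_s=30.0, step_s=15.0):
    """Run the short-segment calibration on overlapping windows and average
    the per-sample speed estimates where windows overlap.

    tau               -- inertial timestamps, shape (N,)
    calibrate_segment -- callable(indices) -> speed estimate for those samples,
                         i.e. the local L-BFGS calibration from above
    """
    speed_sum = np.zeros(len(tau))
    weight = np.zeros(len(tau))
    start = tau[0]
    while start < tau[-1]:
        idx = np.where((tau >= start) & (tau < start + window_s))[0]
        if len(idx) > 1:
            speed_sum[idx] += calibrate_segment(idx)
            weight[idx] += 1.0
        start += step_s
    # Samples covered by several windows get the average of their estimates.
    return speed_sum / np.maximum(weight, 1.0)
```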


Final result:


Final result: the long time series calibrated with overlapping sliding windows


We declare success with the forward speed and return to the angular speed of turns.


Angular speed of turns


The angular speed of rotation is needed to compute the turning radius (and hence the steering wheel angle): r = v / \omega. The gyroscope measures angular velocity directly, but in three-dimensional space: besides the rotation around the vertical axis (yaw, the actual turns), the data also reflects rotation around the transverse axis (pitch: changes in road gradient, speed bumps) and the longitudinal axis (roll: riding with one side in a rut or pothole):


pitch-roll-yaw


We need to extract from the three-dimensional rotations only the component around the vertical axis (in the car's coordinate system). Accordingly, we need to find the direction of this axis. To do this, we use the observation that the amount of rotation around the vertical axis is much larger than around the longitudinal and transverse axes (90-degree turns are the norm, whereas, fortunately, comparable changes of slope and roll do not happen). So we can simply take the axis around which the total rotation was largest as the vertical one.


Mathematically, it is convenient to isolate the dominant rotation axis by representing each elementary rotation (that is, each gyroscope measurement) as a quaternion. A rotation by an angle \theta around a unit axis (x, y, z) is represented by the quaternion


q((x, y, z), \theta) = \begin{pmatrix} x \sin(\theta/2) \\ y \sin(\theta/2) \\ z \sin(\theta/2) \\ \cos(\theta/2) \end{pmatrix}


The convenience is that the first three components of the quaternion simultaneously characterize the direction of the rotation axis and the amount of rotation. Therefore, principal component analysis applied directly to the first three components of the quaternions gives a good result. After selecting the dominant axis, the picture looks like this:


Rotations after PCA


It can be seen that a noticeably larger share of the rotation happens around the selected dominant axis than around the other two axes perpendicular to it.
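A sketch of this axis extraction under the same illustrative assumptions: build an elementary quaternion for each gyroscope sample, take the first principal component of their (x, y, z) parts as the vertical axis, then project the angular velocity onto it:

```python
import numpy as np

def gyro_to_quaternions(gyro, dt):
    """Elementary rotation quaternion (x, y, z, w) for each gyroscope sample.

    gyro -- angular velocities in device axes (rad/s), shape (N, 3)
    dt   -- duration of each sample (s), shape (N,)
    """
    norm = np.linalg.norm(gyro, axis=1)
    theta = norm * dt                                   # rotation angle per sample
    axis = gyro / np.maximum(norm, 1e-12)[:, None]      # unit rotation axis
    xyz = axis * np.sin(theta / 2.0)[:, None]
    w = np.cos(theta / 2.0)
    return np.hstack([xyz, w[:, None]])

def dominant_rotation_axis(quats):
    """First principal component of the (x, y, z) quaternion parts:
    the axis around which most of the rotation happens, i.e. the car's vertical."""
    xyz = quats[:, :3]
    xyz_centered = xyz - xyz.mean(axis=0)
    _, _, vt = np.linalg.svd(xyz_centered, full_matrices=False)
    return vt[0]                                        # unit vector in device axes

# axis = dominant_rotation_axis(gyro_to_quaternions(gyro, dt))
# yaw_rate = gyro @ axis                    # rotation rate around the vertical axis
# turn_radius = forward_speed / yaw_rate    # r = v / omega from the formula above
```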


By the way, principal component analysis was also used in the previous post, but back then I did not think of applying it to rotations directly and instead fitted a horizontal plane to the three-dimensional trajectory (that is, I worked with displacements instead of rotations). The new method is better not only because it does not need displacement information at all, but also because the vertical axis is extracted in the device's coordinate system. That is, with the new approach the vertical axis is perpendicular to the local plane of the road, rather than to the median plane of the whole trajectory as in the old approach. For example, the vertical axis now turns together with the car when moving from an uphill stretch to a downhill one. As a result, extracting the horizontal component of the total rotation has become more accurate.


Data


As a bonus to the code, I want to share the already processed data for those interested in playing with training their own models. The torrent contains more than an hour of raw data recorded on roads near Moscow, plus (in the postprocessed directories) the computed translational speed and angular speed of turns. If you use it, I would be interested to hear about the results!


That's all. In the next installment we will learn to predict the control actions from the video.



Source: https://habr.com/ru/post/329484/

