This is the background and continuation of the article:
It was one of those evenings when there was nothing left to do: all the articles on Habr had been read, so I started a "small" project on autonomous robot navigation on a Raspberry Pi 3. There are no problems with the hardware; it was cheaply assembled from whatever parts were at hand, bought on eBay: a camera with good glass optics (this is important for calibration stability), pan/tilt drives for the camera, and a compass, gyroscope, etc. attached to the camera:
Existing SLAM systems didn't suit me, either in price or in quality/speed. Since I had already worked out many of the details of visual SLAM, I decided to write the algorithms and code step by step and publish them openly, with justification of why each algorithm was chosen.
In general, the plan is as follows (everything in C++, except the last item):
- Capturing video from the camera (mostly written)
- Detection of feature points on each 1/1-1/8 frame of the image (written)
- Matching (registration) of feature points (written)
- Tracking of feature points (written; easy)
- 3D reconstruction of the point cloud and the camera coordinates within it (written; not very difficult)
- Drivers for the motors, compass, etc. (mostly written)
- At least primitive top-level logic, for debugging the code above (where to move and why; in Python; postponed for now)
Detection of feature points and their matching is the critical step for both performance and accuracy. Existing algorithms (those known to me) are not suitable for this platform, either in speed or in accuracy (precision).
Point-matching algorithms have three characteristics:
- recall (what percentage of all detected points is matched correctly)
- precision (what percentage of all proposed matches is correct)
- speed (time for detection and matching)
The matched points are then used to estimate a transformation matrix with the RANSAC method, whose complexity is ~O((1/precision)^N), where N is the minimum number of matched point pairs needed to compute that matrix. If you are looking for a plane (a homography matrix) in 3D projected to 2D, then N = 4; for a rigid transformation of 3D points (the fundamental matrix), N = 5-8. So, for example, with precision = 0.1 (10%), finding a homography requires tens of thousands of expensive samples, and the fundamental matrix millions. That is why an algorithm with high precision was developed and tested.
For testing detection and matching algorithms there is a standard image set proposed by Mikolajczyk et al. (included in the OpenCV opencv_extra distribution).
It contains 8 sets of 6 images each.
The first image is the reference; the remaining five are distorted, with increasing amounts of distortion. Each set covers its own type of distortion (rotation, shift, scale, exposure, parallax, blur, defocus, compression artifacts). Below are the results for different algorithms (SIFT, SURF, AKAZE, BRISK, ORB, and my own BOEV (Bigraph Oriented Edge Voting)); the vertical axis is percentage (higher is better) and the horizontal axis is the image number within the set (higher is harder). To the right of each algorithm name is the total running time on the set (detection + description + matching).
Set bark (scale + rotation), precision:
recall:
Set bikes (shift + blur), precision:
recall:
Set wall (perspective distortion), precision:
wall recall:
Matching results visualized on the wall set (red: false positives, green: true positives)
The rest of the images and the source code are in the test subdirectory.
Conclusion
The algorithm performed very well in these tests and will be used in this project.