
The Particle Filter algorithm in computer vision: stereovision

The Particle Filter algorithm is remarkable for its simplicity and intuitive clarity. Here I propose my own use of it in the task of stereoscopic vision: matching the "same point" between two images, one from the left camera and one from the right. The implementation (for entertainment purposes only) is in Python with the numpy library (matrix calculations) and pygame (graphics and mouse event handling). The Particle Filter algorithm itself is unchanged from the Programming a Robotic Car course at Udacity. I confess: I honestly took the whole course and did all the homework, including the implementation of this algorithm.

In the task of stereoscopic vision, you need to match small areas (for example, 8x8 pixels) between the left and right frames. With the cameras ideally aligned horizontally, knowing the difference in the X coordinate of the same area between the left and right frames, you can calculate the distance to the object shown in that area. I understand that it sounds confusing, but in fact it follows easily from the simplest geometric constructions by the rule of similar triangles. For example, in the video with the unfinished bell tower, we see a fence with similar diamonds stretching into the distance. The diamond nearest to us is shifted most strongly in the right frame relative to the left, the next one slightly less, and so on.
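For reference, here is a minimal sketch of that similar-triangles relation (the name depth_from_disparity and the symbols f, B, d are my notation, not from the original: f is the focal length in pixels, B the distance between the cameras):

    # Depth from horizontal disparity by similar triangles (rectified cameras).
    # f: focal length in pixels; B: distance between the cameras (baseline);
    # d: the shift along X of the same area between the left and right frames.
    def depth_from_disparity(f, B, d):
        if d == 0:
            return float('inf')  # no measurable shift: the object is "at infinity"
        return f * B / d

    # Example: f = 700 px, baseline 0.1 m, disparity 14 px -> Z = 5.0 m
    print(depth_from_disparity(700.0, 0.1, 14.0))

The nearest diamond has the largest d and therefore the smallest Z; as d shrinks toward zero, Z runs off to infinity.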

The standard scheme for solving such a problem is quite computationally heavy. You need to calibrate out the errors in the relative positions of the cameras so that the horizontal line with a given Y coordinate in the left frame exactly matches the horizontal line with the same coordinate in the right frame. Then each point (or area) along a horizontal line in the left frame is matched against the best point in the right frame (this is solved, for example, by dynamic programming, which has quadratic complexity). That gives the displacement along Ox for each point on the horizontal line in question. And then the procedure is repeated for every horizontal line. A bit complicated, and not at all like the way it works in the brain (we know that, right?)


Now see how the Particle Filter solves the same problem. In my opinion, this is much closer to the biological model: at the very least, it simulates the micro-movements of the eye that focus attention on individual fragments of the image, and the "history" of such micro-movements is taken into account.



The Particle Filter algorithm itself is very simple. We generate N points, called particles. The particles are placed on the plane randomly around the initial hypothesis.

Details for the stereo vision task
In the stereo vision problem, I used a Gaussian distribution and 1000 particles. The frame on the left is the image from the left camera. I model focusing by clicking with the mouse on an arbitrary point in the left frame. In the video you will see a green rectangle in the left frame. Call it the "focus of attention." At the same time, this is our initial hypothesis: we assume that the focus of attention in the left frame corresponds to an area with approximately the same coordinates in the right frame.

The coordinates of this point become the center of the particle distribution, and a cloud of particles is generated in the right frame. In the video there are two right frames. The bottom one (black and white) shows the entire particle cloud. The top one (the color frame) shows only the 10 most "correct" particles. More on the "correctness" of particles below.
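A minimal sketch of how such a cloud could be generated (the names and the spread sigma are my assumptions, not taken from the original code):

    import numpy as np

    N = 1000  # number of particles

    def init_particles(x0, y0, sigma=20.0, n=N):
        """Scatter n particles around the initial hypothesis (x0, y0)
        in the right frame, Gaussian in both coordinates."""
        xs = np.random.normal(x0, sigma, n)
        ys = np.random.normal(y0, sigma, n)
        return np.stack([xs, ys], axis=1)  # array of shape (n, 2)

    particles = init_particles(150.0, 200.0)  # (150, 200) = clicked focus point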


Then for each particle we check how "correct" it is. The more correct it is, the greater its weight (yes, we are building a weights matrix, that is, a vector; in short, you get the idea).

Details for the stereo vision task
The correctness (i.e., weight) of each particle in our problem is a measure of its similarity to the focus of attention in the left frame. Remember that the particles live in the right frame. Each particle here is an 8x8 matrix of black-and-white pixel brightness values of the area under study. I simply subtract one 8x8 matrix from the other (left from right, or vice versa) and take the sum of squared differences. The smaller this number, the more alike the left and right areas are.

ATTENTION! This is a very primitive way of comparing two image areas. There are much better ways, for example, normalizing local contrast before subtracting. I would start the improvements by taking color into account; for now I ignore it and work entirely in grayscale.
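Here is a sketch of this measurement step, assuming grayscale frames as 2-D numpy arrays; converting the sum of squared differences into a weight through an exponential is one common choice, not necessarily the one in the original code:

    import numpy as np

    def patch(img, x, y, size=8):
        """Cut a size x size block of brightness values near (x, y).
        Boundary handling is omitted for brevity."""
        x, y = int(x), int(y)
        return img[y:y + size, x:x + size].astype(float)

    def particle_weight(left_img, right_img, focus_xy, particle_xy, sigma=400.0):
        """The smaller the sum of squared differences between the two
        8x8 areas, the greater the particle's weight."""
        diff = patch(left_img, *focus_xy) - patch(right_img, *particle_xy)
        ssd = np.sum(diff ** 2)
        return np.exp(-ssd / (2.0 * sigma ** 2))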


Now we move the hypothesis. In the Particle Filter algorithm, we should know at least approximately where we are moving the hypothesis.

Details for the stereo vision task
In the simulator, I simply click the mouse in the left frame wherever I want to move the focus of attention. For example, I move along the fence (as in the video). Note that each step of the algorithm handles one focus movement.

Knowing the coordinates of the previous and the new focus, we get the displacement vector (for example, 19 pixels left and 2 pixels down).


Now for the crux of the algorithm. We generate a new particle cloud from the old particles. In doing so, we perform two actions: we pick "old" particles in proportion to their weight, and we move them by the same vector the hypothesis moved by.

That is, the greater the weight of an "old" particle, the more likely it is to make it into the new cloud. There is one more nuance: when moving the new particles, we always add a random perturbation to the displacement vector.

Details for the stereo vision task
So, we pick 1000 new particles from the set of old particles, with probability proportional to the old particles' weights. Obviously, the new cloud will then consist of the areas of the right frame most similar to the area in focus in the left frame.

For the selected particles, we change their coordinates on the plane by adding the focus displacement vector plus a random value (Gaussian, centered at 0).

We depict these particles in the lower right frame.
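A sketch of this resample-and-move step (np.random.choice stands in for the course's "resampling wheel"; noise_sigma is an assumed value):

    import numpy as np

    def resample_and_move(particles, weights, dx, dy, noise_sigma=4.0):
        """Pick particles with probability proportional to their weight,
        then shift each by the focus displacement (dx, dy) plus noise."""
        particles = np.asarray(particles, dtype=float)
        w = np.asarray(weights, dtype=float)
        w /= w.sum()                      # turn weights into probabilities
        idx = np.random.choice(len(particles), size=len(particles), p=w)
        moved = particles[idx] + np.array([dx, dy], dtype=float)
        moved += np.random.normal(0.0, noise_sigma, moved.shape)  # Gaussian, centered at 0
        return moved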


That's it. The cycle repeats.
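Put together, one iteration of the cycle might look like this, reusing the sketched helpers above (all names are mine, illustrative only):

    # One iteration per focus movement; helper names are from the sketches above.
    def step(particles, left_img, right_img, old_focus, new_focus):
        # 1. Measure: weight each particle by similarity to the focus area.
        weights = [particle_weight(left_img, right_img, old_focus, p)
                   for p in particles]
        # 2. Move: the displacement of the focus of attention in the left frame.
        dx = new_focus[0] - old_focus[0]
        dy = new_focus[1] - old_focus[1]
        # 3. Resample old particles by weight and shift them by the same vector.
        return resample_and_move(particles, weights, dx, dy), weights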

Now we select from the entire cloud the 10 particles with the greatest weight (shown in the upper right frame). The average of their centers is taken as the coordinates of our focus of attention in the right frame. That is, we have matched the corresponding areas in the right and left frames! We can calculate the difference in coordinates, substitute it into the formula (which also involves the distance between the cameras and the focal length of the lens) and get the Z coordinate: the distance from the cameras to the point our attention is focused on.
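A sketch of this estimation step (top_k = 10 as in the video; f and B again stand for the assumed focal length and camera baseline):

    import numpy as np

    def estimate_focus(particles, weights, top_k=10):
        """Average the top_k heaviest particles: the matching focus
        position in the right frame."""
        idx = np.argsort(weights)[-top_k:]
        return np.asarray(particles)[idx].mean(axis=0)  # (x, y) in the right frame

    # The disparity along Ox between the two foci gives the depth:
    # right_focus = estimate_focus(particles, weights)
    # Z = depth_from_disparity(f, B, left_focus[0] - right_focus[0])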

Watch in the video how the particle cloud shrinks after several focus shifts. At first the algorithm does not know "where we are", but after 2-3 iterations the cloud contracts and effectively tracks the movement of the focus of attention, taking its history into account. This is a great advantage of our algorithm. For standard stereo vision, for example, a horizontal sequence of repeating elements is an almost insurmountable obstacle. In our video, this is the fence with its repeated diamonds. Because the algorithm remembers the history, it confidently tells one diamond from another.

The result of our work is the black rectangle with white dots in the lower left corner of the window. These are the matched points seen in top-down projection. Look: from 0 to 9 seconds we mentally move along the fence, and white dots appear parallel to the lower edge of the screen, that is, they correspond to a view of the fence from above.

Then, from 13 to 32 seconds, we look along the fence as it recedes from us. The points in the top projection correctly reflect this. Then we run our gaze along the path by the fence, and the white dots draw us a plan of the path (there are traces of some kind of stroller in the sand, but they are barely visible in the low-resolution video).

Finally, when we shift our gaze to the bell tower, the particle cloud suddenly gets "lost": the movement of the focus was too large, and no clear correspondence between the images could be established. But within 3-4 iterations the cloud contracts again, and we confidently track the contours and windows of the bell tower. Here the camera resolution is not enough to capture the coordinate difference between the left and right frames, so the distance Z to the bell tower comes out as infinity. Note that the frame resolution really is bad, only 230x344.



In this video, the algorithm is tested on a different image. See how confidently it copes with the gravestones during seconds 0-20. Next, we successfully trace the outlines of the shadow of a bush and of the bush itself; note that on the projection the bush correctly falls into the middle of the shadow it casts. Then we again shift our gaze to the background, get lost in a cloud of particles, but quickly recover and confidently find the same wall fragments. But they are already so far away from us...

UPDATE

Source code pastebin.com/a8hDiyMc

Requirements: Python with the numpy and pygame libraries.
Do not judge the quality of the code; it was written for my own entertainment.

Source: https://habr.com/ru/post/152553/

