Hello.
I want to share my work on image processing. I have recently been writing a home server for a "smart home" and started with video surveillance.
The task turned out to be not so trivial. I will write about the video surveillance part as a whole separately (if anyone is interested); here I would like to cover the topic of "detecting motion by comparing two frames".
This algorithm is needed to start (and stop) video recording from the cameras.
There is not much information on this topic online. I came up with the basic algorithm myself (it is very simple), and the article "The algorithm for detecting shadows on a video image" helped me improve it.
In this article I will only cover the algorithm itself, largely without tying it to a specific programming language. The whole thing is built on simple loops, and everything is elementary and easy to recreate in your favorite language.
The examples in the algorithm description are given both as "live" frames and as made-up tables (for clarity).
Basic algorithm
- 1. Take the two most recent frames from the camera.

Frame 1 and Frame 2. The pictures are just an example. In the second frame the sun has come out (the brightness has increased), glare has appeared on the walls and floor, and a girl has sat down at the table, casting a shadow.
- 2. Split the image into blocks and compute the average color value of each block.
Why split into blocks? For example, if we divide a 640x480 image into 10x10 blocks and sample 25 pixels from each block, then instead of ~300,000 pixels and iterations we only need ~3,000 iterations and ~75,000 pixels to analyze. For motion detection such a simplification is acceptable. (A code sketch of steps 2-5 is given right after step 5.)

- 3. Compare the two resulting color tables (matrices) and write the color difference of each block into a third table.
Let's call it MoveMask.

- 4. Filter the third table from noise.
This is done by choosing a "delta" threshold. Flags remain only in the blocks where the image actually changed.

(1) - applying the "delta" (in this example we subtract 2)
(2) - converting to the finished mask

An example of overlaying the mask (the changes in the image) on a real frame.
- 5. Compute the "power" of the changes in the image using MoveMask.
I do this by summing adjacent blocks horizontally (dropping detached blocks that survived the noise cleanup) and then, using the same procedure, summing the row sums vertically. When the sum of the (adjacent) changed blocks exceeds the chosen threshold, the "has motion" flag is set.
For example, consider this MoveMask

In this example, the "power" of the changes equals 12. The "power" of the same change will differ depending on the block size, so the block size and the trigger threshold (the "power" at which the motion trigger fires) have to be tuned relative to each other.
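To make these steps concrete, here is a minimal sketch in Python (an illustration only, not my actual implementation: NumPy is assumed, the block size, "delta" and motion threshold are example values, and the "power" count in step 5 is simplified to counting changed blocks that have a changed horizontal neighbour):

```python
import numpy as np

BLOCK = 10              # block size in pixels (step 2)
DELTA = 15              # noise threshold for block differences (step 4)
MOTION_THRESHOLD = 10   # minimum "power" of changes to set the motion flag (step 5)

def block_means(frame: np.ndarray, block: int = BLOCK) -> np.ndarray:
    """Step 2: average brightness of each block (frame is a 2D grayscale array)."""
    h, w = frame.shape
    h, w = h - h % block, w - w % block                  # crop to whole blocks
    cells = frame[:h, :w].reshape(h // block, block, w // block, block)
    return cells.mean(axis=(1, 3))

def move_mask(frame1: np.ndarray, frame2: np.ndarray) -> np.ndarray:
    """Steps 3-4: per-block difference of the two frames, filtered by the delta."""
    diff = np.abs(block_means(frame1) - block_means(frame2))
    return diff > DELTA                                  # True = the block has changed

def change_power(mask: np.ndarray) -> int:
    """Step 5 (simplified): count changed blocks that have a changed horizontal
    neighbour; isolated blocks left after the noise filter are dropped."""
    has_neighbour = np.zeros_like(mask)
    has_neighbour[:, 1:] |= mask[:, :-1]
    has_neighbour[:, :-1] |= mask[:, 1:]
    return int(np.sum(mask & has_neighbour))

def has_motion(frame1: np.ndarray, frame2: np.ndarray) -> bool:
    return change_power(move_mask(frame1, frame2)) >= MOTION_THRESHOLD
```

The `MOTION_THRESHOLD` check here plays the role of the trigger threshold described in step 5.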
The advantages of this algorithm: it is easy to program, uses few resources, and has good sensitivity (it does not miss even the slightest movement).
I see only one drawback: it triggers on changes in illumination. Since the whole algorithm is based on analyzing color, in cloudy weather you can easily watch it fire when the sun comes out from behind the clouds and the room suddenly becomes much brighter. At that moment blocks light up all over the image. In step 4 of the algorithm I showed a mask with an overstated "delta"; in reality, on this example (see step 1) with a working delta, the mask covers almost the entire frame.

If you overstate the delta (as in step 4 of our algorithm), it strongly affects the sensitivity, and our motion sensor may stop reacting to dimly lit objects.
So I started looking for a solution to the problem, and to my delight found a hint on Habr (see the article mentioned at the beginning of the text). I wanted the algorithm to react only to (opaque) objects, so that the translucent shadow of the girl and the light on the walls and floor would not trigger our motion sensor.
Improved algorithm
- 1. Take the two most recent frames from the camera.

Frame 1 and Frame 2
- 2. Split the image into blocks and compute the average color value of each block.
- 3. Compare the two resulting color tables (matrices) and write the color difference of each block into a third table.
- 4. Filter the third table from noise.
Let's call the resulting table MoveMask.
- 5. At this step we build a table of values averaged over the neighborhood of each block (see the image).
We get something like an averaged background around each point. (A code sketch of steps 5-9 follows the algorithm description below.)

The numbered block and its neighboring blocks (marked with dots) are summed; you can take more neighboring blocks if you wish. The average value is written back into the numbered block. Let's call these tables AvFrame1 and AvFrame2.
- 6. Build a table of the differences between the values obtained in step 5 and filter it with its own "delta".
This is done just as with the frames in step 3 of the first algorithm (see the illustration there).

This mask is shown on a real frame.
- 7. Now we perform the multiplication called "relative correlation" (see the article mentioned at the beginning).
Create a table whose cells hold the result of the calculation |Frame1[x][y] * AvFrame2[x][y] - Frame2[x][y] * AvFrame1[x][y]|. Again, filter it with its own "delta".
That is, we multiply the block color from the first frame by the averaged value of the same block in the second frame, do the same with the block color of the second frame and the average of the first, and take the absolute difference.

This mask is shown on a real frame.
- 8. Now intersect the tables from steps 6 and 7.
In the resulting table (let's call it MaskFilter) I put the arithmetic mean only into the cells that are positive in both tables.

We get this picture
- 9. Filter MoveMask with the MaskFilter.
Keep in MoveMask only those blocks that are also present in MaskFilter at the same positions (or nearby).

The filtered MoveMask on the image. Compare with step 4 of the first algorithm.
- 10. Optionally, you can filter MoveMask further by removing blocks that have few neighbors (for example, fewer than 4).

The final look of the mask.
- 11. Finally, compute the "power" of the changes from the finished MoveMask table.
See step 5 of the first algorithm.

In this algorithm, filtering MoveMask with the MaskFilter removes most of the MoveMask blocks that fire on shadow or glare. This can cut the sensor's false triggers by more than half.
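For the new steps 5-9, here is a possible continuation of the sketch above (again an illustration: the 3x3 neighborhood and the two extra "deltas" are example values, `avg1` and `avg2` are the block tables from step 2 for the two frames, and step 8 is kept as a boolean mask rather than storing the arithmetic mean):

```python
import numpy as np

AV_DELTA = 10       # delta for the difference of averaged backgrounds (step 6)
CORR_DELTA = 400    # delta for the "relative correlation" values (step 7)

def averaged_background(avg: np.ndarray) -> np.ndarray:
    """Step 5: AvFrame - each block averaged with its 8 neighbours (3x3 window)."""
    padded = np.pad(avg, 1, mode='edge')
    total = np.zeros_like(avg)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            total += padded[1 + dy : 1 + dy + avg.shape[0],
                            1 + dx : 1 + dx + avg.shape[1]]
    return total / 9.0

def mask_filter(avg1: np.ndarray, avg2: np.ndarray) -> np.ndarray:
    """Steps 6-8: intersect the filtered background difference with the filtered
    "relative correlation" table to get MaskFilter."""
    av1, av2 = averaged_background(avg1), averaged_background(avg2)
    background_mask = np.abs(av1 - av2) > AV_DELTA                 # step 6
    correlation = np.abs(avg1 * av2 - avg2 * av1)                  # step 7
    correlation_mask = correlation > CORR_DELTA
    return background_mask & correlation_mask                      # step 8 (boolean)

def filtered_move_mask(move_mask: np.ndarray,
                       avg1: np.ndarray, avg2: np.ndarray) -> np.ndarray:
    """Step 9: keep MoveMask blocks only where MaskFilter is also set
    (this version requires the exact same position, not "nearby")."""
    return move_mask & mask_filter(avg1, avg2)
```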
Among the shortcomings: a large number of "deltas" and the complexity of both the programming and the tuning of the algorithm.
The basic algorithm vs. the improved algorithm. In this example, the shadow and the glare on the walls and windows were removed completely. The glare on the floor was overexposed and did not disappear. But perhaps, by adjusting one of the "deltas" for this room, the filter could be configured to ignore the glare from the floor as well. Different rooms require different tuning of the algorithm.
How the algorithm could be improved further
Perhaps, if you convert RGB to HSV and try to work without the brightness (value) channel, the accuracy of the algorithm can be increased. I have not yet gotten around to checking this.
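For anyone who wants to try it, the conversion itself is trivial with OpenCV (the file name here is just a placeholder); the H and S channels could then be fed through the same block comparison instead of the brightness:

```python
import cv2

frame = cv2.imread('frame.jpg')               # placeholder file name
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)  # convert BGR to HSV
h, s, v = cv2.split(hsv)                      # use h (and s), ignore v (brightness)
```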
I hope my experience and this description of the algorithm will be useful to someone. If you have anything to add, correct, or suggest, I will be glad to hear it.