Why do office demonstrations of video analytics differ so much from how it works in real life?

In this article we will talk about the vast video analytics market, represented today by so-called intelligent video surveillance.

Its sheer scale alone earns this field the label “classical”, all the more so because Intel stood at its origin, and Intel is itself a classic. Video surveillance developers still build their products on its open-source library, OpenCV. As a point of pride, I should add that the programmers behind it were Russian and, moreover, worked in Russia, at Intel's Nizhny Novgorod branch. Why “worked”? The project was closed several years ago and the team scattered to other companies. Apparently Intel was the first to sense the futility of its own “classics”.

Nevertheless, its work lives on and is developing actively. Only the laziest developers of video surveillance systems have not used OpenCV in their "smart" code. And this library, after its death, works wonders! According to many sellers of video surveillance systems, it spots criminal activity, detects fights, identifies abandoned and removed objects, finds extremists ... And people swallow it. Billions of rubles are sunk into such tasks under projects like "Safe City", "Safety on the Metro", "Operation Anti-Terror", and so on. But that is more a matter of politics; here we will talk about technology, and about why this attractive trade-show wrapper cannot work in practice.

Experts call this approach “hard”, because the algorithms of such video analytics rely on precisely configured parameters and a fixed order of actions: crossing a certain virtual line, exceeding a detection area, leaving an object behind ... There is another approach (not Intel's), flexible video analytics, whose operation is not tied to formalized tasks, but we will talk about it next time.
Classic “hard” video analytics relies mostly on an object detector that localizes closed detection regions by grouping pixels that share common features. But so far no features exist that clearly distinguish people from dogs, cats from cars, or a tree branch from a lawn mower. Unfortunately, all of this works well only in ideal laboratory conditions, where slippery issues such as the following are carefully avoided:

1. The video detector is based on contrast. Areas that merge with the background fall outside its analysis, so the basic parameters of the object of interest cannot be predicted in any way.

The first camera sees a person against a dark background and therefore detects only the white shirt; the rest of the body merges with the background and is unavailable for analysis. Given lighting problems as well, it is practically impossible to distinguish dark from slightly less dark, because the difference is at the level of noise.

The second camera sees a person against a white background and therefore detects only the dark head and dark trousers. The white shirt is ignored completely: it gives the detector no information. Thus this camera will generally see several objects instead of one person.
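The two camera views can be imitated with a toy contrast detector. This is only a minimal sketch, not the detector of any real product: the function name `detect_segments`, the pixel brightness values, and the threshold are all assumptions chosen for illustration, and a person is reduced to a single vertical column of pixels.

```python
def detect_segments(column, background, threshold):
    """Return (start, end) runs of pixels whose brightness differs
    from the background by more than the threshold."""
    segments = []
    start = None
    for i, value in enumerate(column):
        is_foreground = abs(value - background) > threshold
        if is_foreground and start is None:
            start = i                      # a new contrasting run begins
        elif not is_foreground and start is not None:
            segments.append((start, i))    # the run ended at the previous pixel
            start = None
    if start is not None:
        segments.append((start, len(column)))
    return segments


# Hypothetical person, top to bottom: dark head, white shirt, dark trousers.
person = [40] * 2 + [230] * 4 + [40] * 4

dark_view = detect_segments(person, background=30, threshold=50)
white_view = detect_segments(person, background=240, threshold=50)

print(dark_view)   # only the shirt stands out from the dark background
print(white_view)  # head and trousers appear as two separate objects
```

Against the dark background only the shirt survives; against the white one the same person falls apart into two unrelated detections.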

2. It is practically impossible to filter out phenomena such as shadows reliably: a shadow takes many forms and constantly follows every one of us.

As a result, the proportions of the target are distorted, and the computer does not understand that this is a person.
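How a shadow breaks the proportions can be shown with a naive height-to-width rule. The sketch below assumes a hypothetical classifier `looks_like_person` with made-up ratio limits and made-up pixel coordinates; real systems use other criteria, but any shape-based rule suffers in the same way once the shadow is fused with the body.

```python
def union_box(a, b):
    """Smallest box (x_min, y_min, x_max, y_max) that contains both boxes."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def looks_like_person(box, min_ratio=2.0, max_ratio=5.0):
    """Naive rule (an assumption): a standing person is 2 to 5 times taller than wide."""
    width = box[2] - box[0]
    height = box[3] - box[1]
    return min_ratio <= height / width <= max_ratio


# Image coordinates in pixels, y grows downward, feet at y = 200.
person = (100, 20, 150, 200)    # 50 wide, 180 tall
shadow = (150, 180, 350, 200)   # long shadow on the ground next to the feet

detected = union_box(person, shadow)  # the detector sees one contrasting region

print(looks_like_person(person))    # True
print(looks_like_person(detected))  # False: the blob is now wider than it is tall
```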

3. Intersecting targets throw the machine's "mind" into complete chaos. Today's algorithms simply cannot determine that these are two people rather than one or five.

4. Group targets are indistinguishable, by the shape of the detection, from other objects: for example, several people and a car.
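Points 3 and 4 can be illustrated by reducing each target to its horizontal extent in the frame and fusing whatever overlaps, which is roughly what a region-based detector does. This is a simplified sketch under that assumption; `merge_blobs` and all pixel values are hypothetical.

```python
def merge_blobs(extents):
    """Fuse overlapping (start, end) horizontal extents into single blobs."""
    merged = []
    for start, end in sorted(extents):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous blob: the detector cannot separate them.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


apart = merge_blobs([(10, 30), (40, 60)])       # two people walking apart
crossing = merge_blobs([(10, 30), (25, 45)])    # the same two people crossing
group = merge_blobs([(0, 20), (15, 35), (30, 50), (45, 140)])  # three people by a car

print(apart)     # two blobs
print(crossing)  # one blob: one person, two, or five?
print(group)     # one wide blob: people and car are indistinguishable
```

Once the extents touch, the count of people is lost, and the merged blob carries no sign of how many targets, or which kinds, it contains.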

5. The “object size” parameter, which demonstrators of video analytics rely on to prove that people can be told apart from cars, is fundamentally unusable in 2D video surveillance.

Which is bigger: a bird or a car?
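The answer depends on distance, which a single 2D image does not provide. A rough sketch using the standard pinhole projection (apparent size is proportional to real size divided by distance) makes this concrete; the focal length, object sizes, and distances below are assumptions picked for illustration.

```python
def apparent_size_px(real_size_m, distance_m, focal_length_px=1000.0):
    """Pinhole camera model: size in the image = focal length * real size / distance."""
    return focal_length_px * real_size_m / distance_m


bird = apparent_size_px(real_size_m=0.3, distance_m=1.0)   # bird right in front of the lens
car = apparent_size_px(real_size_m=4.5, distance_m=50.0)   # car at the far end of the street

print(f"bird: {bird:.0f} px, car: {car:.0f} px")
print("bird looks bigger" if bird > car else "car looks bigger")
```

With these numbers the bird occupies roughly three times as many pixels as the car, so a size threshold would classify them the wrong way round.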

6. We often hear this touted as an achievement: "But we record with several cameras at once!" This should perhaps be counted as a drawback, because the cameras see the object differently.

- The first sees a dark, smooth nape; the second sees a bright face with a long protrusion, the nose.
- The first sees a large object, because the person is closer to it; the second sees a small one, because the person is farther away. A two-dimensional view has no notion of perspective.
- The first sees the inscription “Sport” on the front of the shirt; the second sees “Rest” on the back.
- The first sees a swaying branch above the person's head that merges with it in perspective. The second sees a fly sitting in front of the lens, which looks like an elephant (after all, it is closer).

In general, the list of reasons why hard video analytics is impossible in practice is long, but it has one very interesting aspect: these problems are easy to hide in a pre-arranged show.

A chosen uniform background, a chosen contrasting outfit, scripted actions with non-intersecting targets, no interference from bushes, trees, precipitation, or glare ... All of this is easy to arrange in one's own office, and then the video analytics turns into a miracle!

PS: We have talked only about the “classics”, which their creator buried long ago and whose name is still exploited in many funded projects. But there are also living video analytics algorithms on the market; their merits and shortcomings will be discussed in the next article.

It is possible that the corpse will be resurrected someday, perhaps once new types of computers or X-ray surveillance systems appear. I would like that, if only because Intel was most likely not the first: Russian engineers who had been at the forefront of video analytics came to its Nizhny Novgorod “Computer Vision” laboratory from other Russian companies. In essence, this is a Russian invention, and it is a pity to have to write articles like this one. But is that a reason to let other Russian people be deceived, people who are still buying stale meat dressed up by advertising?

Source: https://habr.com/ru/post/257565/
