Video analytics 2.0, or what abandoned objects told us. Part 1
What comes to mind when you hear the phrase "Video Analytics 2.0"? What real tasks could be assigned to a hypothetical next generation of video analysis technologies?
Popular answers are likely to include "non-cooperative identification of a person in a moving crowd with a probability close to 100%", "identifying intruders among visitors", "simultaneous cross-camera tracking of many objects without losing track", and "error-free recognition and classification of everything visible in the frame".
An engineer who installs security systems will want detector configuration automated as far as possible by advanced self-learning algorithms, which would significantly reduce the cost of commissioning and warranty service.
And the cleverest will say that video analytics 2.0 is possible only with artificial intelligence, which is out of reach at the current level of technology. So we have no choice but to watch the leaders of the analytics market, who already squeeze the maximum out of the available computing resources, and wait for the mass adoption of quantum computers, hoping it happens eventually.
Although the answers will differ, one idea (often not voiced, merely implied) runs through all of them: next-generation video analytics must work effectively in real conditions of use, on streets, at train stations, in subways, in everyday life.
Whatever the detector (abandoned objects, fights, vandalism, person recognition), it must work in real rather than laboratory conditions and at the same time fully meet, or better yet exceed, consumer expectations.
This is because for some 5-6 years the video surveillance market has been waiting for a qualitative breakthrough in situational video analytics. (Situational video analytics is video analytics that detects specified, deterministic situations.)
After the video analytics boom of 2009-2010, described in the article "Bad guys of video analytics market", situational video analytics was seriously discredited: consumers were disappointed by the gap between advertising promises and real results in the field. Surprising as it may sound, today, in 2016, situational video analytics is in only slightly better shape than in 2010.
And while detectors designed for "quiet" scenes without movement perform more or less acceptably, detectors that must work amid intense passenger traffic or heavy activity in the frame leave much to be desired.
Even the leaders of the intelligent video surveillance industry have failed to bring situational detectors to a qualitatively different level of effectiveness, one that satisfies consumer expectations in ordinary, non-laboratory conditions. The state of the industry is shown most clearly and fairly accurately by the capabilities of the abandoned-object detectors that almost every market player offers today.
Why?
Firstly, depending on the approach, the detector architecture can draw on almost all the basic technologies of video analysis: background segmentation, multivariate Gaussian distributions and other statistical tools, pattern recognition (HOG or convolutional neural networks), tracking and multi-object tracking.
Secondly, the task of detecting an abandoned object is well formalized, which makes it possible to evaluate objectively how effective a given manufacturer's detector is.
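To make the combination of background statistics and a stillness timer more concrete, here is a minimal sketch. It is not the method of any vendor discussed in this article: the class name, the parameters (alpha, k, still_tol, hold_frames, min_var) and the synthetic test scene are all hypothetical and chosen only for illustration. Each pixel gets a single running Gaussian (mean and variance); a pixel raises an alarm once it has deviated from the background and stayed unchanged for a given number of frames.

```python
import numpy as np


class ToyAbandonedObjectDetector:
    """Illustrative only: per-pixel Gaussian background plus a stillness timer."""

    def __init__(self, first_frame, alpha=0.02, k=3.0, still_tol=10.0,
                 hold_frames=30, min_var=4.0):
        frame = np.array(first_frame, dtype=np.float64)
        self.mean = frame.copy()                 # background mean per pixel
        self.var = np.full(frame.shape, 25.0)    # background variance per pixel
        self.prev = frame.copy()                 # previous frame, for the stillness check
        self.alpha = alpha                       # background learning rate
        self.k = k                               # foreground threshold in standard deviations
        self.still_tol = still_tol               # max frame-to-frame change for a "still" pixel
        self.hold_frames = hold_frames           # frames a pixel must stay still before an alarm
        self.min_var = min_var                   # variance floor, avoids a zero threshold
        self.still_count = np.zeros(frame.shape, dtype=np.int64)

    def update(self, frame):
        frame = np.array(frame, dtype=np.float64)
        diff = frame - self.mean
        # Foreground: the pixel deviates from the background by more than k sigma.
        foreground = diff ** 2 > (self.k ** 2) * self.var
        # Still: the pixel barely changed since the previous frame.
        still = np.abs(frame - self.prev) < self.still_tol
        # Only background pixels update the model, so a parked object is not absorbed.
        background = ~foreground
        self.mean[background] += self.alpha * diff[background]
        self.var[background] += self.alpha * (diff[background] ** 2 - self.var[background])
        np.maximum(self.var, self.min_var, out=self.var)
        # Count consecutive frames in which a pixel is both foreground and still.
        self.still_count = np.where(foreground & still, self.still_count + 1, 0)
        self.prev = frame
        return self.still_count >= self.hold_frames


# Synthetic scene: uniform gray floor, then a bright 5x5 "bag" that stays in place.
background = np.full((20, 20), 100, dtype=np.uint8)
with_bag = background.copy()
with_bag[5:10, 5:10] = 200

detector = ToyAbandonedObjectDetector(background, hold_frames=30)
for _ in range(5):
    detector.update(background)
for _ in range(40):
    alarm = detector.update(with_bag)

print("alarm pixels:", int(alarm.sum()))
```

Note that the synthetic scene is exactly the kind of empty, high-contrast setup that is easy to detect. With a crowd walking through the frame, the stillness counter keeps resetting and the background statistics drift, which is where a scheme this simple stops working.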
Let's see what solutions are offered by reputable companies that have their own video analytic detectors.
Solutions of Russian companies
Macroscop offers a module for its software priced at 1,500 rubles. As one would expect, the module has a setting for the time after which an object is considered abandoned, a number of conditions on the object being overlapped by other foreground objects, and video quality requirements (resolution of at least 640x480, frame rate of at least 5 frames/s). Macroscop's director of development, Peter Harebov, discusses them in detail in the following video (the detector is covered from 12:26):
One parameter deserves separate attention: the size of the detected object. The 3% claimed by Macroscop sounds reasonable at first glance, but in fact 3% of the frame is a lot:
Imagine a suitcase of this size.
Of course, suitcases come in different sizes, but still.
In practice, this means the detector can detect objects only in the immediate vicinity of the camera lens. As a result, to detect objects effectively on a platform, you would need to install cameras every 5-7 meters, which would make the cost of the solution prohibitive, and even the low price of the detector would not help.
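A quick back-of-the-envelope calculation shows why 3% is a lot. It rests on assumptions the source does not confirm: that the 3% refers to the share of frame area, that the frame is at the minimum supported resolution of 640x480, and that the object is roughly square. The function name is hypothetical.

```python
import math


def min_object_size(frame_w, frame_h, percent_of_area):
    """Pixel footprint of an object covering the given percentage of frame area.

    Assumes a square object; returns (area in px, side in px,
    side as a share of frame width, side as a share of frame height).
    """
    area_px = frame_w * frame_h * percent_of_area / 100
    side_px = math.sqrt(area_px)
    return area_px, side_px, side_px / frame_w, side_px / frame_h


area_px, side_px, share_w, share_h = min_object_size(640, 480, 3)
print(f"area: {area_px:.0f} px, side: {side_px:.0f} px")
print(f"share of width: {share_w:.0%}, share of height: {share_h:.0%}")
```

Under these assumptions the smallest detectable object covers 9,216 pixels, a square about 96 pixels on a side, which is 15% of the frame width and 20% of its height. A bag only reaches that size in the frame when it lies close to the camera.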
But as far as video analytics is concerned, Macroscop can hardly be blamed, since video analytics is not its main specialty.
First and foremost, the company is the creator of excellent scalable video surveillance software, one of whose key features is careful handling of the resource-intensive process of displaying and storing video streams, which lets you build scalable systems without overpaying for hardware.
So let's see what is offered by companies that specialize primarily in developing video analysis algorithms.
But, objectively, these are not real metro conditions: the place where the detector is tested is deserted, no one occludes the object, and the object itself is high-contrast, statistically very different from the background. Usually, the situation in the subway looks like this:
And sometimes this happens:
The difference from the video shown by Synesis is catastrophic.
Visualizing the statistics gives an even clearer idea of the difference:
On the left is the statistical difference between the object and the background in the Synesis video; on the right is the deviation from the background that the "Video-intelligence" abandoned-object detector picks up in a real situation.
Vocord.
An experienced company, 17 years on the market. According to its website, it employs 120 specialists, 80% of whom are "mathematicians, developers, engineers". Judging by such serious human and intellectual resources, the level of its video analytics should be serious too.
As part of its Vocord Tahion product, the company offers an "abandoned / removed items" module. The company's YouTube channel has an example of the module in operation:
Unfortunately, it is again the same depressing situation, very easy to detect and bearing no relation to reality: a flat light floor, a contrasting black bag and emptiness. Although, to give credit where it is due, Vocord, unlike Synesis, did simulate a person partially occluding the object. But how far such an experiment is from reality, and how useful such a detector is in real conditions, hardly needs saying.
One gets the impression that the large players, unable to create an algorithm that really works, are forced to make some detectors purely for show, just so that their product formally meets the requirements of any tender.
Foreign solutions
Someone will say: "Why do you consider only domestic manufacturers? Foreign ones most likely offer higher-class products."
But, surprisingly, foreign manufacturers' abandoned-object detectors are at exactly the same level of quality. They just charge very different money for them.
For example, the detector developed by the Israeli company Agent Vi, considered one of the best on the market, demonstrates the following capabilities in its promotional video:
Again a high-contrast object and no one around. This detector costs more than 45,000 rubles per video channel.
Or here is the detector from the Italian company Technoaware, at about 30,000 rubles per channel:
The detector's capabilities are again almost identical to all of the above.
Why and what to do?
At this point, an inquiring mind will draw a logical conclusion: if even industry leaders with staffs of hundreds have not yet offered the market abandoned-object detectors that work reasonably well in the real world, could there be objective reasons for this? For example:
Nobody needs an abandoned-object detector, so no one invests in its development.
A high-quality abandoned-object detector is impossible at the current level of science and technology. The leaders of the video surveillance market are already squeezing everything possible out of the technology.
As for the first assumption: no, abandoned-object detection is relevant and in demand.
First, technical specifications for intelligent video surveillance systems for metros or train stations often explicitly require abandoned-object detectors. The money involved is usually considerable, and the major integrators compete seriously for it. Yet for some reason developers cannot offer an option that works in real conditions, even for that money.
Secondly, a properly working detector is also in demand in the consumer video surveillance market.
As confirmation, consider the large-scale competition for developing video analysis algorithms held at the end of 2012 by Ivideon, a company well known to Habr users and the creator of the world's most successful cloud-based video surveillance service. The competition had only three tasks, and one of them was an abandoned-object detector.
The detector's relevance to Ivideon as a cloud video surveillance provider is clear: a well-functioning abandoned-object detector would let it offer customers attractive functionality for keeping an eye on a car or parking space in the yard, a stroller or bicycle in the stairwell, and so on.
But this is possible only if the detector detects the situations with a probability close to 100% and, at the same time, does not "spam" the user with false alarms. Otherwise the function cannot be monetized, and even if it is offered for free as a PR feature, it will most likely only harm the company's reputation, discrediting the quality of the service in consumers' eyes.
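Both requirements can be measured on annotated footage. The sketch below is one possible way to do it, not a procedure used by Ivideon or anyone else mentioned here; the function name, the matching rule (an alarm counts if it falls within a tolerance window of an annotated event) and the sample timestamps are all assumptions made for illustration.

```python
def score_detector(event_times, alarm_times, tolerance_s, duration_h):
    """Return (detection rate, false alarms per hour).

    event_times: timestamps (seconds) of annotated abandoned-object events.
    alarm_times: timestamps (seconds) of alarms raised by the detector.
    An alarm matches an event if it is within tolerance_s seconds of it;
    each alarm can match at most one event.
    """
    unmatched_alarms = sorted(alarm_times)
    detected = 0
    for event in sorted(event_times):
        hit = next((a for a in unmatched_alarms if abs(a - event) <= tolerance_s), None)
        if hit is not None:
            detected += 1
            unmatched_alarms.remove(hit)
    detection_rate = detected / len(event_times) if event_times else 0.0
    false_alarms_per_hour = len(unmatched_alarms) / duration_h
    return detection_rate, false_alarms_per_hour


# Made-up example: 3 annotated events and 4 alarms over 2 hours of video.
rate, false_per_hour = score_detector(
    event_times=[100, 500, 900],
    alarm_times=[105, 300, 910, 1200],
    tolerance_s=30,
    duration_h=2.0,
)
print(f"detection rate: {rate:.2f}, false alarms per hour: {false_per_hour:.1f}")
```

In this made-up example two of the three events are caught and two alarms match nothing. A consumer-facing function would need the first number close to 1 and the second close to 0 at the same time.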
Given that in the 3 full years since the competition Ivideon has not offered any new video analytics functions, we can conclude that it did not succeed in creating a high-quality abandoned-object detector.
So perhaps the second assumption is correct, and detecting abandoned objects in subway or train station conditions is currently an essentially unsolvable task. Incidentally, some reputable, well-established video analytics and video surveillance companies say so directly.
Opinions of industry leaders
Spetslab, a pioneer of the video analytics industry and, in its own words, the creator of the very term "video analytics", holds the following opinion, often expressed rather categorically. Quote:
"We came up with the abandoned-object detector ten years ago, and in all those ten years it has never worked anywhere, not at a single company, not at a single site. Let's leave this fairy tale of the abandoned-object detector behind."
(In the video, this moment is at 15:47)
Or take ITV, one of the ten largest players in the industry and the leader of the Russian video surveillance market, which also says that there are sites where video analytics cannot work effectively and that it is generally intended only for sterile zones (from 11:22):
But is that really so? Experience shows that it is not.
The opinion of the large market players has more to do with other reasons, which differ from company to company: from simple competition to denial of reality, along the lines of "since we could not do this in 20 years of work, it is fundamentally impossible".
All of this suggests that the future of video analytics belongs to new creative teams, those that disregard established stereotypes and seek new approaches to video analysis problems.
To back this up, in the next article we will describe what results can be achieved in detecting abandoned objects in the real conditions of intense metro passenger traffic, and what nuclear physics, quark-gluon plasma and video analytics have in common.