Deep counting. How do 3D technologies help count people and make life easier?

Let's get acquainted

Our team develops intelligent software for IP video surveillance systems. Over 9 years, we have created dozens of functions and video analysis modules, faced hundreds of problems and won no fewer victories. In the Macroscop blog we will talk about some of them, share our vision of the development process and reveal some of our technologies.

"Getting down to business"

A few years ago, we discovered that one of our functions, interactive search, was unpopular: users did not contact the company with questions or problems about it. When the technical support phone is silent, it is a bad sign for developers.

With the visitor counting module, it was the opposite. Users did not just buy and install it, they really used it! So they regularly called technical support with questions, sent in their wishes, described non-standard ways of using it and, of course, shared the difficulties they ran into. The counting accuracy was high (over 92%), but achieving it required correct camera installation, good shooting conditions (no overexposure, glare, etc.) and painstaking adjustment.

frame from the cartoon "Goatling, who counted to ten"

In our assessment, users of intelligent modules put accuracy and ease of control first. Two years ago we adopted the concept of developing the Macroscop product through simplification, and one of the milestones of that simplification was reworking the popular visitor counting module.

But first things first…

Traditionally, people are counted in video using tracking technology or the optical flow method.

Tracking builds the trajectories of moving objects, and the counter registers the direction in which they cross a virtual entry / exit line. Trajectories can be built in several ways:

1. By analyzing a sequence of frames that contain moving objects. In general, several moving objects may be present in one frame, so the program needs not only to build trajectories but also to distinguish between the objects and their movements. When moving objects cross the entry / exit line one by one, counting is not difficult: the task is to determine the direction in which the line is crossed.

This task can be handled by a counting method based on the simplest implementation of tracking, which analyzes foreground (moving) objects in two successive frames. First, the areas of motion that differ from the background image are extracted from the current and previous frames. Then, based on the speed, direction of motion and size of the objects, the method calculates the probability that an object moved from a trajectory point in the previous frame to a given point in the current one. The most probable movement of each object is added to its trajectory.
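The two-frame matching described above can be illustrated with a minimal sketch. It is not Macroscop's implementation: the scoring formula, the greedy assignment, and all names (`match_score`, `extend_trajectories`, `max_jump`, `min_score`) are assumptions chosen for illustration, and foreground blobs are assumed to be already extracted as dictionaries with a centre and an area.

```python
import math


def match_score(prev, blob, max_jump=80.0):
    """Heuristic likelihood in [0, 1] that `blob` is `prev` one frame later."""
    # Predict the new position from the last known velocity (pixels/frame).
    predicted_x = prev["x"] + prev["vx"]
    predicted_y = prev["y"] + prev["vy"]
    dist = math.hypot(blob["x"] - predicted_x, blob["y"] - predicted_y)
    if dist > max_jump:
        return 0.0
    closeness = 1.0 - dist / max_jump
    # Objects rarely change size much between two successive frames.
    size_ratio = min(prev["area"], blob["area"]) / max(prev["area"], blob["area"])
    return closeness * size_ratio


def extend_trajectories(tracks, blobs, min_score=0.3):
    """Append the most probable blob to each track (greedy); mutates `tracks`."""
    candidates = sorted(
        (
            (match_score(track["points"][-1], blob), ti, bi)
            for ti, track in enumerate(tracks)
            for bi, blob in enumerate(blobs)
        ),
        reverse=True,
    )
    used_tracks, used_blobs = set(), set()
    for score, ti, bi in candidates:
        if score < min_score:
            break  # everything after this is even less likely
        if ti in used_tracks or bi in used_blobs:
            continue
        last, blob = tracks[ti]["points"][-1], blobs[bi]
        tracks[ti]["points"].append(
            {**blob, "vx": blob["x"] - last["x"], "vy": blob["y"] - last["y"]}
        )
        used_tracks.add(ti)
        used_blobs.add(bi)
    # Blobs that matched nothing start new trajectories.
    for bi, blob in enumerate(blobs):
        if bi not in used_blobs:
            tracks.append({"points": [{**blob, "vx": 0.0, "vy": 0.0}]})
    return tracks
```

A greedy assignment is the simplest choice here; it is enough when objects pass one by one, which is exactly the case this method is suited to.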

2. In the general case, people in a frame can move in different ways: their trajectories can intersect or overlap, and the areas of motion corresponding to separate objects can merge into one area. In this case, the program needs to identify each object, split groups of objects and correctly count the people crossing the virtual line in each direction.

In these cases, constructing the exact trajectory of individual objects becomes harder, and building the trajectory from two frames no longer works because it gives a high error. Instead, a longer sequence of frames is analyzed and the results are continuously post-processed: the program builds graphs of the transitions of objects from one state (position) to another, and analyzes speed, direction of movement, position and color characteristics. The result is a set of the most probable object displacements, which form the trajectory.

To improve counting accuracy, methods for separating people within groups are also used. This can be done by estimating the area of a group or by detecting and counting people's heads.
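The area-based variant can be sketched in a few lines. This is only an illustration under the assumption that a typical per-person area in pixels is known for the given camera position; the function name `estimate_group_size` and its parameters are hypothetical.

```python
def estimate_group_size(blob_area, typical_person_area):
    """Estimate how many people a merged foreground blob contains."""
    if typical_person_area <= 0:
        raise ValueError("typical_person_area must be positive")
    # A blob that was detected at all is treated as at least one person.
    return max(1, round(blob_area / typical_person_area))
```

The estimate degrades as people overlap more, which is one reason the tracking approach prefers scenes with little overlap.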

Tracking-based counting is most accurate when people in the frame overlap as little as possible. In real systems this can often be achieved only by installing a camera above a restricted passage (a narrow entrance door, an escalator, etc.).
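Once a trajectory exists, the direction in which it crosses the virtual entry / exit line can be determined from the sign of a cross product. The sketch below is an assumption-laden illustration: the names `side_of_line` and `count_crossings` are hypothetical, the line is treated as infinite, and which side counts as "inside" is an arbitrary convention.

```python
def side_of_line(point, line_start, line_end):
    """Signed value: > 0 on one side of the line, < 0 on the other, 0 on it."""
    (px, py), (ax, ay), (bx, by) = point, line_start, line_end
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax)


def count_crossings(trajectory, line_start, line_end):
    """Return (entered, exited) for one trajectory given as a list of (x, y)."""
    entered = exited = 0
    prev_side = None
    for point in trajectory:
        side = side_of_line(point, line_start, line_end)
        if side == 0:
            continue  # exactly on the line: wait for the next point
        if prev_side is not None and (side > 0) != (prev_side > 0):
            # Convention (an assumption): moving to the positive side is an entry.
            if side > 0:
                entered += 1
            else:
                exited += 1
        prev_side = side
    return entered, exited
```

For a horizontal line from (0, 5) to (10, 5), a trajectory moving from y = 0 to y = 10 yields one entry, and the reverse trajectory yields one exit.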

Counting visitors based on optical flow analysis
While tracking-based counting finds an object in the video stream and follows its movements, this method watches the virtual entry / exit line and analyzes the movement of colored pixels through it. The method monitors the movement of an area of a certain brightness and color through the line and calculates image features (edges, corners, key points, texture information, etc.). However, it only registers the fact that an object is moving through the line; it does not determine what kind of object it is or how many people it contains. Head detection and analysis of the area of the moving object are therefore also used to determine the number of people crossing the line.

This method is suitable for a dense flow of people, where traditional tracking methods fail. The most accurate result is achieved when the flow of people is approximately uniform.
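One simplified way to turn motion through a line into a head count is to accumulate the area that passes through it and divide by a typical per-person area. The sketch below assumes that some optical flow algorithm has already produced, for every frame, the flow component perpendicular to the line at each pixel of the line; the function name `count_from_line_flow` and the parameters `person_area` and `min_speed` are hypothetical, and this is not presented as the method used in the product.

```python
def count_from_line_flow(frames, person_area, min_speed=0.5):
    """Estimate (entered, exited) from per-frame normal flow along the line.

    `frames` is an iterable of lists; each list holds the flow component
    perpendicular to the line (pixels/frame) for every pixel of the line.
    """
    area_in = area_out = 0.0
    for normal_flow in frames:
        for speed in normal_flow:
            if abs(speed) < min_speed:
                continue  # treat very slow motion as noise
            # One line pixel moving `speed` pixels sweeps `speed` px^2 of area.
            if speed > 0:
                area_in += speed
            else:
                area_out -= speed
    return round(area_in / person_area), round(area_out / person_area)
```

Because the result is an area divided by an average, it behaves best when the flow is roughly uniform, which matches the observation above.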

We implemented both methods in our counting module. Depending on the conditions in which counting takes place, the user can select the most appropriate mode of operation. If the shooting conditions are close to ideal (in terms of counting), setup does not require much time and effort; if not, the administrator has to put in serious work.

As a result, visitor counting in complex scenes worked proportionally: the more effort the user spent on setting up the module, the more accurately it worked. That did not fit our general concept.

So we set out to find new solutions for counting visitors.

The new 3D visitor counting module

The new module is implemented in a fundamentally different way. Whereas the earlier counting used data in two dimensions, the new one adds a third: depth (the distance from the camera to the person). Counting is now not just a module but a hardware and software complex consisting of a special device, a depth sensor, and a software module for data processing. The sensor measures the distance from the device to objects by emitting and receiving IR signals, and builds a depth matrix, which the program then works with.

Depth provides information about the height of a person crossing the entry / exit line and makes it possible to distinguish people from other objects. The user sets the minimum visitor height in the settings, and the system counts all people of that height and taller.
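The relationship between depth and height can be shown with a small sketch. It assumes a sensor looking straight down, distances in millimetres, and `None` for pixels with no measurement; the names `height_map_from_depth` and `is_visitor` are hypothetical.

```python
def height_map_from_depth(depth_map, floor_distance):
    """Convert sensor-to-surface distances into heights above the floor."""
    return [
        [None if d is None else max(0, floor_distance - d) for d in row]
        for row in depth_map
    ]


def is_visitor(object_heights, min_height):
    """An object counts as a visitor if its highest point reaches min_height."""
    known = [h for h in object_heights if h is not None]
    return bool(known) and max(known) >= min_height
```

With the sensor 3000 mm above the floor, a surface measured at 1300 mm is 1700 mm tall and passes a 1200 mm minimum, while a trolley measured at 2000 mm (1000 mm tall) does not.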

For the user, 3D visitor counting is extremely simple: only two settings are needed, the height and the entry line.

Its results are practically independent of the conditions in which counting takes place (unless you are counting on some very complex terrain).

It is highly accurate: 98.5% in real conditions with real users (and not in a "greenhouse" laboratory, where programmers often like to test). This accuracy comes from the fact that the module works not with a picture but with a three-dimensional map. In addition, we implemented several technologies to solve a number of key tasks in counting:

The task of separating people. When people are close to each other, their outlines at a given height can merge into one. To avoid “losing” a person, we “cut” the depth map into layers and obtain multi-layer contours of objects. A contour with no nested contours inside it corresponds to the top of a person. We count the tops.
The task of determining people's trajectories. Tracking is used for this, but a completely new one that takes into account the specifics of the depth data.
The task of processing the depth map. We obtain depth data by evaluating the infrared signals that the device emits and that are reflected back from surfaces. But different surfaces reflect the rays differently, so in some cases the map comes out with “holes”. We created an algorithm that fills in the map based on values in known areas.
The task of angle compensation. To make life as easy as possible for video system installers, we implemented an algorithm that takes into account the tilt of the visitor counting device relative to the horizontal and corrects the depth values accordingly.
The task of automatically determining the distance to the floor. This removes the need to measure the height precisely and enter it in the module settings, which also improves the usability of counting.
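The layer-cutting idea from the first task in the list above can be illustrated as follows. This is a simplified sketch, not the product's algorithm: it works on a height map (heights above the floor, for example in centimetres), uses 4-connected regions in place of contours, and the names `components`, `count_tops`, `min_height` and `layer_step` are assumptions.

```python
from collections import deque


def components(mask):
    """Return the 4-connected regions of True cells as lists of (row, col)."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            queue, cells = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                cells.append((y, x))
                for ny, nx in ((y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(cells)
    return regions


def count_tops(height_map, min_height, layer_step):
    """Count regions that contain no region from the next, higher layer."""
    tallest = max(max(row) for row in height_map)
    tops = 0
    level = min_height
    while level <= tallest:
        mask = [[h >= level for h in row] for row in height_map]
        for cells in components(mask):
            # No cell reaches the next layer, so nothing is nested inside:
            # this region is treated as the top of one person.
            if all(height_map[y][x] < level + layer_step for y, x in cells):
                tops += 1
        level += layer_step
    return tops
```

For two people standing shoulder to shoulder (heads at 170 and 180, merged shoulders at 150), a single slice at 140 sees one region, while the layered count finds two tops. A real implementation would also need to filter small bumps caused by noise, which this sketch does not attempt.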

"Keep it simple and people will reach out to you ..."

The new 3D visitor counting is much simpler than the traditional module, both in terms of the technologies implemented in it and in terms of user work. At the same time, it is much more accurate and less sensitive to shooting conditions.

When we brought the new 3D counting to users at real sites, what impressed them most was not even the high accuracy (98.5% in real conditions) but this very simplicity and the almost complete absence of settings. This once again confirms our commitment to developing the product by balancing simplicity and functionality, and refutes the stereotype that a cool product has to be feature-laden and complicated.

Source: https://habr.com/ru/post/341470/
