In our C++ practice, we developed an application that counts visitors by analyzing the video stream from a camera. Its distinctive feature is that it was built as a stand-alone module that runs on the Up Board. This lets customers buy as many devices as they need and install them on their own wherever required. We also developed a separate server application for configuring these devices remotely, collecting statistics and video streams from them, and presenting visitor data in a convenient form.
At the core of the visitor-counting application is the well-known OpenCV library, which provides a large number of computer vision algorithms and makes it quick and easy to process the input video stream.
Broadly, the task can be split into several modules:
- Extracting moving objects from the overall video stream
- Segmenting objects and finding people among them
- Separating groups into individuals
- Tracking people
- Counting visitors passing through a given gate
There are many examples online that solve this problem one way or another, and it is easy to find a simple Python script that detects and counts visitors in a couple of screens of code. For real-world use, however, these algorithms were not enough for us, and we had to look for ways to improve them.
The main problems we faced were the following:
- At the customer's request, we had to keep the number of settings to a minimum, so that the client could mount the Up Board at different heights (for example, 3-5 m) and it would start counting people without any tuning. In other words, we had to simplify the choice of configurable parameters and make the algorithm more robust to changes in them.
- The processors in the chosen devices were fairly weak, and an algorithm that ran fast on an average computer became painfully slow on the test modules.
- Crowds in shopping centers often mean that people walk too close to each other and must be separated into individuals, which is usually not addressed in such tasks. Various people detectors can be used for this, but they also cost performance.
Let us look at the basic principles behind how our application solves these problems.
Background removal
The standard OpenCV approach to extracting moving objects is to use the ready-made “Background Subtraction” filter, which solves this problem automatically. However, an experiment on video sent by the customer showed that its capabilities were not enough and that we needed a more flexible approach: higher-quality object extraction, an adjustable time after which stopped objects are absorbed into the background, tunable parameters, and so on. To solve this, we successfully applied the ViBe algorithm in our own implementation. It improved object extraction and gave us a wider range of detection parameters to vary.
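To make the idea concrete, here is a minimal grayscale sketch of a ViBe-style background model in plain C++. It is not the implementation used in the project: the class name `VibeSketch`, the row-major `std::vector<std::uint8_t>` frame layout, and the default values (20 samples per pixel, radius 20, 2 required matches, subsampling factor 16) are assumptions chosen for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <random>
#include <vector>

// Illustrative grayscale ViBe-style background model; all names are hypothetical.
class VibeSketch {
public:
    VibeSketch(int width, int height, int numSamples = 20, int radius = 20,
               int minMatches = 2, int subsampling = 16)
        : w_(width), h_(height), n_(numSamples), radius_(radius),
          minMatches_(minMatches), phi_(subsampling),
          samples_(static_cast<std::size_t>(width) * height * numSamples, 0),
          rng_(12345u) {}

    // Fill each pixel's sample set with values from randomly chosen nearby pixels.
    void init(const std::vector<std::uint8_t>& frame) {
        for (int y = 0; y < h_; ++y)
            for (int x = 0; x < w_; ++x)
                for (int k = 0; k < n_; ++k)
                    samples_[index(x, y) * n_ + k] = frame[randomNeighbour(x, y)];
    }

    // Classify a frame (255 = foreground, 0 = background) and update the model.
    std::vector<std::uint8_t> apply(const std::vector<std::uint8_t>& frame) {
        std::vector<std::uint8_t> mask(frame.size(), 0);
        for (int y = 0; y < h_; ++y) {
            for (int x = 0; x < w_; ++x) {
                const std::size_t p = index(x, y);
                int matches = 0;
                for (int k = 0; k < n_ && matches < minMatches_; ++k) {
                    const int diff = int(frame[p]) - int(samples_[p * n_ + k]);
                    if (std::abs(diff) < radius_) ++matches;
                }
                if (matches < minMatches_) {  // too few close samples: foreground
                    mask[p] = 255;
                    continue;
                }
                // Conservative update: only background pixels refresh the model,
                // each with probability 1/phi_.
                if (randInt(phi_) == 0)
                    samples_[p * n_ + randInt(n_)] = frame[p];
                if (randInt(phi_) == 0)
                    samples_[randomNeighbour(x, y) * n_ + randInt(n_)] = frame[p];
            }
        }
        return mask;
    }

private:
    std::size_t index(int x, int y) const {
        return static_cast<std::size_t>(y) * w_ + x;
    }
    int randInt(int n) { return std::uniform_int_distribution<int>(0, n - 1)(rng_); }
    // Random pixel in the clamped 3x3 window around (x, y); may be (x, y) itself.
    std::size_t randomNeighbour(int x, int y) {
        const int nx = std::min(std::max(x + randInt(3) - 1, 0), w_ - 1);
        const int ny = std::min(std::max(y + randInt(3) - 1, 0), h_ - 1);
        return index(nx, ny);
    }

    int w_, h_, n_, radius_, minMatches_, phi_;
    std::vector<std::uint8_t> samples_;
    std::mt19937 rng_;
};
```

In a sketch like this, the subsampling factor is the main knob for how quickly a stopped object is absorbed into the background: a larger value means slower model updates. Typical use would be to call `init` on the first frame and `apply` on every following frame.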
Object Segmentation
Once we have the mask of moving objects, we need to extract individual elements from it. This task is fairly standard: the object mask is first processed with morphological operations to remove noise and then segmented by extracting contours. The contours can be filtered by size, according to the apparent size of a person in the image, to discard foreign objects.
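The pipeline above can be sketched without any library dependencies. With OpenCV one would normally reach for `cv::morphologyEx` and `cv::findContours`; the sketch below instead uses a hand-written 3x3 opening and connected-component labeling as a stand-in for contour extraction, so that it stays self-contained. The names `Blob`, `morph3x3`, and `findBlobs`, and the area thresholds, are hypothetical.

```cpp
#include <algorithm>
#include <cstdint>
#include <queue>
#include <utility>
#include <vector>

using Mask = std::vector<std::uint8_t>;  // row-major, 0 = background, non-zero = object

struct Blob { int minX, minY, maxX, maxY, area; };

// 3x3 erosion (erode = true) or dilation (erode = false).
// Pixels outside the image are treated as background.
Mask morph3x3(const Mask& in, int w, int h, bool erode) {
    Mask out(in.size(), 0);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            bool all = true, any = false;
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    const int nx = x + dx, ny = y + dy;
                    const bool on = nx >= 0 && ny >= 0 && nx < w && ny < h &&
                                    in[ny * w + nx] != 0;
                    all = all && on;
                    any = any || on;
                }
            out[y * w + x] = (erode ? all : any) ? 255 : 0;
        }
    return out;
}

// Opening (erode, then dilate) removes small noise; the remaining
// 4-connected regions are kept only if their area is within [minArea, maxArea].
std::vector<Blob> findBlobs(const Mask& mask, int w, int h, int minArea, int maxArea) {
    const Mask opened = morph3x3(morph3x3(mask, w, h, true), w, h, false);
    std::vector<char> seen(opened.size(), 0);
    std::vector<Blob> blobs;
    const int dx[4] = {1, -1, 0, 0};
    const int dy[4] = {0, 0, 1, -1};
    for (int sy = 0; sy < h; ++sy)
        for (int sx = 0; sx < w; ++sx) {
            if (!opened[sy * w + sx] || seen[sy * w + sx]) continue;
            Blob b{sx, sy, sx, sy, 0};
            std::queue<std::pair<int, int> > q;
            q.push(std::make_pair(sx, sy));
            seen[sy * w + sx] = 1;
            while (!q.empty()) {
                const int x = q.front().first, y = q.front().second;
                q.pop();
                ++b.area;
                b.minX = std::min(b.minX, x); b.maxX = std::max(b.maxX, x);
                b.minY = std::min(b.minY, y); b.maxY = std::max(b.maxY, y);
                for (int i = 0; i < 4; ++i) {
                    const int nx = x + dx[i], ny = y + dy[i];
                    if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
                    if (!opened[ny * w + nx] || seen[ny * w + nx]) continue;
                    seen[ny * w + nx] = 1;
                    q.push(std::make_pair(nx, ny));
                }
            }
            if (b.area >= minArea && b.area <= maxArea) blobs.push_back(b);
        }
    return blobs;
}
```

The area limits play the role of the size filter: in a real setup they would be derived from the apparent size of a person at the given camera height.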
Separation of people
Among the resulting elements, we must identify specific people. The first problem is that a person's mask is often broken into many small pieces (because clothing resembles the background), so parts of the mask must be merged into one object. The second is the opposite case, when several people walking side by side look like a single object.
The first problem was solved as part of the people tracking described below. The second required many experiments and the development of a fast, efficient algorithm. In our case, we assumed that the camera would be installed in corridors, at store exits, and so on, so the flow of people in the frame is mostly vertical. This assumption made it possible to separate people by analyzing the shape of objects. Within the original contour of an object, we measured its thickness along the direction of movement and analyzed the resulting profile for distinct peaks. If there were several peaks spaced at a distance comparable to the size of a person, they very likely corresponded to individual people.
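A rough sketch of this peak analysis, under the assumption that people move vertically in the frame, so the thickness along the direction of movement is simply the number of mask pixels in each column. The function names and the two thresholds (`minHeight`, `minDistance`) are hypothetical; the project's actual criteria are not described in detail.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Thickness of the object along the (vertical) direction of movement:
// the number of mask pixels in every column of a row-major w x h mask.
std::vector<int> thicknessProfile(const std::vector<std::uint8_t>& mask, int w, int h) {
    std::vector<int> profile(w, 0);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            if (mask[static_cast<std::size_t>(y) * w + x]) ++profile[x];
    return profile;
}

// Returns the column indices of local maxima that are at least minHeight tall.
// Two maxima closer than minDistance (roughly one person's width) are treated
// as the same person, and only the taller one is kept.
std::vector<int> findPeaks(const std::vector<int>& p, int minHeight, int minDistance) {
    std::vector<int> peaks;
    const int n = static_cast<int>(p.size());
    for (int x = 0; x < n; ++x) {
        const int left = x > 0 ? p[x - 1] : 0;
        const int right = x + 1 < n ? p[x + 1] : 0;
        // A local maximum rises from the left and does not rise to the right.
        if (p[x] < minHeight || p[x] <= left || p[x] < right) continue;
        if (!peaks.empty() && x - peaks.back() < minDistance) {
            if (p[x] > p[peaks.back()]) peaks.back() = x;
        } else {
            peaks.push_back(x);
        }
    }
    return peaks;
}
```

If `findPeaks` returns two or more columns for one blob, the blob is a candidate for splitting between the peaks; a single peak suggests one person.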
Tracking people
After a person is detected, a bounding rectangle is drawn around them; the task is then to compare adjacent frames and determine where the person is moving. With a perfect mask the task is trivial: we only need to find the intersecting rectangles to determine the direction of motion.
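For the ideal-mask case, the rectangle matching might look like the following sketch. The `Rect` type and the function names are assumptions made for this example; each rectangle in the current frame is matched to the previous-frame rectangle it overlaps most, and the shift of the centers gives the direction of motion.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Rect { int x, y, w, h; };  // top-left corner plus size, in pixels

// Area of the overlap between two rectangles, 0 if they do not intersect.
int intersectionArea(const Rect& a, const Rect& b) {
    const int x1 = std::max(a.x, b.x);
    const int y1 = std::max(a.y, b.y);
    const int x2 = std::min(a.x + a.w, b.x + b.w);
    const int y2 = std::min(a.y + a.h, b.y + b.h);
    return (x2 > x1 && y2 > y1) ? (x2 - x1) * (y2 - y1) : 0;
}

// For every rectangle in the current frame, the index of the previous-frame
// rectangle with the largest overlap, or -1 if nothing overlaps (a new object).
std::vector<int> matchByOverlap(const std::vector<Rect>& prev,
                                const std::vector<Rect>& cur) {
    std::vector<int> match(cur.size(), -1);
    for (std::size_t i = 0; i < cur.size(); ++i) {
        int best = 0;
        for (std::size_t j = 0; j < prev.size(); ++j) {
            const int area = intersectionArea(prev[j], cur[i]);
            if (area > best) {
                best = area;
                match[i] = static_cast<int>(j);
            }
        }
    }
    return match;
}

// Vertical displacement of the rectangle centers between two frames.
// Positive values mean movement down the image.
int verticalShift(const Rect& from, const Rect& to) {
    return (to.y + to.h / 2) - (from.y + from.h / 2);
}
```

This greedy matching lets two current rectangles pick the same previous one, which is acceptable for a sketch but would need resolving in a real tracker.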
In practice, the task was complicated by the fact that part of a person often merges with the background and the mask comes out torn. Individual body parts then become too small, which must be taken into account when analyzing object sizes. With such a mask, even a human would find it hard to pick out people in the image.
To solve this problem, we combined data from previous frames, which gave a more complete mask of a person without the problem areas.
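How exactly the frames were combined is not specified here. One simple possibility, shown purely as an assumption, is to keep the last few masks and take their pixel-wise union, so that a hole in the current mask is filled by what was detected a frame or two earlier. The class name `MaskHistory` and the history depth are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Keeps the most recent `depth` masks and returns their pixel-wise union.
class MaskHistory {
public:
    explicit MaskHistory(std::size_t depth) : depth_(depth) {}

    // Stores the newest mask, drops the oldest one if the history is full,
    // and returns a mask that is 255 wherever any stored mask is non-zero.
    std::vector<std::uint8_t> push(const std::vector<std::uint8_t>& mask) {
        history_.push_back(mask);
        if (history_.size() > depth_) history_.pop_front();
        std::vector<std::uint8_t> merged(mask.size(), 0);
        for (std::size_t f = 0; f < history_.size(); ++f) {
            const std::vector<std::uint8_t>& m = history_[f];
            for (std::size_t i = 0; i < merged.size() && i < m.size(); ++i)
                if (m[i]) merged[i] = 255;
        }
        return merged;
    }

private:
    std::size_t depth_;
    std::deque<std::vector<std::uint8_t> > history_;
};
```

The trade-off is that a union over many frames smears a moving person along their path, so the depth would have to stay small relative to walking speed.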
Counting people
The last task was to count the people who passed. For this, two virtual lines were defined, forming a “gate”. The order in which a person crossed these lines determined whether they had entered or left. This step also suffered from mask errors and from chaotic human movement. The mask could disappear for a couple of frames (for example, when the person merged with the background) and then reappear beyond the line. A person could cross one line and immediately turn back. And when leaving the checkout, people often walk one behind another, merge into one object, and then split in two, which leads to the magical appearance of a new person.
We addressed these issues together by tracking a person while they wander between the lines and by creating additional, slightly offset gates, which reduced the likelihood of error. Overall, this raised counting accuracy in difficult cases to 85-90%.
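The two-line gate can be sketched as a small state machine per tracked person. This is an illustration under assumptions, not the project's code: the lines are taken to be horizontal at `lineA < lineB`, crossing from top to bottom is arbitrarily called "entered", and the names `Gate`, `Zone`, and `TrackGateState` are hypothetical. A person is counted only when they show up on the opposite outer side of the gate from where they were last seen, so wandering between the lines or stepping over one line and back changes nothing.

```cpp
enum class Zone { Above, Between, Below };

// A gate formed by two horizontal lines, lineA above lineB (pixel rows).
class Gate {
public:
    Gate(int lineA, int lineB) : lineA_(lineA), lineB_(lineB) {}

    Zone zoneOf(int y) const {
        if (y < lineA_) return Zone::Above;
        if (y > lineB_) return Zone::Below;
        return Zone::Between;
    }

    int entered = 0;  // passed the whole gate from top to bottom (assumed "in")
    int exited = 0;   // passed the whole gate from bottom to top (assumed "out")

private:
    int lineA_, lineB_;
};

// Per-person state: remembers on which outer side the person was last seen.
class TrackGateState {
public:
    // Call once per frame in which the person is visible, with the
    // vertical coordinate of the center of their bounding rectangle.
    void update(int centerY, Gate& gate) {
        const Zone z = gate.zoneOf(centerY);
        if (z == Zone::Between) return;  // still inside the gate: no decision yet
        if (hasOutside_ && z != lastOutside_) {
            if (z == Zone::Below) ++gate.entered;
            else ++gate.exited;
        }
        lastOutside_ = z;
        hasOutside_ = true;
    }

private:
    bool hasOutside_ = false;
    Zone lastOutside_ = Zone::Above;
};
```

Because frames where the person is missing simply produce no `update` call, a mask that vanishes for a couple of frames and reappears beyond the second line is still counted once. Several such gates with slightly different line positions could be run side by side, although how their results were combined in the project is not described.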
Beyond the solutions above, we experimented with other approaches. Using stereo cameras to separate people precisely looked very promising. We tested a Kinect as the camera, which significantly improved the accuracy of separating a group of people. However, the customer needed a cheap monolithic device with a single camera, so we had to abandon this approach.

Another interesting aspect of the project was the need for low-level optimization to reach an acceptable 30 frames per second. For example, optimizing something as seemingly minor as the random number generator doubled the speed of object extraction and raised overall performance by 20-25%. Also, when building the mask thickness profile, the points were initially counted with a straightforward loop, which is rather slow. Later we extended the algorithm with the good old Bresenham algorithm to find the contour points directly, which significantly sped up this module.
The project was not without incidents. For a long time we optimized the computation to reach acceptable speed on test videos, but when we produced a demo video from the customer's latest samples, it played noticeably slowly, which drew complaints. It turned out that these videos had been shot on an action camera at 60 fps. Simply converting them to the standard 30 fps immediately made people in the video move almost at a run.
As a result, our team gained experience in detecting people in a video stream at acceptable speed on portable devices such as the Up Board, and the software we developed allowed the customer to deploy and sell people-counting systems as separate, easy-to-integrate modules.
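Which generator was actually used is not stated. As an illustration of the kind of change that can matter in a per-pixel loop, here is a sketch of Marsaglia's 32-bit xorshift generator, which needs only three shifts and three XORs per number. The class name `FastRng` and the power-of-two helper are assumptions for this example.

```cpp
#include <cstdint>

// Minimal xorshift32 generator (shift triple 13, 17, 5).
// Fast and small, but not suitable for anything security-related.
class FastRng {
public:
    // The state must never be zero, so a zero seed is replaced with 1.
    explicit FastRng(std::uint32_t seed = 2463534242u) : state_(seed ? seed : 1u) {}

    std::uint32_t next() {
        state_ ^= state_ << 13;
        state_ ^= state_ >> 17;
        state_ ^= state_ << 5;
        return state_;
    }

    // Value in [0, n) when n is a power of two: a mask instead of a division.
    std::uint32_t nextBelowPow2(std::uint32_t n) { return next() & (n - 1u); }

private:
    std::uint32_t state_;
};
```

In a background model that draws random sample indices for many pixels per frame, choosing power-of-two sizes (for example 16) lets `nextBelowPow2` replace a modulo operation as well.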