
In search of a UFO: detecting objects in images

Breaking CAPTCHAs is, of course, interesting and instructive, but, by and large, it is useless. It is only a special case of a problem that arises in one of the most interesting areas of IT: pattern recognition.



Today we will look at an algorithm (or rather, a technique, since it combines many algorithms) that sits at the junction of two fields: machine learning and computer vision.

With the help of this algorithm we will hunt for UFOs (encroaching on the sacred) in images.


Introduction



The technique was first described in the paper "Rapid Object Detection using a Boosted Cascade of Simple Features" by Paul Viola and Michael Jones, 2001. Since then it has gained wide recognition in its field, and that field, as is not hard to guess, is the detection of objects in images and video streams.

The technique was originally developed and applied to face detection, but nothing prevents training the algorithm to find other objects: cars, prohibited items in airport X-ray scans, tumors in medical images. In general, as you can see, this is serious business that can be of real benefit to people.



Method Description



AdaBoost

The technique is based on the adaptive boosting algorithm, AdaBoost for short. The idea is this: given a set of reference objects, i.e. feature values and the class each belongs to (for example, -1 for "no face", +1 for "face"), and a large pool of simple classifiers, we can assemble from them one far more powerful classifier. During the assembly, or training, of the final classifier, emphasis falls on the reference samples that are recognized "worst"; this is the adaptive part of the algorithm: during training it adapts to the most "difficult" objects. You can watch the algorithm at work here.
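
To make the mechanics concrete, here is a minimal sketch of discrete AdaBoost with single-feature threshold stumps as the weak classifiers. Everything here (function names, data layout) is illustrative rather than taken from any particular library, and real implementations are considerably more optimized:

    import numpy as np

    def adaboost_train(X, y, n_rounds=50):
        # X: (n_samples, n_features) array; y: labels in {-1, +1}
        n, d = X.shape
        w = np.full(n, 1.0 / n)               # sample weights, uniform at first
        ensemble = []
        for _ in range(n_rounds):
            best = None
            # pick the threshold stump with the lowest weighted error
            for j in range(d):
                for thr in np.unique(X[:, j]):
                    for sign in (1, -1):
                        pred = sign * np.where(X[:, j] > thr, 1, -1)
                        err = w[pred != y].sum()
                        if best is None or err < best[0]:
                            best = (err, j, thr, sign)
            err, j, thr, sign = best
            alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-10))  # stump's vote
            pred = sign * np.where(X[:, j] > thr, 1, -1)
            # the adaptive step: boost the weights of misclassified samples
            w *= np.exp(-alpha * y * pred)
            w /= w.sum()
            ensemble.append((alpha, j, thr, sign))
        return ensemble

    def adaboost_predict(ensemble, X):
        score = sum(a * s * np.where(X[:, j] > t, 1, -1)
                    for a, j, t, s in ensemble)
        return np.sign(score)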

Overall, AdaBoost is a very efficient and fast algorithm. In my own projects I use it to detect weak anomalies against a background of strong interference in data that has nothing to do with images and is of a completely different nature. In other words, the algorithm is universal, and I advise you to pay attention to it. It is widespread in data mining, which is so popular now, and even made it into the "Top 10 algorithms in data mining", a very informative publication that I recommend to everyone.

Haar-like features

The question is how to describe a picture. What do we use as features for classification, given that this must be done quickly and that our objects may differ in shape, color, and tilt? This method uses so-called Haar-like features (I will call them primitives below).



In the picture above you see a set of such primitives. To grasp the idea, imagine that we take a reference image and overlay one of the primitives on it, say 1a; we then compute the sum of the pixel values under the white region of the primitive (the left half) and under the black region (the right half) and subtract the second from the first. The result is a generalized anisotropy characteristic of that part of the image.
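
In practice such rectangle sums are computed in constant time through an integral image (summed-area table): each sum takes four table lookups regardless of the rectangle's size. A small sketch, assuming a grayscale image stored as a NumPy array (OpenCV builds the same table with cv2.integral):

    import numpy as np

    def integral_image(img):
        # ii[y, x] = sum of img[:y, :x]; the zero padding removes edge cases
        return np.pad(img.astype(np.int64).cumsum(0).cumsum(1),
                      ((1, 0), (1, 0)))

    def rect_sum(ii, x, y, w, h):
        # sum over the rectangle with top-left corner (x, y): 4 lookups
        return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

    def feature_1a(ii, x, y, w, h):
        # two-rectangle primitive: white (left half) minus black (right half)
        white = rect_sum(ii, x, y, w // 2, h)
        black = rect_sum(ii, x + w // 2, y, w - w // 2, h)
        return white - black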

But there is a problem. Even for a small image the number of possible primitive placements is very large: for a 24x24 image it is on the order of 180,000 (see the sanity check below). The task of the AdaBoost algorithm is to select exactly those primitives that pick out the given object most effectively.
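
That figure is easy to sanity-check by enumerating every admissible scale and position of the five basic primitive shapes inside a 24x24 window (the exact total depends on which shape set is counted, which is why the literature quotes values from about 160,000 up to 180,000):

    def count_features(W=24, H=24):
        # base aspect of each shape: two-rectangle (horizontal and vertical),
        # three-rectangle (horizontal and vertical), four-rectangle
        shapes = [(2, 1), (1, 2), (3, 1), (1, 3), (2, 2)]
        total = 0
        for dx, dy in shapes:
            for w in range(dx, W + 1, dx):              # every scale
                for h in range(dy, H + 1, dy):
                    total += (W - w + 1) * (H - h + 1)  # every position
        return total

    print(count_features())  # 162336 for a 24x24 window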

For example:



For the object on the left, the algorithm chose two primitives. For obvious reasons, the eye region is darker than the middle of the face and the bridge of the nose; primitives of this configuration and size "characterize" this image best.
From such classifiers a cascade is built out of the most effective primitives. Each subsequent stage of the cascade imposes stricter conditions for passing than the previous one (more primitives are used), so only the most "correct" candidates make it to the end.
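
The control flow of a cascade is simple; in sketch form (the stages and thresholds here are placeholders, not real trained values):

    def window_passes(window, stages):
        # each stage holds a few weak classifiers and a stage threshold;
        # early stages are cheap and reject most background windows at once
        for weak_classifiers, stage_threshold in stages:
            score = sum(alpha * h(window) for alpha, h in weak_classifiers)
            if score < stage_threshold:
                return False  # rejected early, no time wasted on later stages
        return True           # survived every stage: report a detection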

Implementation of algorithms

We are lazy people, so we will use the implementation of this technique from the OpenCV library. It already ships modules for creating samples, training a cascade, and testing it. I must say the implementation is rather raw, so be prepared for frequent crashes, hangs during training, and other unpleasant things; I had to dive into the sources several times and patch them for my needs. A very detailed and accessible tutorial on working with this implementation can be found here.
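
For reference, applying an already trained cascade looks like this in the OpenCV Python bindings (the file names here are made up; the article itself uses the older C command-line tools):

    import cv2

    cascade = cv2.CascadeClassifier('ufo_cascade.xml')  # XML from training
    img = cv2.imread('space.jpg')
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)        # cascades want grayscale

    # slide a window over the image at several scales
    found = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=3)
    for (x, y, w, h) in found:
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imwrite('result.jpg', img)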

Training is a very long process: done properly, it can take 3 to 7 days. We will therefore simplify the task as much as possible, since I have neither the time nor the computing resources to spend a week on training. Training the cascade for this article took one day of work on a Core 2 Duo.

It should be noted that the OpenCV implementation uses a more advanced modification of AdaBoost: Gentle AdaBoost.

Problem statement



So much for theory; let's move on to practice. Our task is to find this (I am a poor artist):



in images like these (note that during processing all color images are converted to grayscale, otherwise the number of invariants would be too large):



Provided that:

1. The object's color may vary: up to ±50 intensity values from the original.
2. The object's size may vary: up to 3 times the original.
3. The object may be tilted: at angles of up to 30°.
4. The object may be located anywhere in the image.

Stage 1. Creating a training set.



The first and very important stage is creating the training sample. There are two ways to go here: feed in a pre-assembled database of images (for example, of faces) for training, or generate a specified number of samples from a single reference object. The latter suits us all the more because OpenCV has a createsamples module for generating a sample from a single object. As background images (i.e., images in which the target object is absent), a couple of hundred images of space are used (see the example above).

Depending on the parameters given, the module takes the reference object, applies various deformations to it (rotation, color shift, added noise), then picks a background image and places the object on it at a random position. The result looks like this:



In real-world tasks you should aim for a training sample of around 5,000; I generated 2,000 such objects.
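
The invocation looks roughly like this; the flag values are illustrative, chosen to match the constraints listed above (±50 intensity deviation, in-plane rotation within about 30°, i.e. ~0.5 rad):

    createsamples -img ufo.png -bg backgrounds.txt -vec samples.vec \
                  -num 2000 -maxidev 50 -maxxangle 0 -maxyangle 0 \
                  -maxzangle 0.5 -bgcolor 0 -w 20 -h 20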

Stage 2. Training



Now we need to build a cascade of classifiers from the assembled database of objects. For this we use the haartraining module. It takes many parameters, the most important of which are the number of stages in the cascade, the minimum required hit rate of each stage (minimum hit rate), and the maximum allowed false alarm rate (maximum false alarm). There are many more parameters, and anyone who decides to repeat the experiment can read about them in more detail here.
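
A typical invocation, again with illustrative values (7 stages, to match the cascade tested below):

    haartraining -data ufo_cascade -vec samples.vec -bg backgrounds.txt \
                 -npos 2000 -nneg 200 -nstages 7 \
                 -minhitrate 0.995 -maxfalsealarm 0.5 -w 20 -h 20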

Stage 3. Testing the cascade



After a long wait, the program produces the trained cascade as an XML file that can be used directly for object detection. To test it, we generate another 1,000 objects by the procedure described in stage 1, thereby creating a test sample.
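
The test invocation is of the same sort (names illustrative; tests.dat is a list of the annotated test images):

    performance -data ufo_cascade -info tests.dat -w 20 -h 20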

The cascade is tested with the performance module. Feeding it the test sample and the cascade, after a few seconds we see the following picture in the console:

 +--------------------------------+------+--------+-------+
 | File Name                      | Hits | Missed | False |
 +================================+======+========+=======+
 | 0001_0032_0126_0138_0066.jpg   |    1 |      0 |     0 |
 +--------------------------------+------+--------+-------+
 | 0002_0088_0079_0188_0091.jpg   |    1 |      0 |     1 |
 +--------------------------------+------+--------+-------+
 | 0003_0059_0170_0127_0061.jpg   |    0 |      1 |     0 |
 +--------------------------------+------+--------+-------+
 | 0004_0035_0143_0134_0065.jpg   |    1 |      0 |     0 |
 +--------------------------------+------+--------+-------+
 .......
 +--------------------------------+------+--------+-------+
 | Total                          |  457 |    543 |   570 |
 +================================+======+========+=======+
 Number of stages: 7
 Number of weak classifiers: 34
 Total time: 14.114000


First of all, look at the time (the "Total time" value) it took to process 1,000 images. Considering that they also had to be read from disk, the time spent on a single image is a fraction of a second: 14 s / 1000 = 14 ms. Very fast.

Now to the classification results themselves. "Hits" is the number of objects found; "Missed" is the number missed; "False" is the number of false positives (cases where the cascade fired in a region containing no object). Overall, this is a poor result. :) More precisely, it is satisfactory as an example for this article, but for real-world use you need to be much more careful when building the training sample and choosing the training parameters; it is then possible to reach a detection rate of 95% with a false alarm rate of 0.001.

Some results of the algorithm:



And here are a couple of examples with false positives:



Conclusion



The described method has quite a wide range of applications and can be successfully combined with other algorithms. For example, it can be used to locate an object in an image, while a classical neural network or some other method performs the actual recognition.

Thank you for your attention, I hope it was interesting.

What to read in addition to these sources:
Expanded Sequence of Disturbances .
Implementing Bubblegrams for Human-Robot Interaction .

This article shows that pattern recognition does not live by neural networks alone. Continuing that thought, in the next article I would like to talk about object recognition using a statistical approach, namely multidimensional statistical characteristics of the image and Principal Component Analysis (PCA).

Source: https://habr.com/ru/post/67937/

