Recently, in connection with the development of a new product line, our company faced the task of finding identical images in a database.
Outsourcing the task is expensive and does not guarantee the best solution. Handing it to a freelancer is cheaper, but the result would most likely be just as cheap and built on existing libraries such as OpenCV. Yet if the problem were solved that easily, competitors would long ago have taken advantage of it and made a decent product, and no such product is on the market. In general, our inherent perfectionism, ambition and desire to be the best do not allow us to bring to market a product "like everyone else's": we need something better, faster, stronger. We decided to work the question out ourselves, develop a solution, write a detailed technical specification and hand that to a freelancer. There was hope that ready-made solutions existed which competitors had simply overlooked. But after studying the question (and with it the ORB, BRIEF, FAST, SIFT, SURF, BRISK, A-KAZE and Viola-Jones algorithms, among others), it became clear that all of them have their drawbacks. Some of them would have worked for our problem, but I suddenly wanted a unique and simple solution. So here I present to the community an algorithm of my own devising.
Those inclined to (constructive) criticism are welcome under the cut.
1. The principle of the algorithm
From the outset, and this is important, a method of comparing images was chosen that is based on finding and comparing so-called key points.
Like any (or almost any) algorithm of this kind, mine consists of the following steps.
Step 1. Search for key points in the image.
Step 2. Forming a descriptor.
Step 3. Formation of a binary string (hash).
Step 4. Comparison of the hashes of the image being processed with the image hashes stored in the database.
Step 5. Getting the result in the required form.
2. Description of the algorithm
2.1. Search for key points in the image.
A key point (feature point, or feature) is an image point that satisfies a number of properties:
- Distinctness - the feature should stand out from the background among neighboring points.
- Repeatability - changes in brightness, contrast and color should not affect where the point is found on an object or scene.
- Stability - image noise below a certain threshold should not affect the operation of the detector.
- Interpretability - key points should be represented in a format suitable for further processing.
- Quantity - the number of detected key points must be sufficient for detecting objects.
To find key points I use the FAST detector, as the best in terms of the speed/quality ratio, although other detectors can be used for other tasks.
Next, using the Harris corner detector, we select a small number (in my case, 50) of the most significant key points.
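The article stays tool-agnostic, but a minimal sketch of this step might look as follows, assuming Python with OpenCV (the FAST threshold, the limit of 50 points and the file name are illustrative assumptions of mine):

```python
# Sketch: detect FAST keypoints, keep the 50 with the strongest Harris response.
import cv2
import numpy as np

def detect_keypoints(gray, max_points=50):
    # FAST quickly proposes many candidate corners.
    fast = cv2.FastFeatureDetector_create(threshold=20)
    candidates = fast.detect(gray, None)

    # Harris response over the whole image, sampled at each candidate location.
    harris = cv2.cornerHarris(np.float32(gray), blockSize=2, ksize=3, k=0.04)
    ranked = sorted(candidates,
                    key=lambda kp: harris[int(kp.pt[1]), int(kp.pt[0])],
                    reverse=True)
    return [kp.pt for kp in ranked[:max_points]]   # list of (x, y) tuples

img = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)   # hypothetical file name
points = detect_keypoints(img)
```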
* The principles behind the FAST and Harris detectors are covered in lightsource's interesting article "Corner Detectors".
2.2. Descriptor generation
A descriptor (from the Latin descriptor, "the one describing") is a description of a key point that characterizes its neighborhood: a numeric or binary vector of certain parameters. The length of the vector and the type of parameters are determined by the algorithm used. The descriptor allows a key point to be distinguished from all the others in the image; this is needed to form matching pairs of features belonging to the same object when comparing different images.
At this step I propose to deviate from the methods of describing key points that are commonly used in such algorithms and turn to something simpler.
Instead of describing each key point through its neighborhood, I suggest measuring the distance from each such point to all the remaining points, i.e. drawing every possible segment between the points found and measuring their lengths.
The result of each measurement, a number (the length of a segment), is what I propose to call a descriptor.
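Under the same Python assumption, this step reduces to computing all pairwise distances (the function name compute_descriptors is my own):

```python
# Sketch: a descriptor is the length of a segment between a pair of key points,
# so 50 points give 50 * 49 / 2 = 1225 descriptors.
from itertools import combinations
from math import hypot

def compute_descriptors(points):
    return [hypot(x1 - x2, y1 - y2)
            for (x1, y1), (x2, y2) in combinations(points, 2)]
```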
2.3. Hash generation
Hashing is the transformation of an input array of arbitrary length into an output bit string of fixed length, performed by a specific algorithm. The function that implements the algorithm and performs the transformation is called a "hash function" or "convolution function". The source data are called the input array, "key" or "message". The result of the transformation is called a "hash", "hash code", "hash sum" or "message digest".
As is known, X * (X - 1) / 2 segments can be drawn through X points, so 50 points give 50 * 49 / 2 = 1225 segments; our string will therefore consist of 1225 integers. Moreover, these numbers must all have the same number of digits. The required number of digits is chosen based on the image resolution: for example, for a 500 * 500 image the maximum segment length is 707 (the diagonal), so 3 digits are enough.
To achieve invariance to rotation in the plane, we write the lengths of the segments into the string not in the order in which they were computed, but from the largest to the smallest.
We obtain the string A1, A2, ..., A1225, where A1 > A2 > ... > A1225.
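A sketch of forming such a string, continuing the Python assumption (the 3-digit width matches the 500 * 500 example above; build_hash is my own name):

```python
# Sketch: round each segment length, sort from largest to smallest (rotation
# invariance) and zero-pad every value to the same number of digits.
def build_hash(descriptors, digits=3):
    lengths = sorted((round(d) for d in descriptors), reverse=True)
    return [str(v).zfill(digits) for v in lengths]

# build_hash([707.1, 3.2, 45.0]) -> ['707', '045', '003']
```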
2.4. Comparing hashes with the database
To search for such images in the database, we need to compare our hash A1, A2, ..., A1225 with the hashes stored in the database.
For example, let us compare the hash A1, A2, ..., A1225, where A1 > A2 > ... > A1225, with a stored hash B1, B2, ..., B1225, and let the descriptors of A be three-digit and the descriptors of B four-digit. This can happen in two cases: either the images are different, or it is the same image at different scales.
To achieve scale invariance, we bring each descriptor to a common scale. To do this, we determine the maximum number M with the same number of digits as the wider descriptors (in our example, the maximum four-digit number M = 9999) and find the scaling coefficients by the formulas Ka = M / A1 (for the string A1, A2, ..., A1225) and Kb = M / B1 (for the string B1, B2, ..., B1225). Next we bring the hashes to a common scale: we multiply each descriptor of hash A by Ka and each descriptor of hash B by Kb, and write the results into the strings A'1, A'2, ..., A'1225 and B'1, B'2, ..., B'1225, rounding the values to integers. As a result, we get two hashes of the same scale and the same length.
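A sketch of this scaling step, continuing the previous sketches (the helper names rescale and to_common_scale are my own):

```python
# Sketch: M is the largest number with as many digits as the wider hash,
# K = M / (largest descriptor), and every descriptor is rescaled and rounded.
def rescale(values, digits):
    m = 10 ** digits - 1          # e.g. 9999 for four-digit descriptors
    k = m / values[0]             # values are sorted descending, so [0] is the largest
    return [round(v * k) for v in values]

def to_common_scale(hash_a, hash_b):
    a = [int(v) for v in hash_a]
    b = [int(v) for v in hash_b]
    digits = max(len(str(a[0])), len(str(b[0])))   # digit count of the wider hash
    return rescale(a, digits), rescale(b, digits)
```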
Now the strings A'1, A'2, ..., A'1225 and B'1, B'2, ..., B'1225 can be compared. For this we use a modified Hamming distance formula.
The point is that the Hamming distance counts the number of errors as the number of positions where the values are not equal. But in the previous step we rounded to integers, which could lead to false mismatches. Therefore, we will not count an error when the corresponding descriptors of the two strings differ by no more than E (i.e. the absolute difference |A'i - B'i| must be greater than E for the values to be considered unequal). Let us call E the "loyalty coefficient" and, in the general case, take E = 1.
Then, by simple summation, we count the number of errors and obtain the distance P.
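A sketch of this modified Hamming distance, building on the to_common_scale helper from the previous sketch (the parameter e plays the role of the loyalty coefficient E):

```python
# Sketch: count a mismatch only when two corresponding descriptors differ by
# more than the loyalty coefficient E.
def compare_hashes(hash_a, hash_b, e=1):
    a, b = to_common_scale(hash_a, hash_b)    # defined in the previous sketch
    return sum(1 for x, y in zip(a, b) if abs(x - y) > e)
```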
2.5. Getting the results
For different tasks, you may need a different presentation of the results of the algorithm.
2.5.1. Assessing the similarity of two images
The result is the distance P between the two hashes calculated in step 2.4, or a qualitative assessment of it. For example, when P < 10 we can consider the images identical; when 10 < P < 100, the images are similar; and so on.
2.5.2. Searching the database for the image closest to the query
The result is the image (or images) with the smallest distance P, or a message that no image similar to ours was found if all the distances P exceed a certain threshold.
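A sketch of such a search over stored hashes, reusing compare_hashes from the previous sketch (the dictionary "database" and the threshold of 100 are illustrative assumptions):

```python
# Sketch: scan every stored hash, keep the smallest distance P, and report
# "not found" if even the best match exceeds the threshold.
def find_closest(query_hash, db, threshold=100):
    best_id, best_p = None, None
    for image_id, stored_hash in db.items():
        p = compare_hashes(query_hash, stored_hash)
        if best_p is None or p < best_p:
            best_id, best_p = image_id, p
    if best_p is not None and best_p <= threshold:
        return best_id, best_p
    return None   # nothing in the database is close enough
```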
3. Advantages and disadvantages of the algorithm
3.1. The advantages of the above algorithm include:
- Simplicity
- High speed of operation
- Flexibility (you can use different methods for finding key points, depending on the task, image characteristics and the required accuracy).
- Applicability to images of different quality, thanks to the loyalty coefficient E.
- Invariance to rotation and scaling of the image.
3.2. The disadvantages I would mention:
- Lack of invariance to perspective distortion, although this can be partially compensated for by dividing the original image into small fragments and increasing the coefficient E.
- It is not possible to search for a fragment of an image within a larger image.
* I will gladly take note of comments pointing out shortcomings of the algorithm so that it can be refined.
3.3. Uncertainty
Not being a programmer, I have no way to test the algorithm in operation myself. I will be glad if someone from the Habr community can implement this algorithm in code and test it in practice.
After such a check, the most important parameter, accuracy, can be assigned to either the advantages or the disadvantages of my algorithm.
References:
1. "Corner Detectors" - lightsource
2. "Comparative analysis of descriptors of feature points of images using algorithms under the Android operating system" - Patin M.V.
3. "Hashing", wikipedia.org
PS The second (improved) version of the algorithm:
habrahabr.ru/post/320994