Search for an object in an image using a perceptual hash
Back in the days when I still believed that by programming for yourself at lunchtime or after work you could build your own startup, I had a project. That project needed an algorithm for finding an object in an image that could be trained on a new object quickly and did not consume a lot of computing resources. After reading articles about the perceptual hash (article one and article two), I decided: why not use it to limit the number of image regions that have to be examined? So instead of using Haar features, I started building my own bicycle: a perceptual hash used as a filter over image regions, so that only the regions where the desired object is most likely located pass through the filter. At the end of the article there is a link to C++ code that uses the OpenCV library.
What is the result?
This video shows the object search algorithm in action. The blue bounding box is used to mark an object in the video while playback is paused; this is how the object's appearance is memorized, and training is instantaneous. The size of the blue bounding box also sets the maximum and minimum size of the object search window. The red bounding box marks the area where the target object is most likely located. The algorithm works as follows (a minimal sketch of the search loop follows the list):
Prepare the image for processing (create an integral image)
Calculate the perceptual hash of the current image region
Check the flag of the array element whose index equals that hash
If the hash passes this check, compare the image region against the patches prepared in advance
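To make these steps concrete, here is a minimal sketch of such a search loop in C++ with OpenCV. Everything in it is my illustration rather than the code linked at the end of the article: the helper names computeWindowHash and comparePatches, the 25-bit hash, the byte-per-cell flag array, and the window and step sizes are all assumptions (a possible computeWindowHash is sketched further down; comparePatches is left out).

```cpp
#include <opencv2/opencv.hpp>
#include <algorithm>
#include <cstdint>
#include <vector>

// Assumed helpers: a 25-bit perceptual hash of a window, and the expensive
// final comparison against the stored patches of the learned object.
uint32_t computeWindowHash(const cv::Mat& integralImg, const cv::Rect& window);
bool comparePatches(const cv::Mat& gray, const cv::Rect& window);

// Flag array indexed directly by the hash: one byte per possible hash value,
// 2^25 cells = 32 MB, pre-filled when the object is "learned".
std::vector<uint8_t> objectHashFlags(1u << 25, 0);

std::vector<cv::Rect> findCandidates(const cv::Mat& frame,
                                     cv::Size minWin, cv::Size maxWin)
{
    cv::Mat gray, integralImg;
    cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
    cv::integral(gray, integralImg, CV_32S);               // step 1: integral image

    std::vector<cv::Rect> candidates;
    for (int h = minWin.height; h <= maxWin.height; h = h * 5 / 4 + 1)
        for (int w = minWin.width; w <= maxWin.width; w = w * 5 / 4 + 1)
            for (int y = 0; y + h <= gray.rows; y += std::max(1, h / 8))
                for (int x = 0; x + w <= gray.cols; x += std::max(1, w / 8)) {
                    cv::Rect window(x, y, w, h);
                    uint32_t hash = computeWindowHash(integralImg, window); // step 2
                    if (!objectHashFlags[hash])             // step 3: O(1) flag check
                        continue;
                    if (comparePatches(gray, window))       // step 4: expensive check
                        candidates.push_back(window);
                }
    return candidates;
}
```

The point of step 3 is that the flag lookup is a single array access, so the expensive patch comparison in step 4 runs only for the small fraction of windows that pass the hash filter.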
Problems arise at the last step: the perceptual hash delimits the image region only approximately, so simply cropping the found region and comparing it pixel by pixel with the image of the object gives a large difference at the slightest shift. It would be more correct to find key points in the two images and compare those. In addition, the straightforward brute-force comparison of patches slows the algorithm down, although the result can still be called "real-time".
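As a hedged illustration of the "compare key points instead of raw patches" idea, here is what such a comparison could look like with OpenCV's ORB detector and a brute-force Hamming matcher. The choice of ORB, the distance threshold, and the scoring rule are my assumptions, not necessarily what is used in the linked code.

```cpp
#include <opencv2/opencv.hpp>
#include <algorithm>
#include <vector>

// Rough similarity score between a candidate region and the learned object
// image, based on ORB key points. Thresholds here are illustrative guesses.
double keypointSimilarity(const cv::Mat& candidate, const cv::Mat& object)
{
    cv::Ptr<cv::ORB> orb = cv::ORB::create();
    std::vector<cv::KeyPoint> kp1, kp2;
    cv::Mat d1, d2;
    orb->detectAndCompute(candidate, cv::noArray(), kp1, d1);
    orb->detectAndCompute(object,    cv::noArray(), kp2, d2);
    if (d1.empty() || d2.empty())
        return 0.0;

    cv::BFMatcher matcher(cv::NORM_HAMMING, /*crossCheck=*/true);
    std::vector<cv::DMatch> matches;
    matcher.match(d1, d2, matches);

    // Count "good" matches with a small descriptor distance.
    int good = 0;
    for (const cv::DMatch& m : matches)
        if (m.distance < 40) ++good;
    return static_cast<double>(good) / std::max<size_t>(1, kp2.size());
}
```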
The panda video shows how the perceptual-hash "filter" performs. The red bounding boxes mark areas of the image whose hash coincides with the hash of the desired object (or lies within a specified Hamming distance of it). Unfortunately, this filtering method does not cope well with small objects in the video, but the algorithm performs well on large objects, for example when searching for faces in the image from a laptop webcam.
Features of using a perceptual hash in this algorithm
If we had computer memory large enough for every possible image to be represented as an array index, then comparing two images would reduce to a single if condition, and the array itself could store a flag marking that image as belonging to the desired object.
This approach is quite realistic for a perceptual hash.
Since the hash of an image serves as the array index, there is no need to spend time searching for the hash in the array; it is enough to check whether this array element (the hash) is flagged as belonging to the desired class of objects. Computing the hash itself is not time-consuming, provided the original image has been pre-processed (an integral image has been built from it).
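For example, a 5 × 5 average hash can be computed from the integral image in constant time per cell. The grid size, the CV_32S sum type and the "compare each cell to the window mean" rule below are my assumptions about one reasonable variant of a perceptual hash, not the exact hash from the linked code.

```cpp
#include <opencv2/opencv.hpp>
#include <cstdint>

// Sum of pixel values inside `r`, in O(1), from a CV_32S integral image
// produced by cv::integral (its size is (rows+1) x (cols+1)).
static int rectSum(const cv::Mat& integralImg, const cv::Rect& r)
{
    return integralImg.at<int>(r.y, r.x)
         + integralImg.at<int>(r.y + r.height, r.x + r.width)
         - integralImg.at<int>(r.y, r.x + r.width)
         - integralImg.at<int>(r.y + r.height, r.x);
}

// 25-bit average hash of `window`: split it into a 5 x 5 grid and set a bit
// for every cell whose mean brightness is above the window mean.
// Assumes the window is at least 5 x 5 pixels.
uint32_t computeWindowHash(const cv::Mat& integralImg, const cv::Rect& window)
{
    const int N = 5;
    const double windowMean =
        static_cast<double>(rectSum(integralImg, window)) / window.area();

    uint32_t hash = 0;
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j) {
            cv::Rect cell(window.x + j * window.width  / N,
                          window.y + i * window.height / N,
                          window.width / N, window.height / N);
            const double cellMean =
                static_cast<double>(rectSum(integralImg, cell)) / cell.area();
            hash = (hash << 1) | (cellMean > windowMean ? 1u : 0u);
        }
    return hash;   // index into the flag array, 2^25 possible values
}
```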
Typically, a perceptual hash uses an image of 8 × 8 pixels, i.e. 64 bits of data. However, 64 bits have so many possible combinations (2^64 ≈ 1.8E19) that the array would occupy (1.8E19 × cell size) of memory. It is another matter if you take an image of 5 × 5 pixels, which corresponds to 2^25 = 33,554,432 combinations, or at least 32 megabytes of memory with one-byte cells, which is quite acceptable.
Noise instead of Hamming distance
Since in this algorithm the perceptual hash of the searched object's image is used as the array index, instead of computing the Hamming distance at search time to decide whether a hash matches the desired object, you can pre-fill the array with flags for every hash within the chosen Hamming distance, i.e. add "noise" around the object's hash. This reduces the number of calculations during the search.
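A minimal sketch of such pre-filling: mark the object's hash and every hash within a chosen Hamming radius of it by flipping bits recursively. The function name, the radius value and the 25-bit hash width are illustrative assumptions.

```cpp
#include <cstdint>
#include <vector>

// Mark `hash` and every hash within `radius` bit flips of it, so that at
// search time a single array lookup replaces a Hamming-distance computation.
void markWithNoise(std::vector<uint8_t>& flags, uint32_t hash,
                   int radius, int firstBit = 0, int hashBits = 25)
{
    flags[hash] = 1;
    if (radius == 0)
        return;
    for (int b = firstBit; b < hashBits; ++b)
        markWithNoise(flags, hash ^ (1u << b), radius - 1, b + 1, hashBits);
}

// Usage: "train" on one object hash with a Hamming radius of 2.
// std::vector<uint8_t> objectHashFlags(1u << 25, 0);
// markWithNoise(objectHashFlags, objectHash, 2);
```

For a radius of 2 and a 25-bit hash this marks only 1 + 25 + 300 = 326 array cells, so the pre-fill is cheap, while at search time each window costs one array access instead of a Hamming-distance calculation.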