"It looks like." How perceptual hash works

Over the past few months, several people have asked me how TinEye works and how searching for similar images works in principle.

Truth be told, I don't know how the TinEye search engine works; the service does not reveal the details of the algorithm(s) it uses. But judging by the search results, I conclude that it uses some form of perceptual hashing.

This is perception!

Perceptual hash algorithms describe a class of functions for generating comparable hashes. Image characteristics are used to generate an individual (but not unique) fingerprint, and these fingerprints can be compared with one another.

Perceptual hashes are a different concept from cryptographic hash functions like MD5 and SHA1. In cryptography, every hash looks random: the input data acts as a source of randomness, so the same data always gives the same result and different data gives a different result. Comparing two SHA1 hashes really allows only two conclusions. If the hashes differ, the data differs. If the hashes match, the data is most likely the same (because collisions are possible, equal hashes do not guarantee equal data). Perceptual hashes, by contrast, can be compared with each other to judge the degree of difference between two data sets.
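To make the contrast concrete, here is a quick stdlib-only illustration: two nearly identical inputs produce completely unrelated SHA1 digests, so comparing cryptographic hashes can never measure similarity.

```python
import hashlib

# Two inputs that differ by a single character.
a = b"perceptual hashing"
b = b"perceptual hashing?"

da = hashlib.sha1(a).hexdigest()
db = hashlib.sha1(b).hexdigest()

# The digests share no useful structure: comparing them can only tell
# you "equal" or "not equal", never "how similar".
print(da)
print(db)
print(da == db)  # → False
```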
All the perceptual hash algorithms I have seen share the same basic properties: the image can be resized, its aspect ratio changed, and even its color characteristics (brightness, contrast, etc.) slightly altered, yet its hash still matches. TinEye shows the same properties (though it does more, as I describe below).

Looks good

How do you get a perceptual hash? There are several common algorithms for this, and they are all fairly simple (I am always surprised that the best-known algorithms are effective despite being so simple!). One of the simplest hash functions maps the image to the average of its low frequencies.

In images, high frequencies provide detail, while low frequencies show structure. A large, detailed photograph contains many high frequencies. A very small picture has no detail, so it consists entirely of low frequencies. To demonstrate how the average hash function works, I'll take a photo of my next wife, Alyson Hannigan.



1. Reduce the size. The fastest way to get rid of high frequencies is to shrink the image. Here we reduce it to 8x8, for a total of 64 pixels. Ignore the aspect ratio and just squash it into an eight-by-eight square. The hash will then match all variants of the image, regardless of size and aspect ratio.


2. Remove color. The small image is converted to grayscale, which cuts the data to a third: from 192 values (64 red, 64 green, 64 blue) down to just 64 brightness values.

3. Find the mean. Compute the average of all 64 values.

4. Build a chain of bits. This is the fun part: each value becomes 1 or 0 depending on whether it is above or below the mean.

5. Build the hash. Pack the 64 individual bits into one 64-bit value. The order does not matter as long as it is kept constant (I write the bits left to right, top to bottom).
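The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: it assumes steps 1 and 2 (resizing to 8x8 and converting to grayscale) have already been done, e.g. with an imaging library such as Pillow, so `pixels` is simply an 8x8 grid of 0-255 brightness values.

```python
def average_hash(pixels):
    """Average hash of an 8x8 grayscale grid (brightness values 0-255).

    Steps 1-2 (resize to 8x8, convert to grayscale) are assumed done
    already, e.g. with Pillow:
        img = Image.open(path).resize((8, 8)).convert("L")
    """
    flat = [v for row in pixels for v in row]          # left to right, top to bottom
    mean = sum(flat) / len(flat)                       # step 3: the mean
    bits = ["1" if v > mean else "0" for v in flat]    # step 4: one bit per pixel
    return int("".join(bits), 2)                       # step 5: one 64-bit value

# A toy grid: top half bright, bottom half dark.
grid = [[200] * 8] * 4 + [[50] * 8] * 4
print(f"{average_hash(grid):016x}")  # → ffffffff00000000
```

The toy grid's mean is 125, so the 32 bright pixels become ones and the 32 dark ones become zeros, which is exactly what the hex value shows.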

Result: 8f373714acfcf4d0

The final hash will not change if the image is scaled, compressed, or stretched. Changing brightness or contrast, or even manipulating the colors, will not greatly affect the result either. And most importantly: this algorithm is very fast!

If you need to compare two pictures, just build a hash for each and count the number of differing bits (this is the Hamming distance). A distance of zero means they are most likely identical pictures (or variations of the same image). A distance of 5 means the pictures differ somewhat, but on the whole they are still pretty close. A distance of 10 or more suggests completely different images.
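With the hashes stored as integers, this comparison is essentially the popcount of an XOR. A minimal sketch (the second hash below is a made-up variation of the example value, for illustration):

```python
def hamming_distance(h1, h2):
    """Number of differing bits between two 64-bit hashes."""
    return bin(h1 ^ h2).count("1")

a = 0x8F373714ACFCF4D0   # the average hash from the example above
b = 0x8F373714ACFCF4D2   # hypothetical hash with one flipped bit

print(hamming_distance(a, a))  # → 0: most likely the same image
print(hamming_distance(a, b))  # → 1: a close variation
```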

Try pHash

The average hash is simple and fast, but it has failure modes. For example, it may miss a duplicate after gamma correction or a change in the color histogram. Such operations shift colors on a nonlinear scale, so the position of the "mean" moves and many bits flip.

The pHash function uses a more robust algorithm (actually, I use my own version of it, but the concept is the same). pHash takes the averaging idea to the extreme by applying a discrete cosine transform (DCT) to eliminate high frequencies.

1. Reduce the size. Like the average hash, pHash works on a small image. Here, however, the image is larger than 8x8; 32x32 is a good size. This is really done to simplify the DCT, not to eliminate high frequencies.



2. Remove color. The color channels are likewise discarded to simplify further calculation.

3. Run the discrete cosine transform. The DCT decomposes the picture into a set of frequency components. Whereas the JPEG algorithm runs the DCT on 8x8 blocks, here it runs on the whole 32x32 image.

4. Shrink the DCT. This is the magic step. The original block was 32x32; keep only the top-left 8x8 block, which contains the lowest frequencies of the image.

5. Calculate the mean. As with the average hash, compute the mean of the 8x8 block of DCT values, but exclude the very first coefficient (the DC term) so that flat, uniform areas do not skew the average.

6. Further reduce the DCT. Another magic step. Set each of the 64 DCT values to 1 or 0 depending on whether it is above or below the mean. This variant already withstands gamma correction or histogram changes without problems.

7. Build the hash. The 64 bits are packed into a 64-bit value; here, too, the order does not matter. To see what the fingerprint looks like visually, you can assign +255 or -255 to each pixel depending on whether its bit is 1 or 0, and convert the 32x32 DCT (with zeros for the high frequencies) back into a 32x32 image.
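The pHash-style steps can be sketched in pure Python. This is a deliberately slow, textbook DCT-II written out by hand so the example stays self-contained; real code would use an optimized routine such as scipy's `dct`, and steps 1-2 would again be handled by an imaging library. As before, `pixels` is assumed to be an already reduced 32x32 grayscale grid.

```python
import math

def dct_1d(vec):
    """Unoptimized, unscaled DCT-II of a 1-D sequence."""
    n = len(vec)
    return [sum(vec[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n))
                for i in range(n))
            for k in range(n)]

def phash(pixels):
    """pHash-style hash of a 32x32 grayscale grid (values 0-255)."""
    # Step 3: 2-D DCT = 1-D DCT over every row, then over every column.
    rows = [dct_1d(row) for row in pixels]
    cols = [dct_1d([rows[r][c] for r in range(32)]) for c in range(32)]
    # cols[v][u] holds the coefficient at row frequency u, column frequency v.
    # Step 4: keep only the top-left 8x8 block (the lowest frequencies).
    block = [cols[c][r] for r in range(8) for c in range(8)]
    # Step 5: mean of the block, excluding the first (DC) coefficient.
    mean = sum(block[1:]) / 63
    # Steps 6-7: one bit per coefficient, packed into a 64-bit value.
    bits = ["1" if v > mean else "0" for v in block]
    return int("".join(bits), 2)

# A deterministic toy gradient instead of a real photo.
grid = [[(r + c) * 4 % 256 for c in range(32)] for r in range(32)]
print(f"{phash(grid):016x}")
```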

Result: 8a0303769b3ec8cd

At first glance the picture seems a meaningless set of blobby outlines, but if you look closely you can see that the dark areas correspond to the same areas in the photo (the hair and the stripe in the background on the right side of the photo).

As with the average hash, pHash values can be compared using the same Hamming distance algorithm (compare each bit and count the differences).

Best in class?

Since I do digital photo forensics and work with huge image archives, I need a tool for finding identical pictures, so I built a search engine that uses several perceptual hash algorithms. In my rich, though unscientific, experience, the average hash works much faster than pHash and is great when you are looking for something specific. For example, if I have a tiny thumbnail and I know for sure that the full-size original is somewhere in the collection, the average hash will find it quickly. However, if the image has been manipulated, say, text was added or a new face pasted in, the average hash is unlikely to cope, while pHash, though slower, is much more tolerant of minor modifications (affecting less than 25% of the area).

Then again, if you run a service like TinEye, you will not recompute pHash every time. I assume they keep a database of precomputed hash values. The core image-comparison function is extremely fast (there are ways to greatly optimize the Hamming distance calculation). So hashing is a one-time task, and performing millions of comparison operations in a couple of seconds (on one machine) is quite realistic.
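As an illustration of why the comparison step is so cheap: with hashes stored as 64-bit integers, one comparison is an XOR plus a popcount, so even a naive linear scan over a large precomputed table is fast. The database contents and file names below are hypothetical.

```python
def hamming(h1, h2):
    return bin(h1 ^ h2).count("1")

def find_similar(query, hash_db, max_distance=10):
    """Linear scan over precomputed 64-bit hashes.

    XOR + popcount is cheap enough that millions of comparisons per
    second on one machine is realistic even without indexing tricks.
    """
    return [(image_id, hamming(query, h))
            for image_id, h in hash_db.items()
            if hamming(query, h) <= max_distance]

# Hypothetical precomputed hash database.
db = {
    "cat.jpg":       0x8F373714ACFCF4D0,
    "cat_small.jpg": 0x8F373714ACFCF4D2,  # near-duplicate: 1 bit differs
    "dog.jpg":       0x0123456789ABCDEF,
}
print(find_similar(0x8F373714ACFCF4D0, db, max_distance=5))
# → [('cat.jpg', 0), ('cat_small.jpg', 1)]
```

Real systems go further (e.g. multi-index tables or BK-trees) so they never scan every row, but the per-pair cost really is just a couple of machine instructions.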

Varieties

There are variations of the perceptual hash algorithm that further improve performance. For example, the image can be cropped before shrinking; the loss of information around the main part of the image then hardly matters. The picture can also be split into parts: if you have a face-detection algorithm, for instance, you can compute a hash for each face (I suspect the TinEye algorithm does something similar).

Other varieties can analyze the overall color distribution (for example, her hair is more red than blue or green, and the background is lighter than it is dark) or the relative positions of lines.

Being able to compare pictures opens up completely new possibilities. For example, the GazoPa search engine lets you draw a picture on screen. As with TinEye, I do not know all the details of their algorithm, but it looks like some kind of perceptual hash. And since the hash function reduces everything to low frequencies, my crude drawing of three stick figures can be matched against other images, including photos, that depict three people.

Source: https://habr.com/ru/post/120562/

