Today an interesting video appeared in the Yandex.Subbotnik club about how Yandex processes images to eliminate duplicates. The speaker is Alexander Krainov, who has worked on media data processing projects since 2000. At Yandex he is responsible for computer vision projects.
About the talk
Finding duplicates among thousands of pictures is easy. Among millions it is harder, and among billions it is very hard. The higher the recall demanded of the algorithm, the more problems arise. At the same time, the recall of duplicate clustering is the foundation of image search quality.
I suspect many readers do not follow this club, and it seems to me that this video gives plenty to think about.
If you are interested, read on below the cut.
Link to the presentation in PDF format.