Clustering of duplicates in Yandex.Pictures

Today an interesting video appeared in the Yandex.Subbotnik club about how Yandex processes images to eliminate duplicates. The speaker is Alexander Krainov, who has worked on projects related to media data processing since 2000. At Yandex he is responsible for computer vision projects.

About the talk

Finding duplicates among thousands of pictures is easy. Among millions it is harder, and among billions it is harder still. The higher the completeness (recall) of the algorithm, the more problems arise. At the same time, the completeness of duplicate clustering is the basis of image search quality.

I suspect many readers do not follow this club, and it seems to me that this video gives you something to think about.
If you are interested, read on below the cut.

Link to the presentation in pdf format.

Source: https://habr.com/ru/post/143667/
