Picture Factory - how does it work? Part 2

I have finally gotten around to writing the second part, as promised in the first. In this part I want to talk about the client side of the project.

What is used:

As mentioned before, the project is written entirely in Python (with Cython inserts). All information about images, users, and statistics is stored in a MySQL database.

A Sphinx server handles search (the main one) and filtering. The Twisted client for it is txsphinx.
Redis holds the likes, image view counts, and download counts. It also stores the top images (home page) and the “similar images” (the image's own page). The Twisted client is txredis, found in the wild and slightly modified by me (the modified version is not public yet).

Web: TwistedWeb with the Jinja2 template engine; the pages themselves are built with Bootstrap and jQuery. Nginx sits at the end of the chain.

The interesting part:

The first (and most interesting) task was to build an image filter. To begin with, I compiled a list of search fields:

Categories
Minimum image resolution
Keywords
Colors

I decided to build the filter on Sphinx. Indexing goes through xmlpipe2. The definition in the Sphinx config is very simple:

source images
{
    type = xmlpipe2
    xmlpipe_command = bin/sphinx.py --indexer=images
}

index images
{
    source = images
    path = /var/lib/sphinx/data/images
    morphology = stem_enru
    charset_type = utf-8
    min_word_len = 2
    min_infix_len = 3
    enable_star = 1
    docinfo = extern
    html_strip = 1
    index_exact_words = 1
    expand_keywords = 0
    wordforms = images_wordforms.txt
}

Categories: an MVA attribute holding the list of category IDs. In addition, a text field holds the list of category names (so that text search works properly and adds weight to the results).

Minimum image resolution: two attributes, width and height. This one is simple too: a range search on each attribute, from the user-defined value up to the maximum (magic number 10,000).
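As a minimal sketch of the range search described above, the helper below turns the user's minimum resolution into one (low, high) pair per attribute. The function name and the clamping are my assumptions; only the width and height attributes and the 10,000 upper bound come from the text.

```python
MAX_SIDE = 10000  # the "magic number" upper bound mentioned in the text


def resolution_ranges(min_width, min_height, max_side=MAX_SIDE):
    """Return {attribute: (low, high)} range filters for width and height.

    Hypothetical helper: the name and the clamping are illustrative only.
    """
    if min_width < 0 or min_height < 0:
        raise ValueError("minimum resolution must be non-negative")
    # Clamp the lower bound so that low never exceeds high.
    return {
        "width": (min(min_width, max_side), max_side),
        "height": (min(min_height, max_side), max_side),
    }
```

Each pair would then go to the client's range-filter call. In the stock sphinxapi Python client that is SetFilterRange(attribute, min, max); whether txsphinx uses the same name is an assumption.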

Keywords: three text fields, title, tags, and keywords. Title is the title of the image; hits here get the maximum weight. Tags is the image's tag list, with medium weight. Keywords is a set of keywords (not shown to the user) taken from the image's page; it may contain garbage, so it gets a low weight.
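To make the indexing side more concrete, here is a rough sketch of the kind of xmlpipe2 stream an indexer script could print for the fields and attributes described so far. This is not the project's actual bin/sphinx.py. The field name category_names, the dictionary layout of a document, and the comma-separated MVA values are my assumptions, and the color attributes are left out.

```python
from xml.sax.saxutils import escape

# Full-text fields and attributes, declared once per stream.
# "category_names" is a hypothetical name for the category-name text field.
SCHEMA = "\n".join([
    "<sphinx:schema>",
    '<sphinx:field name="title"/>',
    '<sphinx:field name="tags"/>',
    '<sphinx:field name="keywords"/>',
    '<sphinx:field name="category_names"/>',
    '<sphinx:attr name="width" type="int"/>',
    '<sphinx:attr name="height" type="int"/>',
    '<sphinx:attr name="categories" type="multi"/>',
    "</sphinx:schema>",
])

TEXT_FIELDS = ("title", "tags", "keywords", "category_names")


def document_xml(doc):
    """Render one image (a plain dict, assumed layout) as a sphinx:document."""
    parts = ['<sphinx:document id="%d">' % doc["id"]]
    for field in TEXT_FIELDS:
        # Text must be XML-escaped, otherwise the indexer rejects the stream.
        parts.append("<%s>%s</%s>" % (field, escape(doc[field]), field))
    parts.append("<width>%d</width>" % doc["width"])
    parts.append("<height>%d</height>" % doc["height"])
    # MVA values are written here as a comma-separated list of category IDs.
    ids = ",".join(str(c) for c in doc["categories"])
    parts.append("<categories>%s</categories>" % ids)
    parts.append("</sphinx:document>")
    return "\n".join(parts)


def docset_xml(docs):
    """Render the whole stream: XML header, schema, then one block per image."""
    lines = ['<?xml version="1.0" encoding="utf-8"?>', "<sphinx:docset>", SCHEMA]
    lines.extend(document_xml(d) for d in docs)
    lines.append("</sphinx:docset>")
    return "\n".join(lines)
```

A script wired up through xmlpipe_command would print the result of docset_xml(...) to standard output, with the documents read from MySQL.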

Colors: this was the hardest part, so I will go into more detail. A color palette {ID => RGB} was created. When an image is added to the database, we extract a list of its dominant colors and map each one to the closest color in our palette. The colors of the image are stored in the database as two values: the color ID and the percentage of the image it occupies. The index contains ten MVA attributes “c_X”, where X is a number from 0 to 9. All colors of the image go into c_0, colors with percent >= 10 also go into c_1, colors with percent >= 20 also go into c_2, and so on.
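The scheme above can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function names are made up, and "closest palette color" is taken to mean the smallest squared Euclidean distance in RGB space, which the text does not specify.

```python
def nearest_palette_id(rgb, palette):
    """Return the ID of the palette color closest to rgb.

    palette is a dict {color_id: (r, g, b)}. The distance metric
    (squared Euclidean in RGB) is an assumption.
    """
    def distance(color_id):
        return sum((a - b) ** 2 for a, b in zip(palette[color_id], rgb))
    return min(palette, key=distance)


def color_buckets(colors):
    """Distribute (color_id, percent) pairs over the MVA attributes c_0..c_9.

    Every color lands in c_0; a color covering >= 10% also lands in c_1,
    >= 20% also in c_2, and so on up to c_9 for >= 90%.
    """
    buckets = {"c_%d" % i: [] for i in range(10)}
    for color_id, percent in colors:
        top = min(int(percent) // 10, 9)  # highest bucket this color reaches
        for i in range(top + 1):
            buckets["c_%d" % i].append(color_id)
    return buckets
```

For example, a color covering 35% of the image ends up in c_0 through c_3, while a color covering 8% ends up only in c_0.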

Filter by color: when searching by color, all images that have that color in the c_1 attribute are taken, and then the weight of the color is computed. When searching for the color with ID 2 (pseudocode):

    setSelect('(IN(c_1,2)*1) + (IN(c_2,2)*1) + (IN(c_3,2)*1) + (IN(c_4,2)*1) + (IN(c_5,2)*1) + (IN(c_6,2)*1) + (IN(c_7,2)*1) + (IN(c_8,2)*1) + (IN(c_9,2)*1) AS colors_weight')
    setOrder('colors_weight DESC')
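Writing that expression by hand for every color ID is tedious, so it can be generated. The helper below is a hypothetical sketch that only builds the expression string; how the string is handed to txsphinx is not shown, since I am not documenting that API here.

```python
def color_weight_select(color_id, alias="colors_weight"):
    """Build the select expression that sums a color's hits over c_1..c_9.

    The more of the image a color covers, the more of the c_X attributes
    contain it, so the sum grows with the covered share.
    """
    color_id = int(color_id)  # only ever interpolate an integer
    terms = ["(IN(c_%d,%d)*1)" % (i, color_id) for i in range(1, 10)]
    return "%s AS %s" % (" + ".join(terms), alias)
```

Calling color_weight_select(2) reproduces the expression from the pseudocode. With the stock sphinxapi Python client, the string would go to SetSelect, the c_1 membership check to SetFilter, and the ordering to extended sort mode with 'colors_weight DESC'. That the Twisted client mirrors those calls is an assumption.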

The color search may not be implemented in the most optimal way, but it is the best of what I have come up with.

Summary:

I am happy with the filter's speed: a query currently takes about 50-80 milliseconds over 70,000 images. If anything else about the project interests you, ask and I will gladly tell you more. Once again, the project itself: http://picsfab.com

Source: https://habr.com/ru/post/188034/
