📜 ⬆️ ⬇️

Fast filter catalog for online stores based on Redis bitmaps



It is no secret that every online store should help users find what they need. Especially if you have a lot of goods (> 10). Cataloging of goods comes to the rescue, but it is half the battle to break up products into categories. Products within the category need to be able to filter by their properties. Especially if you have mixed goods, for example, clothes, electronics, jewelry, etc. And here any developer who writes his e-commerce product faces unpleasant realities of life: products may have completely different properties, some products may be missing, some products may fall under different values ​​for one property (the color of the dress is either blue, either blue, respectively, it would be nice to show it in both blue and blue). Simply put, you have an EAV . It also happens that EAV is diagnosed to you by the customer closer to the end of development, and even asks you to add a filter on dynamic properties after the release.

You start to scratch behind the ear, knowing that nothing fits into the relational model, you have already chosen MySQL as a DBMS, if you are a good web developer, or maybe PostgreSQL, if you read about it, and adherents of different corporations can generally choose their products, no one forbids. Most often, however, these are all RDBMS, and dynamic properties are not very easily screwed in (read: difficult), not everyone can do it.
Here, for example, a small piece of the database diagram in the popular e-commerce platform Magento:

(on click in full size)

So we were tasked to make such a catalog filter for a jewelry store . And the properties of the goods we then generally lay in json-e in MySQL, because needed only on the page of the product itself and nowhere else. Our position was slightly improved by the fact that it was a jewelry online store, and the properties in it could be set initially, such as ring size, metal type, metal color, insert. Nevertheless, the solution obtained is universal; you can easily modify the code under fully dynamic sets of properties for goods.
It was decided that changing half of the database and more than half of the code for adding a filter is not good, so we called in-memory key-value storage called Redis , in particular, its cool ability to work bit by bit with strings, operations: SETBIT, GETBIT, BITOP, BITCOUNT. It is easy to guess the meaning of the commands without looking into the documentation.
')
The filter storage scheme in Redis was as follows:
  1. One key is one value of the product property, for example, size-18: or color-red:
  2. Data in each key was bitmap long N bits, where N is the total amount of goods in the store. Accordingly, the position of the bit in the bitmap is the product ID, and the bit itself indicates whether the given ID belongs to a filter with such a value.

An example for better understanding:
Product ID (bit position)ID: 1ID: 2ID: 3ID: 4ID: 5
redis-keyredis-value
size-17:00one0one
size-18:oneone000
size-19:000one0
color-red:oneoneone0one
color green:00oneoneone

Thus, in radish we have 5 key value pairs, because we have two filters - by color (2 options) and by size (3 options). There are only 5 products in the store, so bitmap consists of 5 bits. The table shows that the product with ID 2 is red, size 18, and the product with ID 3 is size 17, but it has both red and green color.
To apply the filter to the product catalog, it is sufficient to perform a bitwise AND operation for the filter values ​​selected by the user. For example, a person wants a green goods size 18, pokes two jackdaws in the filter, and we do:
BITOP AND result-key size-18 color-green

then in the result-key we will have a bitmap, representing the bitwise multiplication of these two bitmaps. It remains for us only to calculate the places in which we have units, positions of units and will be ID of products with the specified filters:
 $bytes = str_split($result_key); $ids = []; // resulting product IDs for ($i = 0; $i < count($bytes); $i++) // iterate in bytes (8 products at once) for ($j = 7; $j >= 0; $j--) // iterate over bits in current 8-products chunk if ($bytes[$i] & (1 << $j)) // >0 if bit was 1, otherwise 0 $ids[] = 8*$i + (8-$j); // calculate product ID and append it 


The generation of bitmaps occurs at the moment of adding / changing the goods in the admin panel, depending on the available properties of the goods, we make BITSET into the necessary filter, that's all.

Advantages of such a decision:
1) Eats little memory. We have> 50,000 products, about 100 filter values, that is, 50,000 * 100 = 5,000,000 bits = just 625 kilobytes of memory.
2) Very fast. The complexity of the bitwise O (N) operation, however, the strings are not measured in millions of bytes, but multiplying a couple of bitmaps with about 50,000 bits is a task of a couple of microseconds for the processor. Overall, in the worst case (multiplication of all filters), measuring the time difference in PHP before sending the command to REDIS and after receiving the result - 40ms (this is with the additional function from item 3, below). Quite realtime page generation, for the web will go. If it seems a lot, please cache the result, but we are completely satisfied with this.
3) Ability to count the number of products in each filter and category . This has become a useful side effect. We can now calculate the number of products for each filter value that is currently available. Yes, this requires a bitwise multiplication of the current result-key (the current selection of products) by each filter value, and then the execution of BITCOUNT. We realized this, now we can dynamically hide filters with an empty set of goods (a person, having chosen platinum diamond rings, does not see the filter at the price “up to 3,000 rubles”).

cons of such a decision:
1) Inability to code range filters, for example, where the price can be manually filtered by the user up to a ruble. Well, such, with sliders OT and TO, you know. Who still do not work on mobile phones. In our store, the filter at prices was just five options (up to 3000, 3000-10000, etc.), respectively, encoded them as 5 price-0-3000 bitmaps :, price-3000-10000: etc. .
2) The need to transfer the list of selected IDs in MySQL to fetch their data. This, of course, is not good that we throw a list of IDs for sampling from radish.
 SELECT * FROM products WHERE id IN (....) 

But as it turned out, it works extremely fast. In the worst case, the entire catalog page with all the selected filters for all categories was generated in 600ms, if not mistaken. Proof for several filters:


As a result, this business turned out to be very fast, there are Redis bindings for PHP , Redis itself is very primitive and easy to learn in one day.

Source: https://habr.com/ru/post/261137/


All Articles