How to find a girl in 250 microseconds

In contrast to Europe and America, cautious attitude prevails in dating sites. However, the hope of clicking on the magic button and finding love does not go out in the hearts of many. And we have to justify this hope. Of course, we do not promise to immediately find the ideally suitable “half”, but it is simply obliged to offer dozens, hundreds or, in some cases, thousands of options that meet your exact requirements. What we are doing, and very quickly.

The average database search of 11 million questionnaires with 4 to 30 parameters each takes us an average of 3.5 milliseconds. And at the same time, besides the search for the demon servcher of the “Mamba”, it performs the following, including not quite traditional tasks:

for each particular questionnaire gives her place in the search (each user, entering her profile, sees the message "You are on the N place in the search")
issues a specific profile from the list on the primary key
performs a direct search for the questionnaire according to the specified parameters

In spite of the fact that our search was developed from the very beginning on its own, from time to time there were thoughts to use something already known, run-in and guaranteed to be effective. Well, and if we think about searching, Sphinx comes to mind first. At some point we decided to check whether he could give us significant advantages, and were somewhat surprised at the results obtained. On a test base of 2 million Sphinx questionnaires showed an average result of 100-120 ms in, depending on the request, while we (only because it was a test) included only a part of the fields in the index. Our search on the same base and on the same equipment spends from 0.2 to 1.1 ms per request.

Sphinx: source - mysql, int sql_attr_uint. .

As further practice showed, half the success of the search lies in creating the right index. In our understanding, the correct index is the one that best fits the search tasks and provides the highest processing speed. After a general analysis of the base, it was decided to divide the index into several parts:

questionnaires located in Moscow and the region with photographs (this is the most frequently requested type of questionnaire)
VIP users index (VIP users have certain advantages, including in the search results, therefore they are processed separately)
full index of all types of questionnaires for all fields of each questionnaire

But the main tricks are hidden in the structure of each part of the general index, which are constructed as follows:

questionnaires are divided into blocks of 256 profiles in the block
each block is put in the index: first the first bit of the first field of each form, then the second bit of each form, etc. Thus, the data of one bit from the field of all the forms go in a row

What does this mean for search? For example, we have an index for the fields “gender” and “age”, where gender can take the values “M” and “F”, and age - from 18 to 21. Imagine that a user wants to find a girl of 20 years old (i.e. conditions satisfy only one value of each field).
')
Suppose we have such questionnaires (1 block is shown, since the minimum allowable age is 18, then 18 years old is coded as 0000, 19 as 0010, 20 as 0100, and so on)

After processing, the following index is obtained (the first bit is selected in bold, before and after):

For some fields, the value is kept non-compact. For example, the present age field takes 64 values. You can store 1 bit for a possible value, that is, the field will take 64 bits, but the search will take only 1 operation, and a search by range - 2 operations. A more traditional option is to store it as a number, then it will take log2 (64) = 6 bits, but a search for a specific age will take 6 operations, and a search over a range of more than 12 (the exact value depends on the length of the comparison condition recorded as DNF).

The search process itself is as follows:

daemon accepts and parses request
memory is allocated for the results (by bit for each profile in the index)
the memory segment is allocated for the compiled request, the execution flag is set
this segment is written:
- initialization code, loop header
- For each field for which you want to filter the result, a code is written that checks the value of this field.
- end of the cycle, checking the boundaries, counting the number of found
this segment is launched through the usual long jump

The most interesting is the field verification code. In the results segment, each questionnaire corresponds to 1 bit, which indicates whether the questionnaire is suitable for the query or not.

Processing the request is as follows:

After that, it remains only to go through the bitmap with the result and see which forms have 1.

In our example, the result will be 00010, that is, only the questionnaire number 4 satisfies the query.

The “Mamba” search daemon is invoked from a significant portion of all pages on the site. I think it is worth noting that, along with several other “main” demons, it uses the JSON-RPC protocol and generally creates a “single demonic space”.

The real search statistics is as follows:

Requests to search by profile id

Fig. 1 Requests for search by profile id (schedule from one server)

Fig. 2 Requests for search by parameters (schedule from one server)

In total, the search runs on two machines, the peak performance of one is 20k of searching for profiles using parameters with a full calculation of the number of profiles found per second (this is the longest operation, the others work much faster). The workload is about 800 rps of search + 1000 rps to get a place + 1500 rps to get personal data.

Finding the place of the questionnaire in the search results

Finding the place of the questionnaire in the search results

Fig. 3 Displaying the place of the questionnaire in the search results (schedule from one server)

Distribution of the total response time (i.e., taking into account network interaction) from all (two) search servers. The average is 2.5 ms, the median is 1.5 ms, 5% of requests take more than 10 ms, and 1% of requests take more than 15 ms.

Source: https://habr.com/ru/post/144084/

All Articles

How to find a girl in 250 microseconds

More articles: