
Sphinx - not just for search! Our story

Hi, %username%!

Today I want to tell how we (my partner and I) decided to ditch MySQL in favor of Sphinx for complex filtered selects and sorting.

How it all happened

It all started, as you have already guessed, with MySQL slowly but surely losing ground and loading the server more and more. The culprit was a single table: torrents (yes, this is a torrent tracker). The time came to figure out how to fix the situation. We knew where the bottleneck was: filtering and sorting the torrent listings. Caching was ruled out immediately: the number of filter combinations was huge, and on top of that we always needed fresh data (seeder counts, leecher counts, download counts, and so on). We didn't consider replication either, since two servers were already running in master-master mode, and we had no desire to build trees of replicas. Our first thought was partitioning, but it brought nothing except slowdowns. The second was to move the table to a separate server, but a snag was waiting for us there: at the time we were running XBT Tracker, and the problem was that it could only work with tables located in the same database. "So what?" we thought - it's open source, we can patch it. Having agreed that the idea was feasible and had a right to exist, we shelved it anyway: neither of us was on friendly terms with C++, and the change would mean financial costs plus rewriting the site's engine.
Almost desperate to find a solution, while wandering around the Internet I came across the post "Sphinx - not only for search!". That was when I realized this was exactly what we needed! Sphinx had already been serving us as a search engine for a long time, and we were very happy with it. Rolling up our sleeves, my partner took on upgrading and configuring Sphinx, while I set about rewriting the torrent filtering code. Then, after reading the post "How to cook SphinxQL", I decided to use SphinxQL.

With Sphinx we ended up with a single index that serves both search and torrent filtering. It is fully re-indexed every 5 minutes (since up-to-date data is always needed), and a full re-index takes less than a minute.
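For reference, such a single plain index, with the numeric counters pulled in as attributes so Sphinx can filter and sort on them, might look roughly like this in sphinx.conf. This is only a sketch: the column and attribute names are inferred from the queries later in the post, and the connection settings are hypothetical.

```ini
source torrents_src
{
    type     = mysql
    sql_host = localhost
    sql_user = tracker        # hypothetical credentials
    sql_pass = secret
    sql_db   = tracker

    # Everything needed for filtering and sorting is stored as attributes,
    # so pages never have to go back to MySQL for it.
    sql_query = SELECT id, name, quality, parent_cid, cid, year, \
                       seeders, leechers, ctime FROM torrents

    sql_attr_uint      = parent_cid
    sql_attr_uint      = cid
    sql_attr_uint      = year
    sql_attr_uint      = seeders
    sql_attr_uint      = leechers
    sql_attr_timestamp = ctime
}

index torrents
{
    source = torrents_src
    path   = /var/lib/sphinx/torrents
}
```

Re-indexing on a schedule then comes down to running the indexer every 5 minutes with --rotate so searchd picks up the fresh index without downtime.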

On the PHP side it was just as easy: the query generator needed almost no changes. The queries themselves ranged from harmless ones like

SELECT *, leechers+seeders AS peers FROM torrents WHERE parent_cid=1 ORDER BY ctime DESC LIMIT 0,30 OPTION max_matches=30

to ones that made MySQL barely crawl:

SELECT *, leechers+seeders AS peers FROM torrents WHERE parent_cid=1 ORDER BY ctime DESC LIMIT 40380,30 OPTION max_matches=40410

SELECT *, leechers+seeders AS peers FROM torrents WHERE MATCH('@quality DVDRip') AND cid=6 AND year=2009 ORDER BY peers DESC LIMIT 360,30 OPTION max_matches=390
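A handy property of queries like these is that the pagination math is mechanical: LIMIT offset,count plus OPTION max_matches=offset+count, which keeps Sphinx's result-set memory bounded to exactly what the page needs (Sphinx requires max_matches to be at least offset+count). A minimal sketch of such a query builder - the function name is mine, not from the original code, which was PHP:

```python
def sphinxql_torrents(page, per_page, where="", order="ctime DESC"):
    """Build a SphinxQL query for one page of torrent listings.

    page is zero-based; `where` is a raw condition string such as
    "parent_cid=1" or "MATCH('@quality DVDRip') AND cid=6 AND year=2009".
    """
    offset = page * per_page
    sql = "SELECT *, leechers+seeders AS peers FROM torrents"
    if where:
        sql += f" WHERE {where}"
    sql += f" ORDER BY {order}"
    # max_matches must cover offset+count, and setting it to exactly that
    # keeps searchd's per-query memory as small as possible.
    sql += f" LIMIT {offset},{per_page} OPTION max_matches={offset + per_page}"
    return sql


print(sphinxql_torrents(0, 30, "parent_cid=1"))
```

Since SphinxQL is spoken over the MySQL wire protocol (port 9306 by default), any ordinary MySQL client library can send the resulting string.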

After querying Sphinx, we fill an array with information about each torrent. Since Sphinx does not return string values (for example, a torrent's name), we build a second array of cache keys (one torrent - one cache entry) and issue a single multi-get request to memcache to fetch all the information we need. Finally, we loop over the results and merge everything together before handing it to the user.
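The merge step can be sketched like this. A plain dict stands in for memcache so the example is self-contained (real code would call a memcache client's multi-get, e.g. get_multi); the key format and field names are my assumptions, not the original PHP:

```python
def merge_sphinx_and_cache(sphinx_rows, cache):
    """Combine numeric attributes from Sphinx with cached torrent records.

    sphinx_rows: list of dicts with the per-torrent numbers Sphinx returns
                 (id, seeders, leechers, ...).
    cache:       mapping of cache key -> dict of cached fields (name, ...),
                 standing in for a memcache multi-get result.
    """
    keys = [f"torrent:{row['id']}" for row in sphinx_rows]  # hypothetical key scheme
    # One multi-get for the whole page instead of N single gets --
    # that is the whole point of the pattern.
    cached = {k: cache[k] for k in keys if k in cache}
    result = []
    for row, key in zip(sphinx_rows, keys):
        merged = dict(cached.get(key, {}))
        merged.update(row)  # live counters from Sphinx override any cached copies
        result.append(merged)
    return result
```

The ordering Sphinx produced is preserved, since the loop walks its rows in order and only decorates them with the cached strings.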


We were very pleased with the result. What did we get in the end? Most importantly, the load on MySQL dropped almost threefold, a whole pile of indexes was removed from the torrents table, and the filtering pages no longer touch MySQL at all.

Everything I have described has been running in this form for a little over a year now, and we have had no problems with this solution. We know it is something of a crutch, but it works like a charm. Sphinx and memcache now live on a separate server with plenty of RAM and couldn't be happier.

Thanks for your attention!

Source: https://habr.com/ru/post/102749/
