📜 ⬆️ ⬇️

Sphinx - not just for searching!


I think about such a wonderful search engine Sphinx heard all or almost all. Surely many people already use it to search the site, to select similar articles, news, products, etc. It does an excellent job with search tasks even with very large numbers of entries. But, in my article I want to tell you about using Sphinx not for searching.

It so happened that I continued to develop the site news aggregator. When I started to do it, he “kept” 40 - 60 thousand news in the database and braked terribly. Rewriting everything from scratch, transferring part of the site to statics and entering caching increased the number of news that the site coped with to approximately 500 thousand. Of course, I used Sphinx to search for news, because I heard many enthusiastic reviews about it from my colleagues. I must say he did not disappoint me.

And so, on one not so wonderful day, the management set the following task: to show one last news from each source. In SQL terms, this is sorting by date and grouping by source. Attempting to implement an appropriate sample from the database led to the fact that the site did not cope with the load.

It turned out that MySQL first performs grouping (it takes the first entry in the table) and only then sorts the selected records. You can solve the “sort before group” problem either by using a subquery, or by using the aggregate function MAX (). And this is the use of a temporary table and very slow queries ...
')
An alternative solution was to use the Sphinx features for grouping results . After reading the documentation, I decided to try to replace MySQL in the task of selecting news for display on the site on Sphinx.

For this, we had to set the sorting by date, grouping by the numeric field ID of the Source, and as a request - the name of the news category. The request is as follows:
$sphinx = new SphinxClient();
$sphinx->SetServer('localhost', 3312);

$sphinx->SetMatchMode(SPH_MATCH_EXTENDED2);
$sphinx->SetSortMode(SPH_SORT_ATTR_DESC, 'date');
$sphinx->SetLimits($from,$count);
$sphinx->SetGroupBy('site_id', SPH_GROUPBY_ATTR, 'date desc');
$category = "@category_path {$cat['category_path']}";
$result = $sphinx->Query($category, 'news news_delta');


The result exceeded all my expectations! I can’t give exact query execution times using MySQL and Sphinx, because I forgot to write them down with joy. But I will say that now we have more than two million news in our database and everything works quite fast.

To maintain the index up to date, a delta index is used, updated every 10 minutes, and the main index, updated daily. MySQL unloading helped to avoid table locks during long queries, MySQL is used only to fetch data by ID, which happens very quickly.

Source: https://habr.com/ru/post/64325/


All Articles