
Sphinx: Distributed Search. Running REPLACE on a Distributed Index

This article is aimed at readers who already know what Sphinx and SphinxQL are.
Objective: keep the site search running on Sphinx while maintenance work is carried out on one of the Sphinx nodes in the cluster.

Sphinx is an excellent tool for site search. In the project I work on, ads are searched with Sphinx. The ads are stored in the database in an EAV model; Sphinx performs the search, and the ads are then fetched from the database by the identifiers that Sphinx returns. So if Sphinx stops working, the whole site is affected.
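The two-step lookup described above can be sketched roughly as follows. This is only an illustration: `SPHINX_INDEX`, `ADS_DB`, the ad titles and the function names are hypothetical in-memory stand-ins, not the project's real code, which talks to Sphinx and to an EAV database.

```python
# Hypothetical stand-ins for the search engine and the ads database.
SPHINX_INDEX = {"bike": [185234, 185191]}  # search phrase -> matching ad ids, in rank order
ADS_DB = {
    185191: {"title": "Road bike"},
    185234: {"title": "Mountain bike"},
}


def search_ids(phrase):
    """Step 1: ask the search engine for the ids of matching ads only."""
    return SPHINX_INDEX.get(phrase, [])


def fetch_ads(ids):
    """Step 2: load the full ads from the database, keeping the search order."""
    return [ADS_DB[ad_id] for ad_id in ids if ad_id in ADS_DB]


def find_ads(phrase):
    """Every listing page depends on step 1, so a search outage affects the whole site."""
    return fetch_ads(search_ids(phrase))


print(find_ads("bike"))
```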

We use Sphinx real-time (RT) indexes so that changes appear in the search results immediately when an ad is edited or banned. While everything ran on a single node, this worked fine, until we needed to change the structure of the indexes themselves. To change the list of attributes in a search index, we had to edit the configuration, restart Sphinx and re-index the ads. To do this without taking the site down, we decided to build a cluster: one main node that effectively acts as a balancer, and two child nodes that each hold the index and mirror each other.

The indexer and searchd sections are configured in the usual way:

    indexer
    {
    }

    searchd
    {
        listen = 127.0.0.1:3301            # Sphinx API
        listen = 127.0.0.1:3309:mysql41    # SphinxQL
        log = ./sphinx-log-searchd.log
        query_log = ./sphinx-log-query.log
        pid_file = ./sphinx-log-searchd.pid
        binlog_path = ./sphinx-binlog
        read_timeout = 5
        max_children = 30
        max_matches = 1000
        seamless_rotate = 1
        preopen_indexes = 0
        unlink_old = 1
        workers = threads
    }


Sphinx provides distributed indexes for building a search cluster.
On the main node, every index looks like this:

    index distributed_section_1
    {
        type = distributed
        agent = 127.0.0.1:9301:rt_section_1|127.0.0.1:9302:rt_section_1
        ha_strategy = nodeads
    }

Note that the way the child nodes are declared matters. In the previous example they are declared as mirrors; in the following one they are declared as nodes that each store a different part of the same index. In the first case a SELECT query is sent to only one of the nodes. In the second case the SELECT is sent to all of the nodes, and the results from each node are merged.

    index distributed_section_1
    {
        type = distributed
        agent = 127.0.0.1:9301:rt_section_1
        agent = 127.0.0.1:9302:rt_section_1
        ha_strategy = nodeads    # mirror selection strategy; nodeads excludes dead nodes
    }
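
The difference between the two agent layouts can be modelled with a small sketch. It is a simplification under stated assumptions: the dictionaries are hypothetical in-memory stand-ins for child nodes, and the mirror is picked uniformly at random, whereas the real choice depends on ha_strategy.

```python
import random

# Hypothetical child nodes: each maps an ad id to its date attribute.
MIRROR_1 = {185191: 1398749772, 185234: 1398749771}
MIRROR_2 = dict(MIRROR_1)        # a mirror holds the same data

SHARD_1 = {185191: 1398749772}   # shards hold different parts of the index
SHARD_2 = {185234: 1398749771}


def select_from_mirrors(mirrors, rng=random):
    """agent = a|b: the SELECT is sent to one mirror only."""
    node = rng.choice(mirrors)   # simplification: uniform random pick
    return sorted(node.items())


def select_from_shards(shards):
    """agent = a, agent = b: the SELECT is sent to every node, results are merged."""
    rows = []
    for node in shards:
        rows.extend(node.items())
    return sorted(rows)


mirror_rows = select_from_mirrors([MIRROR_1, MIRROR_2])
shard_rows = select_from_shards([SHARD_1, SHARD_2])
print(mirror_rows == shard_rows)
```

Both layouts return the same rows here. The difference is in how many nodes do the work and in how much of the data each node has to hold.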

On the child nodes, the indexes are declared as ordinary Sphinx real-time indexes:

    index rt_section_1
    {
        type = rt
        mlock = 1
        morphology = stem_en, stem_ru
        min_word_len = 3
        min_infix_len = 1
        index_exact_words = 1
        dict = keywords
        path = ./notices_rt_section_1
        rt_field = title
        rt_field = text
        rt_attr_uint = date
        rt_attr_uint = active
        rt_attr_multi = location
    }

Selecting from the cluster works without any problems.

    mysql> select * from rt_section_1;
    +---------+------------+--------+----------+
    | id      | date       | active | location |
    +---------+------------+--------+----------+
    | 185191  | 1398749772 | 1      | 145430   |
    | 185234  | 1398749771 | 1      | 145425   |
    +---------+------------+--------+----------+
    2 rows in set (0.03 sec)

At this point I thought the problem was solved, but it was not. SELECT works, but what about REPLACE and INSERT queries?
As it turned out, there was a catch: by default, REPLACE and INSERT work only on local indexes, and I was using distributed ones.
That is not a showstopper, though. Since Sphinx is open source, I made my own build that allows REPLACE queries to be run against distributed indexes.
To build it, download the source and run:

    cmake . && make

With this build running on exactly the same settings as before, the query is executed on all of the mirror nodes.

    mysql> REPLACE INTO rt_section_130054 (id, `location`, `title`, `text`, `active`, `date`) VALUES ( 2435558, ( 145411 ) , ' ', ' ', 1, '1399529047');
    Query OK, 2 rows affected (0.04 sec)

I used two mirrors for testing, and the output shows 2 rows affected: one row on each node.
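
The row count above can be illustrated with a small model of the fan-out. This is only a sketch of the observed behaviour, not the code of the patched build: `FakeRtNode` and `replace_on_mirrors` are hypothetical names, and each node is just an in-memory dictionary.

```python
class FakeRtNode:
    """Hypothetical in-memory stand-in for one child node holding an RT index."""

    def __init__(self):
        self.rows = {}

    def replace(self, doc_id, attrs):
        # REPLACE overwrites any existing row that has the same id.
        self.rows[doc_id] = dict(attrs)
        return 1  # one row affected on this node


def replace_on_mirrors(mirrors, doc_id, attrs):
    """Send the same REPLACE to every mirror and sum the affected-row counts."""
    return sum(node.replace(doc_id, attrs) for node in mirrors)


mirrors = [FakeRtNode(), FakeRtNode()]
affected = replace_on_mirrors(
    mirrors,
    2435558,
    {"location": [145411], "active": 1, "date": 1399529047},
)
print(affected)  # 2: one row per mirror
```

Writing to every mirror rather than to one of them is what keeps the mirrors identical, so a later SELECT can be served by either node.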

Source: https://habr.com/ru/post/222203/

