📜 ⬆️ ⬇️

Cackle Reviews full text search feedback system

Hello! Not so long ago, we implemented a full-text search for the Cackle Reviews review system . It turned out great, now any moderator in a few milliseconds can find his comments of interest on a word or sentence with support for stemming (fuzzy search by part of a word or its word forms). All this works on Sphinx - full-text search system.

There are many articles on Sphinx on the Internet, but, unfortunately, some of them are outdated, some others do not claim to be complete and accurate how to. So in this post we tried to outline all the steps - installation, configuration, indexing and support for the delta index.

feedback system Cackle Reviews with full text search


1. Install Sphinx


As already mentioned, Sphinx is a full-text search system. The choice on Sphinx was not accidental, the fact is that this system is the fastest based on some tests comparing the speed and capabilities of the engines of such systems.
')
Installation documentation is not so good. So, to interact with Sphinx, we need a MySQL client (for working with the Sphinx API) and Sphinx itself. Below is the command to install all the necessary libraries for Linux (the example uses Linux Debian 8 64bit).

#    MySQL  apt-get     /etc/apt/sources.list deb http://repo.mysql.com/apt/debian/ wheezy mysql-5.6 deb-src http://repo.mysql.com/apt/debian/ wheezy mysql-5.6 #   MySQL apt-get install mysql-client unixodbc libpq5 libmysqlclient18 #    Sphinx (    2.2.9) wget http://sphinxsearch.com/files/sphinxsearch_2.2.9-release-1~wheezy_amd64.deb #  dpkg -i sphinxsearch_2.2.9-release-1~wheezy_amd64.deb 


2. Configure Sphinx


If the installation was normal, then the sphinxsearch daemon should appear in Linux, which you can stop for now ( /etc/init.d/sphinxsearch stop ). Next you need to create a directory structure for storing indexes, in our case there will be two - the main and delta index. The main stores all data, delta only for today to speed up indexing.

 mkdir /opt/sphinx mkdir /opt/sphinx/data #   mkdir /opt/sphinx/data/review #   mkdir /opt/sphinx/data/review_delta #  mkdir /opt/sphinx/log/ 

Edit the Sphinx configuration (default /etc/sphinxsearch/sphinx.conf). It is necessary to clarify that all reviews are stored in the review table and the fields are indexed:


 #   ,    ( PostgreSQL) source base { type = pgsql sql_host = 162.198.0.3 sql_user = postgres_login sql_pass = postgres_password sql_db = cackle sql_port = 5179 } #    index base { #  charset_type = utf-8 #        morphology = stem_enru #     2  min_word_len = 2 } #  review     source review : base { #     search_fulltext #      id   review ( id  search_fulltext    ) sql_query_pre = DELETE FROM search_fulltext WHERE type = 'review' sql_query_pre = INSERT INTO search_fulltext SELECT 'review', MAX(id) FROM review #   id, site_id, status, pros, cons, comment   review sql_query = SELECT id, site_id, status, pros, cons, comment FROM review #      sql_attr_uint = site_id #      (,  , , ) sql_attr_uint = status } #  review         review #    /opt/sphinx/data/review index review : base { source = review path = /opt/sphinx/data/review } #     review source review_delta : review { #    ,   sql_query_pre     review #    -   SQL (SELECT 1) sql_query_pre = SELECT 1 #       ,   id,    id   search_fulltext sql_query = SELECT id, site_id, status, pros, cons, comment FROM review WHERE id > (SELECT id FROM search_fulltext WHERE type = 'review') } #   index review_delta : review { source = review_delta path = /opt/sphinx/data/review_delta } #   searchd #    mysql (mysql_version_string = 5.5.21)        searchd { listen = localhost:9306:mysql41 mysql_version_string = 5.5.21 log = /opt/sphinx/log/searchd.log query_log = /opt/sphinx/log/query.log pid_file = /opt/sphinx/log/searchd.pid } 


3. Launch


Everything is ready, you can start the searchd daemon (/etc/init.d/sphinxsearch start) and start indexing review (main index):
indexer --config /etc/sphinxsearch/sphinx.conf review

full text search reviews Cackle Reviews

So, all 730422 reviews were indexed in 84 seconds. Now you can connect to Sphinx via MySQL (mysql: // localhost: 9306) and try searching through SQL commands:

 #  id      1   '',   id,   0  15 SELECT id FROM review WHERE site_id = 1 AND MATCH('') ORDER BY id DESC LIMIT 0,15 #  id      1, 2, 738, 35302   1 (), 3 () SELECT id FROM review WHERE site in (1, 2, 738, 35302) AND status in (1, 3) AND MATCH(' ') ORDER BY id DESC LIMIT 0, 15 

Within a few milliseconds, we get id reviews, which can then be sampled from the main database from the review table and returned to the client.

4. Configure the delta index


As already mentioned, the delta index is needed for quick indexing of small data size. In our case, this is all new reviews accumulated for the current day. For this setting we create 2 Jobs in the crown:

 crontab -e #  5    review_delta (  id    id   search_fulltext) */5 * * * * indexer --rotate --quiet --config /etc/sphinxsearch/sphinx.conf review_delta #    (  )   review_update.sh 0 1 * * * /opt/sphinx/review_update.sh 

The review_update.sh script starts the review_delta indexing, updates the maximum id in the search_fulltext table and shows the result of the review and review_delta indexes.

 PGPASSWORD=postgres_password; export PGPASSWORD; indexer --rotate --quiet --config /etc/sphinxsearch/sphinx.conf review_delta; psql --host 162.198.0.3 --port 5179 --username "postgres_login" -c "UPDATE search_fulltext SET id = (SELECT MAX(id) FROM review) WHERE type = 'review'" "cackle"; indexer --merge review review_delta --rotate --quiet --config /etc/sphinxsearch/sphinx.conf; 

After that, you can add the review_delta table to the SQL search queries and then the sample will be from 2 indices at once - the main and delta.

 #  id      1   '',   id,   0  15 SELECT id FROM review, review_delta WHERE site_id = 1 AND MATCH('') ORDER BY id DESC LIMIT 0,15 #  id      1, 2, 738, 35302   1 (), 3 () SELECT id FROM review, review_delta WHERE site in (1, 2, 738, 35302) AND status in (1, 3) AND MATCH(' ') ORDER BY id DESC LIMIT 0, 15 

This implementation took only a day of work, with the result we created a stable full-text search, which has been working for several months without a single failure. By the way, adding a similar configuration and the job (review is replaced with comment) for the commenting system Cackle Comments , the search for comments was implemented.

Questions are welcome. Thanks for attention. Successes to all!

Source: https://habr.com/ru/post/261829/


All Articles