Taming SphinxSearch with an Elephant

Good afternoon, hrobrozhiteli!

I present to your attention an extension for PostgreSQL that allows you to send search queries to Sphinx from PostgreSQL and receive the results of these queries.

Implementation details and link to the repository under the cat.

Foreword

One large web project required the use of the full text engine Sphinx . And everything would be fine if it were not required to sort the results of the queries from Sphinx according to the cleverly specified rules inside the database.
')
And then the obvious question arose: how? Here at least two options suggest themselves:

Make a full-text query in Sphinx, sort the results in the application;
Use the power of SQL in PostgreSQL to retrieve search query data from Sphinx in tabular form with the possibility of additional processing in the database.

After weighing all the pros and cons of these options, the sphinxlink extension was implemented. The idea of implementation was honestly spied from the dblink extension.

Expansion functions and implementation details

As in dblink, everything is managed using SQL functions written in C.
The client interaction protocol with Sphinx corresponds to the mysql interaction protocol, which means that we can use the most common mysql-client library to connect to Sphinx.

To connect to the Sphinx server, use the sphinx_connect () function:

SELECT * FROM sphinx_connect('myconn'); sphinx_connect ---------------- OK (1 )

Connections are created in the static memory area of the backend as an HTAB structure, so when closing a session, all connections created to Sphinx will be closed.

To view the list of open connections with Sphinx and connection parameters, use the sphinx_connections () function:

 SELECT * FROM sphinx_connections(); conname | host | port ---------+-----------+------ myconn | 127.0.0.1 | 9306 (1 )

And now how to send search queries and get results.

Retrieving search query results in tabular form is implemented in the function sphinx_query () . It is important to specify the name of the columns and their data types for the correct formation of the result.

For example, create an index in Sphinx with the following structure:

 DESC myindex; +---------+--------+ | Field | Type | +---------+--------+ | id | bigint | | content | field | | title | string | +---------+--------+

The content field will contain the texts of the documents (for example, the stories of A.P. Chekhov), the title field is the title of the story, the field id is the text identifier (the service field in Sphinx).

 SELECT * FROM sphinx_query('myconn', 'SELECT weight(), * FROM myindex WHERE MATCH('''')') AS (weight integer, docid integer, title text); weight | docid | title --------+-------+------- 1680 | 3 |  (1 )

To request meta-information on the last request, use the sphinx_meta () function:

 SELECT * FROM sphinx_meta('myconn'); varname | value -------------+-------- total | 1 total_found | 1 time | 0.000 keyword[0] |  docs[0] | 1 hits[0] | 1 (6 )

This feature is made for convenience. A similar effect can be achieved by sending a SHOW META request using the sphinx_query () function.

To close an unnecessary connection, use the sphinx_disconnect () function:

 SELECT * FROM sphinx_disconnect('myconn'); sphinx_disconnect ------------------- OK (1 )

Implementation plans

The current implementation of the module has several disadvantages:

Connection parameters are not saved when the PostgreSQL session is finished - I would like to have an operation mode in which the connection parameters were saved to a real table in order to get them;
Everything described above is valid for requests to read Sphinx indexes, but there is no function to manage RT indexes;
Your suggestions?

Traditionally for similar posts, leave a link to github: sphinxlink .

Comments and suggestions are warmly welcome. Thanks for attention!

Source: https://habr.com/ru/post/347700/

All Articles

Taming SphinxSearch with an Elephant

Foreword

Expansion functions and implementation details

Implementation plans

More articles: