Good afternoon, hrobrozhiteli!
I present to your attention an extension for PostgreSQL that allows you to send search queries to Sphinx from PostgreSQL and receive the results of these queries.
Implementation details and link to the repository under the cat.
Foreword
One large web project required the use of the full text
engine Sphinx . And everything would be fine if it were not required to sort the results of the queries from Sphinx according to the cleverly specified rules inside the database.
')
And then the obvious question arose: how? Here at least two options suggest themselves:
- Make a full-text query in Sphinx, sort the results in the application;
- Use the power of SQL in PostgreSQL to retrieve search query data from Sphinx in tabular form with the possibility of additional processing in the database.
After weighing all the pros and cons of these options, the sphinxlink extension was implemented. The idea of implementation was honestly spied from the
dblink extension.
Expansion functions and implementation details
As in dblink, everything is managed using SQL functions written in C.
The client interaction protocol with Sphinx corresponds to the mysql interaction protocol, which means that we can use the most common mysql-client library to connect to Sphinx.
To connect to the Sphinx server, use the
sphinx_connect () function:
SELECT * FROM sphinx_connect('myconn'); sphinx_connect
Connections are created in the static memory area of the backend as an HTAB structure, so when closing a session, all connections created to Sphinx will be closed.
To view the list of open connections with Sphinx and connection parameters, use the
sphinx_connections () function:
SELECT * FROM sphinx_connections(); conname | host | port
And now how to send search queries and get results.
Retrieving search query results in tabular form is implemented in the function
sphinx_query () . It is important to specify the name of the columns and their data types for the correct formation of the result.
For example, create an index in Sphinx with the following structure:
DESC myindex; +
The
content field will contain the texts of the documents (for example, the stories of A.P. Chekhov), the
title field is the title of the story, the field
id is the text identifier (the service field in Sphinx).
SELECT * FROM sphinx_query('myconn', 'SELECT weight(), * FROM myindex WHERE MATCH('''')') AS (weight integer, docid integer, title text); weight | docid | title
To request meta-information on the last request, use the
sphinx_meta () function:
SELECT * FROM sphinx_meta('myconn'); varname | value
This feature is made for convenience. A similar effect can be achieved by sending a
SHOW META request using the
sphinx_query () function.
To close an unnecessary connection, use the
sphinx_disconnect () function:
SELECT * FROM sphinx_disconnect('myconn'); sphinx_disconnect
Implementation plans
The current implementation of the module has several disadvantages:
- Connection parameters are not saved when the PostgreSQL session is finished - I would like to have an operation mode in which the connection parameters were saved to a real table in order to get them;
- Everything described above is valid for requests to read Sphinx indexes, but there is no function to manage RT indexes;
- Your suggestions?
Traditionally for similar posts, leave a link to github:
sphinxlink .
Comments and suggestions are warmly welcome. Thanks for attention!