At some point in the development of the project there was a question of searching for a large number of texts. Moreover, the texts have different lengths: from tweets to large articles. At first, the main search engine was the built-in Postgres _ts vector . To search for simple rules it was quite enough. The array of texts grew at a high speed, and the search rules became more complicated, so the built-in engine did not cover the requirements.
Yes, there is a sphinx, it has excellent integration with Postgres, but the goal was to find a solution to use elasticsearch with Postgres. Why? Elasticsearch showed good results in some project cases. And already there was a server with it for storing logs logs. There was also a desire to find such a tool that will completely take over the synchronization of data.
As a result, the project ZomboDb was found on the open spaces of the network, which just fit the requirements.
Project page on github .
This section is a retelling of the official instructions.
Currently supported package versions:
Package | Versions |
---|---|
Elasticsearch | 1.7.1+ (not 2.0) |
Postgres | 9.3, 9.4, 9.5 |
My configuration: Postgres 9.4, elasticsearch 1.7.5
Add the following line to postgresql.conf :
local_preload_libraries = 'zombodb.so'`
Restart the database and create the extension:
psql db_name -c "CREATE EXTENSION zombodb;"
Next, from the same page you need to download the plugin for elasticsearch and install it:
bin/plugin -i zombodb -u file:///path/to/zombodb-plugin-XXXzip
Add to elasticsearch.yml :
threadpool.bulk.queue_size: 1024 threadpool.bulk.size: 12 http.max_content_length: 1024mb index.query.bool.max_clause_count: 1000000
This completes the installation.
Suppose there is a table in which the tags are:
CREATE TABLE public.tags ( id INTEGER PRIMARY KEY NOT NULL DEFAULT nextval('tags_id_seq'::regclass), word CHARACTER VARYING(100) NOT NULL );
Create index:
CREATE INDEX tags_zdb_search_idx ON tags USING zombodb (zdb('tags', tags.ctid), zdb(tags)) WITH (url='http://localhost:9200/');
As a result of this query, an index is created, and the data immediately goes into elasticsearch.
A query that Mom and Dad will find:
SELECT * FROM tags WHERE zdb('tags', ctid) ==> 'word:(,)';
Where word is the name of the field that will be searched. The search is implemented using the operator ==> .
Also, ZomboDb provides the text and fulltext domains based on the text type. Using your own domains, you can define mapping for elasticsearch.
Using queries, you can search for individual fields of the indexed table, as well as for all fields.
Queries support logical operations ( and , or , not ), brackets.
It is possible to use various search operators. For example, the query
SELECT * FROM texts WHERE zdb('texts', ctid) ==> 'text:';
where the operator is a colon, returns texts containing the word papa .
The operations more like this and fuzzy like this are also supported through the operators: @ and: @ ~, respectively.
Example:
SELECT * FROM texts WHERE zdb('texts', ctid) ==> '(text:@ and title:@) or text:';
Also, there is support for comparison operators:
SELECT * FROM texts WHERE zdb('texts', ctid) ==> 'comments > 10';
A detailed description of the query language in the documentation .
The project is a good product that works out of the box. Well documented, updated (the latest version of Postgres is supported, the latest commit at the time of this writing is 27 days old). If you show yourself well and consistently in production, I will write a wrapper for sqlalchemy.
Thanks for attention!
UPD I wrote a quick note of sqlalchemy extension
Source: https://habr.com/ru/post/303744/
All Articles