📜 ⬆️ ⬇️

Search Postgres using ZomboDb and elasticsearch

At some point in the development of the project there was a question of searching for a large number of texts. Moreover, the texts have different lengths: from tweets to large articles. At first, the main search engine was the built-in Postgres _ts vector . To search for simple rules it was quite enough. The array of texts grew at a high speed, and the search rules became more complicated, so the built-in engine did not cover the requirements.


Yes, there is a sphinx, it has excellent integration with Postgres, but the goal was to find a solution to use elasticsearch with Postgres. Why? Elasticsearch showed good results in some project cases. And already there was a server with it for storing logs logs. There was also a desire to find such a tool that will completely take over the synchronization of data.


As a result, the project ZomboDb was found on the open spaces of the network, which just fit the requirements.


Project page on github .


Extension installation


This section is a retelling of the official instructions.


Currently supported package versions:


PackageVersions
Elasticsearch1.7.1+ (not 2.0)
Postgres9.3, 9.4, 9.5

My configuration: Postgres 9.4, elasticsearch 1.7.5


  1. From the page you need to download and install a package with a plug-in for Postgres ( deb or rpm )
  2. Add the following line to postgresql.conf :


    local_preload_libraries = 'zombodb.so'` 

  3. Restart the database and create the extension:


     psql db_name -c "CREATE EXTENSION zombodb;" 

  4. Next, from the same page you need to download the plugin for elasticsearch and install it:


     bin/plugin -i zombodb -u file:///path/to/zombodb-plugin-XXXzip 

  5. Add to elasticsearch.yml :


     threadpool.bulk.queue_size: 1024 threadpool.bulk.size: 12 http.max_content_length: 1024mb index.query.bool.max_clause_count: 1000000 

  6. Restart elasticsearch.

This completes the installation.


Creating a test index


Suppose there is a table in which the tags are:


 CREATE TABLE public.tags ( id INTEGER PRIMARY KEY NOT NULL DEFAULT nextval('tags_id_seq'::regclass), word CHARACTER VARYING(100) NOT NULL ); 

Create index:


 CREATE INDEX tags_zdb_search_idx ON tags USING zombodb (zdb('tags', tags.ctid), zdb(tags)) WITH (url='http://localhost:9200/'); 

As a result of this query, an index is created, and the data immediately goes into elasticsearch.
A query that Mom and Dad will find:


 SELECT * FROM tags WHERE zdb('tags', ctid) ==> 'word:(,)'; 

Where word is the name of the field that will be searched. The search is implemented using the operator ==> .


Also, ZomboDb provides the text and fulltext domains based on the text type. Using your own domains, you can define mapping for elasticsearch.


Query language


Using queries, you can search for individual fields of the indexed table, as well as for all fields.
Queries support logical operations ( and , or , not ), brackets.
It is possible to use various search operators. For example, the query


 SELECT * FROM texts WHERE zdb('texts', ctid) ==> 'text:'; 

where the operator is a colon, returns texts containing the word papa .
The operations more like this and fuzzy like this are also supported through the operators: @ and: @ ~, respectively.
Example:


 SELECT * FROM texts WHERE zdb('texts', ctid) ==> '(text:@ and title:@) or text:'; 

Also, there is support for comparison operators:


 SELECT * FROM texts WHERE zdb('texts', ctid) ==> 'comments > 10'; 

A detailed description of the query language in the documentation .


findings


The project is a good product that works out of the box. Well documented, updated (the latest version of Postgres is supported, the latest commit at the time of this writing is 27 days old). If you show yourself well and consistently in production, I will write a wrapper for sqlalchemy.


Thanks for attention!


UPD I wrote a quick note of sqlalchemy extension


')

Source: https://habr.com/ru/post/303744/


All Articles