📜 ⬆️ ⬇️

What's New in PostgreSQL 11: Embedded Web Search



Continuing the topic of interesting features of the upcoming release of PostgreSQL 11, I would like to tell you about the new built-in function websearch_to_tsquery . The corresponding patch was developed by Viktor Drobny and Dmitry Ivanov, with corrections from Fyodor Sigaev. Let's see what is implemented in this patch.

It would seem that full-text search has been in PostgreSQL for a long time, and it works very well. What else could I add?
')
Imagine that you are doing an online store based on PostgreSQL and you need a search by product. Here you have a form with a search query. To search the database from this query, you need to somehow build a tsvector. You can do this using the to_tsquery function. But to_tsquery expects the string to be in a specific format:

=# select to_tsquery('foo bar baz');
ERROR: syntax error in tsquery: "foo bar baz"

=# select to_tsquery('foo & bar & baz');
to_tsquery
-----------------------
'foo' & 'bar' & 'baz'

In other words, in this case, you will have to write a function that converts the user's request into a request that is understandable to_tsquery. Inconvenient. In part, this problem is solved by the plainto_tsquery and phraseto_tsquery functions:

=# select plainto_tsquery('foo bar baz');
plainto_tsquery
-----------------------
'foo' & 'bar' & 'baz'

=# select phraseto_tsquery('foo bar baz');
phraseto_tsquery
---------------------------
'foo' <-> 'bar' <-> 'baz'

But there is a problem with them. The fact is that the user can intuitively use quotes or, say, some boolean operators, because it works for Google, Yandex and other search engines. Let's see what happens in this case:

=# select plainto_tsquery('"foo bar" -baz or qux');
plainto_tsquery
-------------------------------
'foo' & 'bar' & 'baz' & 'qux'

It all broke! Oh. Do you really have to write your parser?

So that it does not have to be written from scratch for each application, starting with PostgreSQL 11, the corresponding parser will now be directly in the DBMS:

=# select websearch_to_tsquery('"foo bar" -baz or qux');
websearch_to_tsquery
----------------------------------
'foo' <-> 'bar' & !'baz' | 'qux'

Besides the fact that websearch_to_tsquery understands quotation marks, the minus sign and boolean operators, it is interesting because it ignores any attempts to make a syntax error. That is, you will never get an error, there will always be some tsquery output:

=# select websearch_to_tsquery('-"foo bar" ((( baz or or qux !@#$%^&*_+-=');
websearch_to_tsquery
--------------------------------------
!( 'foo' <-> 'bar' ) & 'baz' | 'qux'

Another feature of the function is the fact that it ignores any brackets. That is, this is not how to ride:

=# select websearch_to_tsquery('foo and (bar or baz)');
websearch_to_tsquery
-----------------------
'foo' & 'bar' | 'baz'

This behavior was chosen from considerations that normal people (not IT specialists: trollface :) in practice do not use brackets anyway. Just ignoring them, we significantly simplify the implementation of the feature “to make the function never end with an error”, and the parsing of the request will work faster. Perhaps a flag including support for parentheses will appear in future versions.

Such is the simple but useful function. At a minimum, it does full-text search in PostgreSQL no worse than it was before, and allows developers to simplify the code of their applications.

Source: https://habr.com/ru/post/353848/


All Articles