📜 ⬆️ ⬇️

An example of autocomplete implementation using PostgreSQL tsearch2 FTS engine

Introduction


Once upon a time I wrote a diary for myself to conduct affairs, take notes and record movements for various tasks. It was originally made on a bunch of PHP + Kohana 2 + PostgreSQL. Over time, I rewrote everything in Yii (the first and then only version). For full-text search, the tsearch2 engine built into PostgreSQL was used. For many years I used the system, developed it little by little and came to the conclusion that the volume of texts in it had accumulated a decent amount. The search has to be used quite often, and to improve its convenience, I decided to fasten an autocomplete from the JQuery UI package to it.

Implementation


For everything to be correct, the choice of hints must be based on the same index as the search itself. All my texts are stored in a separate table called "texts". Here is its structure:

Table "public.texts" Column | Type | Modifiers -------------+-----------------------------+---------------------------------------------------------- txt_id | bigint | not null default nextval(('gen_txt_id'::text)::regclass) user_id | integer | not null txt | text | not null fti_txt | tsvector | last_update | timestamp without time zone | default now() format | textformat | default 'wiki'::textformat Indexes: "texts_pkey" PRIMARY KEY, btree (txt_id) "texts_txt_id_key" UNIQUE CONSTRAINT, btree (txt_id) "fti_texts_idx" gist (fti_txt) "last_update_idx" btree (last_update) "texts_uid_idx" btree (user_id) 

To accomplish the task of forming a list of prompts for the current search string, the Action was written as a separate, connected action. Source protected / extensions / actions / SearchAutocompleteAction.php :

 <?php class SearchAutocompleteAction extends CAction { public $model; public $attribute; public $fts_field; public function run() { //   $_uid = Yii::app()->user->id; $_model = new $this->model; $_tableName = $_model->tableName(); //     ,      //         $_query_array = explode(' ', trim(Yii::app()->db->quoteValue($_GET['term']), " '\t\n\r\0\x0B")); $_word = array_pop($_query_array); $_preQuery = implode(' ', $_query_array); $_suggestions = array(); /* *   tsvector    .       *      ,     (  ). */ $_sub_sql = "SELECT $this->fts_field FROM $_tableName WHERE user_id=''$_uid''"; if (count($_query_array) > 0) $_sub_sql .= " AND $this->fts_field @@ to_tsquery(''russian'', ''$_preQuery'')"; /* *  ,   ,     . *   ts_stat  tsearch2.      ,   , *        .      ndoc,  *  ,   . */ $_sql = "SELECT word AS $this->attribute FROM ts_stat('$_sub_sql') WHERE word LIKE '$_word%' ORDER BY nentry DESC LIMIT 15;"; foreach(Yii::app()->db->createCommand($_sql)->query() as $_m) $_suggestions[] = count($_query_array) > 0 ? $_preQuery.' '.$_m[$this->attribute] : $_m[$this->attribute]; echo CJSON::encode($_suggestions); } } 

To analyze the action algorithm, I give an example of an SQL query using the search string “hi hub”, formed by the Action:
')
 SELECT word AS txt FROM ts_stat('SELECT fti_txt FROM texts WHERE user_id=''1'' AND fti_txt @@ to_tsquery(''russian'', '''')') WHERE word LIKE '%' ORDER BY nentry DESC LIMIT 15; 

The essence of the work of tsearch2 in general lies in the formation of a tsvector type record in addition to the text one, in our example this is the fti_txt field. It contains the words of the text indicating their positions and the number of their appearance in the text. The index (gin or gist) is built using this record and the search is performed later. To debug and monitor the status of an index in tsearch2, there is a ts_stat function. As a parameter, it takes the text of an SQL query that returns a set of fields of type tsvector. According to this set, statistics are built in the form of a list of words with an indication of the number of entries (nentry) and the number of documents (entries) where the word occurs (ndoc).

In my example, if the word in the search query is one, it searches for a similar one in all user entries. If there are several words in the query - the last word is removed from the query, the record set is limited to full-text search in the first part of the query (without the last word).

Connect to the project


This part is Yii 1 specific, there is no magic here. Given for the integrity of the notes. In total there will be two steps. The first step is to connect the Action to the controller, in my case DiaryController. To do this, add the following lines to its actions () method:

  public function actions() { return array( ... 'acsearch' => array( 'class' => 'application.extensions.actions.SearchAutocompleteAction', 'model' => 'Texts', 'attribute' => 'txt', 'fts_field' => 'fti_txt', ), ... ); } 

Now in the appropriate view, we replace the old search text field:

 <?php echo CHtml::textField('sh', $search->sh, array('size' => 60,'maxlength' => 255)); ?> 

on the jquery ui widget:

  <?php $this->widget('zii.widgets.jui.CJuiAutoComplete', array( 'attribute'=> 'sh', 'sourceUrl' => array('acsearch'), 'name' => 'sh', 'value' => $search->sh, 'options' => array( 'minLength' => '2', ), 'htmlOptions' => array( 'size' => 60, 'maxlength' => 255, ), )); ?> 

As a result, we get something similar to the picture:

image

disadvantages


The whole system has one major drawback - words in the tsvector type field are recorded after the stemming. Simply put, most of the words “cut off” endings for inclusion in the search for their various forms. Look at the picture above and note the word "formed." Thus, this solution is applicable in projects for personal / internal use. Without solving this problem, it is impossible for people to show this. Maybe someone has a decent solution, or at least a thought. Welcome to the comments.

Source: https://habr.com/ru/post/307858/


All Articles