
Implementation of morphological search on Kohana (phpMorphy library)

Good day, Habr community!
Recently I set out to add search to my site, which is written with the Kohana Framework. I chose morphological search because I consider it more accurate than substring matching with LIKE. I could not find a ready-made Kohana module with the required functionality, but I did find an excellent library, phpMorphy, which solved my problem well.

The search logic consists of two blocks: content indexing and searching the index.

Content Indexing
On the site we have the following structure:

As can be seen from the attached scheme, there are 2 types of content on the site:

We are going to index all of this content. Because comments can appear throughout the life of a post, the content has to be re-indexed on an ongoing basis. The indexing logic is as follows: we go through the content items one by one and collect the additional materials related to each (comments, ingredients, cooking steps). Then, as the indexing controller code further down shows, we strip the HTML, convert the text to upper case, split it into words, reduce each word to its base form with phpMorphy, add up a weight for every word, and save each post, word and weight to the index table.
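The weighting step can be sketched in isolation, without Kohana or phpMorphy. This is only an illustration: build_word_weights() is a hypothetical helper, the input arrays stand in for words already reduced to base forms, and the text weight of 1 is an assumed value (the article only fixes the title weight at 3).

```php
<?php
// Sketch only: build_word_weights() and the text weight are illustrative
// assumptions, not part of Kohana or phpMorphy.

/**
 * Sum per-word weights for one post.
 *
 * @param string[] $titleLemmas base forms of the words found in the title
 * @param string[] $textLemmas  base forms of the words found in the text and comments
 * @param int      $titleWeight weight added per title occurrence (3 in the article)
 * @param int      $textWeight  weight added per text occurrence (assumed value)
 * @return array map of base form => accumulated weight
 */
function build_word_weights(array $titleLemmas, array $textLemmas, $titleWeight = 3, $textWeight = 1)
{
    $words = array();
    $sources = array(
        array($titleLemmas, $titleWeight),
        array($textLemmas, $textWeight),
    );
    foreach ($sources as $source) {
        list($lemmas, $weight) = $source;
        foreach ($lemmas as $lemma) {
            // Skip words of one or two characters, as the article does
            if (mb_strlen($lemma, 'UTF-8') <= 2) {
                continue;
            }
            if (!isset($words[$lemma])) {
                $words[$lemma] = 0;
            }
            $words[$lemma] += $weight;
        }
    }
    return $words;
}

$weights = build_word_weights(
    array('SOUP', 'OF', 'TOMATO'),
    array('TOMATO', 'SALT', 'TOMATO')
);
print_r($weights);
```

A word that occurs both in the title and in the text simply accumulates both weights, so title matches dominate the ranking without needing a separate column.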


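Searching the index
Once all the content is indexed, the simplest part remains: searching the index we have built. We proceed in a similar way: split the query into words, reduce them to their base forms with phpMorphy, look each one up in the index, add up the weights per post, and sort the posts by total weight.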
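The ranking step can be sketched on plain arrays. This is a minimal illustration under assumptions: rank_posts() is a hypothetical helper, and the rows are hand-written stand-ins for records of the index table with post_id, word and weight.

```php
<?php
// Sketch only: rank_posts() and the row layout are illustrative assumptions.

/**
 * Add up the weights of the matching index rows per post.
 *
 * @param array    $rows        index rows: array('post_id' => int, 'word' => string, 'weight' => int)
 * @param string[] $queryLemmas base forms of the words in the search phrase
 * @return array map of post_id => total weight, highest first
 */
function rank_posts(array $rows, array $queryLemmas)
{
    $wanted = array_flip($queryLemmas);
    $totals = array();
    foreach ($rows as $row) {
        if (!isset($wanted[$row['word']])) {
            continue; // this row belongs to a word that was not searched for
        }
        $id = (int) $row['post_id'];
        if (!isset($totals[$id])) {
            $totals[$id] = 0;
        }
        $totals[$id] += (int) $row['weight'];
    }
    arsort($totals); // sort by weight, descending, keeping the post ids as keys
    return $totals;
}

$rows = array(
    array('post_id' => 2, 'word' => 'TOMATO', 'weight' => 1),
    array('post_id' => 1, 'word' => 'TOMATO', 'weight' => 5),
    array('post_id' => 1, 'word' => 'SOUP',   'weight' => 3),
    array('post_id' => 3, 'word' => 'SALT',   'weight' => 4),
);
$ranked = rank_posts($rows, array('TOMATO', 'SOUP'));
print_r($ranked);
```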

Now that the logic is clear, we can move on to the code.

Write the code

Start
To get started, go to the project page on SourceForge and download the current version of the library, as well as the dictionaries (since Kohana works with UTF-8, download the dictionaries for this encoding).
The developers recommend placing the library files so that they are not directly accessible from the web. I did not look into the reasons, so I suggest simply following the recommendation: upload the files either above the /www directory or, if you upload them to a directory inside /www, prohibit direct access to that folder from the web. This can be done by placing an .htaccess file in the folder:
Options -Indexes
<Files ~ "\.(php|php3|php4|php5|pl|cgi|sh|bash)$">
    Deny from all
</Files>

Library initialization
To use the library functionality, you need to include the necessary files and create an instance of the class with which further actions will be taken:
require_once('{path to the library}/src/common.php');

// Directory that contains the dictionaries
$dir = '{path to the directory containing the dictionaries}/dicts';
$lang = 'ru_RU';

// Read the dictionaries from files
$opts = array(
    'storage' => PHPMORPHY_STORAGE_FILE,
);

try {
    $morphy = new phpMorphy($dir, $lang, $opts);
} catch (phpMorphy_Exception $e) {
    die('Error occurred while creating phpMorphy instance: ' . $e->getMessage());
}
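The code later in this article relies on lemmatize(), when given an array of words, returning an array keyed by the original word, where each value is either an array of base forms or false for a word that is not in the dictionary. A minimal sketch of that handling, assuming exactly this shape: first_lemmas() is a hypothetical helper, not a phpMorphy function, and $sample is a hand-written stand-in rather than real library output.

```php
<?php
// Sketch only: first_lemmas() is a hypothetical helper, not part of phpMorphy.
// Assumed input shape: array(word => array of base forms, or false if unknown).

/**
 * Pick one base form per word, falling back to the word itself.
 *
 * @param array $bulkResult result of a bulk lemmatize() call (assumed shape)
 * @return string[] one base form per input word
 */
function first_lemmas(array $bulkResult)
{
    $lemmas = array();
    foreach ($bulkResult as $word => $forms) {
        if ($forms === false || count($forms) === 0) {
            // Unknown word: keep it as is so it can still be indexed and found
            $lemmas[] = (string) $word;
        } else {
            // Several base forms are possible; like the article, take the first
            $lemmas[] = $forms[0];
        }
    }
    return $lemmas;
}

// Hand-written stand-in for a bulk result, not actual library output
$sample = array(
    'КОШКИ'  => array('КОШКА'),
    'KOHANA' => false,
);
print_r(first_lemmas($sample));
```

Falling back to the original word means that names and other out-of-dictionary terms remain searchable by exact form.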


Integrating the library in Kohana
In the proposed solution, I use 2 controllers: one that builds the index and one that performs the search on the site.

In addition, for convenience (I personally use ORM), you need to create a model:
class Model_Searchindex extends ORM
{
    protected $_table_name = 'searchindex';
}

And, accordingly, the 'searchindex' table, which holds at least the fields used in the code below: post_id, word, and weight.

The table must be of the MyISAM type.
Let's talk more about each of the controllers.

Indexing Controller
In my case, this controller is used as a handler, to which I address asynchronous requests from the control panel of the site (of course, the controller is accessible only to a user with administrator rights).
We add an optional parameter to the route, since the operation is heavy and I would like to split it into batches:
Route::set('index', 'updateindex(/<offset>)')
    ->defaults(array(
        'directory'  => 'admin',
        'controller' => 'updateindex',
        'action'     => 'index',
    ));

The route should be self-explanatory. Next, in the controller itself, in action_index(), we take the offset parameter, create an instance of the phpMorphy class, and perform all the operations described in the indexing logic:
// $morphy is the phpMorphy instance created as shown in "Library initialization"

// Batch number; the first request comes without it, so it defaults to 0
$offset = (int) $this->request->param('offset');

// First batch: clear the old index
if ($offset === 0) {
    DB::query(Database::DELETE, 'DELETE FROM `searchindex`')->execute();
}

$data = array();

// Take the next 100 posts that are not marked as deleted
$posts = ORM::factory('post')
    ->where('delete', '=', 0)
    ->offset(100 * $offset)
    ->limit(100)
    ->find_all();

foreach ($posts as $post) {
    $words = array();

    // Strip the HTML, replace "ё" with "е" and convert to upper case
    $title = mb_strtoupper(str_ireplace("ё", "е", strip_tags($post->title)), "UTF-8");

    $comments = ORM::factory('comment')
        ->where('post_id', '=', $post->id)
        ->order_by('id', 'ASC')
        ->find_all();

    // Start with the text of the post itself
    $text = $post->text;
    if ($post->type == 1) {
        // For this content type, append the additional materials
        // (ingredients, cooking steps) to $text in the same way ...
    }
    foreach ($comments as $comment) {
        // Comments are indexed together with the text of the post
        $text = $text.' '.$comment->text;
    }
    $text = mb_strtoupper(str_ireplace("ё", "е", strip_tags($text)), "UTF-8");

    // Split into words (Latin and Cyrillic letters, hyphens)
    preg_match_all('/([a-zа-я-]+)/ui', $title, $word_title);
    preg_match_all('/([a-zа-я-]+)/ui', $text, $word_text);

    // Get the base forms as an array: word => array of base forms, or false
    $start_form_title = $morphy->lemmatize($word_title[1]);
    $start_form_text = $morphy->lemmatize($word_text[1]);

    foreach ($start_form_title as $k => $w) {
        if (!$w) {
            // The word is not in the dictionary: index it as is
            $w = array($k);
        }
        if (mb_strlen($w[0], "UTF-8") > 2) { // skip words of 1-2 characters
            if (!isset($words[$w[0]])) {
                $words[$w[0]] = 0;
            }
            $words[$w[0]] += 3; // weight of a word found in the title
        }
    }
    foreach ($start_form_text as $k => $w) {
        // The same for the words of the text (omitted) ...
    }

    // Save every word of the post with its weight to the index
    foreach ($words as $word => $weight) {
        $data['post_id'] = $post->id;
        $data['word'] = $word;
        $data['weight'] = $weight;
        $addindex = ORM::factory('searchindex');
        $addindex->values($data);
        try {
            $addindex->save();
        } catch (ORM_Validation_Exception $e) {
            $errors = $e->errors('validation');
        }
    }
}

/* Return the state as JSON so that the control panel knows
   whether to request the next batch and what progress to show */
$pcount = ORM::factory('post')->where('delete', '=', 0)->count_all();
if (($pcount - (100 * $offset)) > 0) {
    $completed = $offset * 100;
    $percent = round(($completed / $pcount) * 100, 0);
    $json = array('status' => 'next', 'nextid' => 1 + $offset, 'percent' => $percent);
    $this->response->body(json_encode($json));
} else {
    $json = array('status' => 'finish', 'percent' => 100);
    $this->response->body(json_encode($json));
}
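The progress report at the end of the action can also be written as a small pure function, which makes the batch arithmetic easy to check. This is a sketch of one possible variant, not the code above: index_progress() is a hypothetical helper, it treats the offset as a zero-based batch number (an assumption), and unlike the listing it counts the batch that has just been processed, so it finishes without an extra empty request.

```php
<?php
// Sketch only: index_progress() is a hypothetical helper. It assumes $offset is
// a zero-based batch number and that the current batch is already indexed.

/**
 * Build the JSON-ready state returned to the control panel.
 *
 * @param int $total  total number of posts to index
 * @param int $offset zero-based number of the batch just processed
 * @param int $batch  posts per batch (100 in the article)
 * @return array state with 'status', 'percent' and, if not finished, 'nextid'
 */
function index_progress($total, $offset, $batch = 100)
{
    // Posts indexed so far, capped at the total for the last partial batch
    $done = min($total, ($offset + 1) * $batch);
    if ($done < $total) {
        return array(
            'status'  => 'next',
            'nextid'  => $offset + 1,
            'percent' => (int) round($done / $total * 100),
        );
    }
    return array('status' => 'finish', 'percent' => 100);
}

echo json_encode(index_progress(250, 0)), "\n";
echo json_encode(index_progress(250, 2)), "\n";
```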

I do not think it is necessary to show the control panel code, considering that the article is already long. Everything there is quite ordinary: a button and a jQuery handler that calls the controller described above and processes the response accordingly.

The controller responsible for searching the site
For this controller we create a route in the same way. The controller accepts the search phrase entered by the user, which is passed with the GET method. This is what the controller looks like:
public function action_search()
{
    $data = null;
    $request = null;
    $errors = null;

    if (!empty($_GET['text'])) { // a search phrase was submitted
        // Clean the input (strip HTML tags and the like); _clear_var() is my own helper
        $search = $this->_clear_var($_GET['text']);
        $request = $search;
    }

    /* Initialize phpMorphy here */

    if (!empty($search)) {
        // Search only if the phrase is longer than 2 characters
        if (mb_strlen($search, "UTF-8") > 2) {
            preg_match_all('/([a-zа-я-]+)/ui', mb_strtoupper($search, "UTF-8"), $search_words);
            $words = $morphy->lemmatize($search_words[1]);
            $s_words = array();
            $pre_result = array();
            foreach ($words as $k => $w) {
                if (!$w) {
                    // The word is not in the dictionary: search for it as is
                    $w = array($k);
                }
                if (mb_strlen($w[0], "UTF-8") > 2) {
                    $s_words[] = $w[0];
                }
            }
            if (!count($s_words)) {
                // Error: the phrase has no words longer than 2 characters
            } else {
                foreach ($s_words as $s_word) {
                    $search_index = ORM::factory('searchindex')->where('word', '=', $s_word)->find_all();
                    foreach ($search_index as $si) {
                        // Add up the weights of all matched words per post
                        if (!empty($pre_result[$si->post_id])) {
                            $pre_result[$si->post_id] = (int) $si->weight + $pre_result[$si->post_id];
                        } else {
                            $pre_result[$si->post_id] = (int) $si->weight;
                        }
                    }
                }
                arsort($pre_result); // highest total weight first
                foreach ($pre_result as $id => $weight) {
                    // Load the post with this id and fill $result with its title, link, etc.
                    $data[] = $result;
                }
            }
        } else {
            // Error: the search phrase is too short
        }
    } else {
        // Error: the search phrase is empty
    }

    $this->template->content = View::factory('content/v_search')
        ->bind('data', $data)
        ->bind('errors', $errors)
        ->bind('request', $request);
}

I do not think there is any point in describing the rest in detail: everything is as usual. I tried to shorten the code listings as much as possible so as not to clutter the article with simple and trivial things, such as fetching data from the database or handling errors, which everyone already knows how to do.
I hope this article will be helpful. An example of this search implementation at work can be found here.

Source: https://habr.com/ru/post/165715/

