
Zend_Search_Lucene + PHPMorphy is easy

I was recently reading the Zend_Search_Lucene documentation. Everything is fine and clear: take it and embed it in your site. The only catch is that it says nothing about how to attach a stemmer or a morphological analyzer. It turns out that hooking one up, PHPMorphy for example, is very easy.
How to do it is described below.
This note will mainly be useful to developers who have not yet dealt with full-text search on a site.
You will not find a guide to setting up Lucene or PHPMorphy here; there is plenty of that on the Internet.


So let's get started.
Before text is added to the index, it is split into tokens. The Zend_Search_Lucene_Analysis_Analyzer_* classes are responsible for this: an analyzer takes text as input and produces a list of tokens. A token is a word that is written to the index together with its position in the document (at least, that is how I understand it). Besides analyzers there are filters, which, for example, convert words to lower case or discard words shorter than three letters.
All we need to do is write a filter that converts each word to its base (dictionary) form, and that form is what ends up in the index. One more thing: every query against the index goes through the same tokenization and filtering, so the search is performed on base forms, which is exactly what we want. The code:
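To make the analyzer and filter idea concrete, here is a small plain-PHP sketch of the same pipeline. It is not Zend's implementation: the function names sketch_tokenize and sketch_apply_filters are made up for illustration, tokens are plain arrays, and the lower-casing only handles ASCII.

```php
<?php
// Simplified stand-in for an analyzer: split text into tokens and remember
// where each one starts and ends (byte offsets). Hypothetical helper, not Zend code.
function sketch_tokenize($text)
{
    $tokens = array();
    preg_match_all('/[\p{L}\p{N}]+/u', $text, $matches, PREG_OFFSET_CAPTURE);
    foreach ($matches[0] as $match) {
        $tokens[] = array(
            'text'  => $match[0],
            'start' => $match[1],
            'end'   => $match[1] + strlen($match[0]),
        );
    }
    return $tokens;
}

// Pass every token through each filter in turn. A filter returns the
// (possibly modified) token, or null to drop it from the stream.
function sketch_apply_filters(array $tokens, array $filters)
{
    $result = array();
    foreach ($tokens as $token) {
        foreach ($filters as $filter) {
            $token = call_user_func($filter, $token);
            if ($token === null) {
                continue 2; // token was filtered out, move on to the next one
            }
        }
        $result[] = $token;
    }
    return $result;
}

// Filter 1: lower-case the token text (ASCII only in this sketch).
$lowerCase = function (array $token) {
    $token['text'] = strtolower($token['text']);
    return $token;
};

// Filter 2: drop words shorter than three letters.
$shortWords = function (array $token) {
    return strlen($token['text']) < 3 ? null : $token;
};

$tokens = sketch_apply_filters(
    sketch_tokenize('A Cat is Running'),
    array($lowerCase, $shortWords)
);
// $tokens now holds "cat" (offsets 2-5) and "running" (offsets 9-16).
```

A morphology filter is just one more entry in that filter list: it receives a token and returns a token whose text has been replaced by the base form.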

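class My_PHPMorphy_TokenFilter extends Zend_Search_Lucene_Analysis_TokenFilter
{
    public function normalize(Zend_Search_Lucene_Analysis_Token $srcToken)
    {
        // Same pattern as Zend_Search_Lucene_Analysis_TokenFilter_LowerCaseUtf8,
        // except that the word is reduced to its base form instead of being lower-cased.
    }
}

// Register the filter on a UTF-8 analyzer and make that analyzer the default,
// so that both indexing and queries go through it.
$analyzer = new Zend_Search_Lucene_Analysis_Analyzer_Common_Utf8();
$analyzer->addFilter(new My_PHPMorphy_TokenFilter());
Zend_Search_Lucene_Analysis_Analyzer::setDefault($analyzer);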
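For reference, here is one way the body of normalize() might look. This is a sketch under assumptions rather than the original code. It assumes that $morphy is an already configured phpMorphy object (its setup is not shown), that the dictionaries expect upper-case input, and that taking the first base form is good enough when a word has several. The helper my_phpmorphy_base_form and the constructor argument are my own additions. The class_exists() guard is there only so the file can be loaded without Zend Framework; with the ZF1 autoloader registered the class is declared as usual.

```php
<?php
/**
 * Hypothetical helper: return the base form of a word, or the upper-cased
 * word itself when the dictionary has nothing for it.
 *
 * $morphy is assumed to be a configured phpMorphy instance.
 */
function my_phpmorphy_base_form($morphy, $word)
{
    // Assumption: the phpMorphy dictionaries in use expect upper-case words.
    $upper = mb_strtoupper($word, 'UTF-8');

    // getBaseForm() is expected to return an array of base forms, or false.
    $forms = $morphy->getBaseForm($upper);
    if (!is_array($forms) || count($forms) === 0) {
        return $upper; // unknown word: index it unchanged (upper-cased)
    }

    // A word can have several base forms; this sketch simply takes the first.
    return reset($forms);
}

// Guard: declare the filter only when Zend Framework 1 is available
// (for example through its autoloader).
if (class_exists('Zend_Search_Lucene_Analysis_TokenFilter')) {
    class My_PHPMorphy_TokenFilter extends Zend_Search_Lucene_Analysis_TokenFilter
    {
        private $_morphy;

        public function __construct($morphy)
        {
            $this->_morphy = $morphy;
        }

        public function normalize(Zend_Search_Lucene_Analysis_Token $srcToken)
        {
            // Build a new token with the base form, keeping the original offsets.
            $newToken = new Zend_Search_Lucene_Analysis_Token(
                my_phpmorphy_base_form($this->_morphy, $srcToken->getTermText()),
                $srcToken->getStartOffset(),
                $srcToken->getEndOffset()
            );
            $newToken->setPositionIncrement($srcToken->getPositionIncrement());

            return $newToken;
        }
    }
}
```

With this variant the filter would be registered as $analyzer->addFilter(new My_PHPMorphy_TokenFilter($morphy)). Since queries pass through the same analyzer, the upper-case base forms stored in the index match the upper-case base forms produced from the query.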

That's it. Index whatever you need and show the search results to the user, as described in the Zend Framework manual.

Source: https://habr.com/ru/post/90594/

