
Django + Sphinx = django-sphinx (?)



While preparing our latest article about Django batteries for Habr, we found that we still had plenty to say about django-sphinx, enough for a separate post. Here it is, as promised.

To date, there are several good solutions for organizing search in Django. "Several" here means two: Haystack and django-sphinx. Haystack works with the Solr, Whoosh and Xapian backends and, alas, does not work with Sphinx for some abstract licensing reasons. django-sphinx, as you might guess, works with Sphinx and nothing else. Haystack is a high-quality, well-documented and actively developed product, and we would no doubt use it if it supported Sphinx in any form. But this, alas, has not happened yet. And Sphinx is our everything, thanks to its speed, flexibility and, very important at our geographical latitudes, its ability to handle the peculiarities of Russian morphology, which cannot be said of its closest competitors. "Big, but 5 ... or small, but 3?"


Since the quality of search results is crucial, choosing the search engine was not really a question. And since nothing "Django-Sphinx-ish" exists in nature apart from django-sphinx, the choice of battery was predetermined. So:

Good:

Bad:


You can, of course, use the Python API bundled with Sphinx itself, which magic4x has just suggested to us. There is also a third option: write your own battery, with blackjack, tests and documentation.

On the other hand, things are not so bad. django-sphinx is used successfully in many projects and, by and large, does its job. Let's look at an example from the real world.

There is a model for which we want to set up search:

    class Post(models.Model):
        ...
        title = models.CharField(_(u''), max_length=1000)
        teaser_text = models.TextField(_(u''), blank=True)
        text = models.TextField(_(u''))
        ...
        # django-sphinx
        search = SphinxSearch(weights={'title': 100, 'teaser_text': 80, 'text': 90})
        ...


One of the main reasons we use django-sphinx rather than the raw Sphinx API, like the tough guys do, is that it can automatically generate a Sphinx config from the data we specified in the model. There is a special management command for this, generate_sphinx_config. Using it is simple:

 $ ./manage.py generate_sphinx_config --all > absolute_path_to_config_file.conf 


By the way, you can create your own set of templates from which the config will be built. In these templates you can specify the search mode, Russian stemming, and so on, so the config does not have to be tweaked by hand. Convenient.
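As a rough illustration of what such a template controls, here is a minimal sketch that renders a Sphinx index block with Russian stemming switched on. The function render_index_block, the index name and the data directory are hypothetical and exist only for this example; the directives source, path, morphology and charset_type are ordinary Sphinx 0.9.x config options. The real django-sphinx templates are Django templates and look different.

```python
# Hypothetical sketch: renders one "index" block of a Sphinx config.
# Only the directive names are real Sphinx options; everything else is made up.
INDEX_TEMPLATE = """index {name}
{{
    source = {name}
    path = {data_dir}/{name}
    morphology = {morphology}
    charset_type = utf-8
}}
"""


def render_index_block(name, data_dir, morphology="stem_ru"):
    """Return the text of an index block using the given stemmer."""
    return INDEX_TEMPLATE.format(
        name=name,
        data_dir=data_dir.rstrip("/"),
        morphology=morphology,
    )


if __name__ == "__main__":
    print(render_index_block("blog_post", "/var/data/sphinx"))
```

Keeping a setting like morphology in the template means that every regenerated config picks it up automatically, which is the whole point of not editing the file by hand.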

Now we need to start the search daemon itself. django-sphinx is no longer involved at this point; the programs that ship with Sphinx are used.

 $ sudo searchd --config absolute_path_to_our_config_file.conf 


On first start, searchd complains that there are no indexes and it has nothing to do. To build the indexes, Sphinx provides the indexer program, which in its simplest form runs like this:

 $ sudo indexer --config absolute_path_to_our_config_file.conf --all --rotate 


That's all. Of course, you can write simple management commands for these steps that would create a separate config and a separate searchd instance for each developer on the system. We did exactly that.
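The per-developer commands mentioned above could be built on helpers like the following. This is only a sketch under assumptions: the file layout (one sphinx_<user>.conf per developer) and all function names are hypothetical and are not taken from our actual commands. The command-line flags are exactly the ones shown earlier.

```python
import getpass
import os


def developer_config_path(base_dir, user=None):
    """Return a per-developer config path (hypothetical naming scheme)."""
    user = user or getpass.getuser()
    return os.path.join(base_dir, "sphinx_%s.conf" % user)


def indexer_command(config_path):
    """Argument list that rebuilds all indexes for the given config."""
    return ["indexer", "--config", config_path, "--all", "--rotate"]


def searchd_command(config_path):
    """Argument list that starts searchd with the given config."""
    return ["searchd", "--config", config_path]


if __name__ == "__main__":
    conf = developer_config_path("/tmp/sphinx")
    print(" ".join(searchd_command(conf)))
    print(" ".join(indexer_command(conf)))
```

A management command would pass these lists to subprocess.check_call. Each developer's config would also need its own port and pid file so that several searchd instances can coexist, which this sketch does not cover.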

So how do you build search queries? What can django-sphinx do besides generating a config?

Say some view needs a search result object. That is very easy to do:

    ...
    user_query = self.request.GET['query']
    result = Post.search.query(user_query)
    ...


We get a pseudo-queryset object with the search results and some useful methods and attributes. For example, Sphinx can build search result snippets by itself, and they can even be customized a little.

    passages_opts = {
        'before_match': '<span style="background-color: yellow">',
        'after_match': '</span>',  # closes the highlight opened by before_match
        'chunk_separator': '...',
        'around': 10,
        'single_passage': True,
        'exact_phrase': True,
    }
    result = result.set_options(passages=True, passages_opts=passages_opts)


What this code does is not hard to guess, and there is nothing unusual in it. However, if you need to filter the results further (which is almost certainly the case), here be dragons. Everything starts working in completely unexpected ways.

Bug-feature #1
To use the exclude and filter methods, you need to collect the ids of the objects to be filtered beforehand and pass them as an unpacked keyword dictionary (easier to show with an example):

    excluded_obj_id_list = [post.id for post in result if post.is_published]
    filtered_result = result.exclude(**{'@id__in': excluded_obj_id_list})


And the most surprising part is that the last operation will not do what you expect of it. Honestly, it will not work at all; nothing will be excluded.

Bug-feature #2
Everything works as you would expect only within a single chain of method calls.

 filtered_result = Post.search.query(user_query).exclude(**{'@id__in': excluded_obj_id_list}) 


This, of course, does not make for the most efficient or transparent code.

Bug-feature #3
Sphinx has several search modes. For example, say we want to set the mode to 'SPH_MATCH_ANY' (match any of the query words). If you do this in the model itself, everything works well.

 search = SphinxSearch(weights={'title': 100, 'teaser_text': 80, 'text': 90}, mode='SPH_MATCH_ANY') 


If you do this in the view logic, where we enable snippet generation and configure it, everything also works well ...

    result = Post.search\
        .query(user_query)\
        .exclude(**{'@id__in': excluded_obj_id_list})\
        .set_options(passages=True, passages_opts=passages_opts, mode='SPH_MATCH_ANY')


... but you won't see snippets. Therefore, specify the search mode only in models.
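The three caveats above can be folded into one small helper so that nobody on the team has to remember them. This is a minimal sketch, not part of django-sphinx: build_search is a hypothetical name, and FakeSearch is a stand-in that only records calls so the example runs without Sphinx. In a real project the first argument would be Post.search, and the list of ids would be collected beforehand by a separate query.

```python
def build_search(search_manager, user_query, excluded_ids, passages_opts):
    """Build the whole search in a single method chain."""
    return (
        search_manager
        .query(user_query)
        .exclude(**{'@id__in': excluded_ids})
        # No mode= here: per the text, the mode belongs in the model,
        # otherwise the snippets disappear.
        .set_options(passages=True, passages_opts=passages_opts)
    )


class FakeSearch(object):
    """Hypothetical stand-in for Post.search that records the call chain."""

    def __init__(self, calls=()):
        self.calls = list(calls)

    def _next(self, name, kwargs):
        return FakeSearch(self.calls + [(name, kwargs)])

    def query(self, text):
        return self._next('query', {'text': text})

    def exclude(self, **kwargs):
        return self._next('exclude', kwargs)

    def set_options(self, **kwargs):
        return self._next('set_options', kwargs)


if __name__ == "__main__":
    result = build_search(FakeSearch(), 'django', [1, 2], {'around': 10})
    for name, kwargs in result.calls:
        print(name, kwargs)
```

The helper takes everything it needs as arguments and returns the finished object, so the calling code never holds an intermediate result that it might be tempted to filter later.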

In templates, everything is rather trivial. For example, this is how snippets are displayed:

    {% for post in search_results %}
    <div class="g-content">
        <a href="{{ post.get_absolute_url }}" class="b-teaser__descr__snippet-link">
            {{ post.sphinx.passages.text|safe }}
        </a>
    </div>
    {% endfor %}


These "features" cost us a lot of blood, and I hope this post will save some of you time and nerves.

And finally: in December 2011 the first Sphinx release in several years came out, version 2.0.3. django-sphinx works only with versions 0.9.7, 0.9.8 and 0.9.9.



1) Sphinx - sphinxsearch.com
2) Original django-sphinx - github.com/dcramer/django-sphinx
3) Our fork with some bug fixes - github.com/futurecolors/django-sphinx

Source: https://habr.com/ru/post/136261/

