
How to make Django and Sphinx get along?

Background


I needed to add a search function to my site. My first thought was to use the capabilities of the SQL server, but I had to search several tables at once, for both words and phrases, and with stemming too. I realized that reinventing the wheel would be expensive.

So I went looking for ready-made solutions. Frankly, there turned out to be few: django-haystack and django-sphinx. The advantages and disadvantages of both have already been discussed in an earlier post, so I will not repeat them.

After spending some time on blogs and forums, I decided to try django-sphinx. As far as I know, Sphinx support in django-haystack is still rather weak.
The author of django-sphinx abandoned the project long ago. There are many forks, though, and people say they are quite usable. I picked the one that was, hmm, the freshest and tried to plug it into my project.

Story


It turned out that things were very bad there: lots of bugs, missing pieces, and problems with the Sphinx Python API.
At first, I simply tried to fix the bugs and make it work. I even managed it: I could search for a single word. Experts will rightly point out that SPH_MATCH_ANY would have solved this problem, but I only learned about that flag a little later. I learned a lot more besides.

In the comments to the post I referred to earlier, people criticized django-sphinx for what it cannot do and what it does not support. I decided to add the missing features, and a fork was born. After a while it could index MVAs and fields from related models. The Sphinx documentation seemed confusing to me, and it took me a long time to figure out what was going on. Many bugs were fixed, and just as many were added... how could it be otherwise?

Then I finally decided to read the section on SphinxQL, and ended up rewriting django-sphinx almost completely.

At the moment, my fork works with Sphinx exclusively through its SphinxQL dialect and boasts:



Real-time (RT) indexes are not supported yet, so there are no functions for working with them (INSERT, UPDATE, DELETE).
Searching across related models is not supported either, and I am not sure it is needed at all. Commenters, if you know of cases where and how this could be used, please share examples.

Part of the code is already covered by tests. Yes, I am learning to write unit tests along the way too: I had tried to start several times before, but I did not understand how to approach it at all.

I have also started writing documentation. It is only an outline so far, but I hope it is clear overall.

Now for a few examples that I think may be of interest.

I will use the following models:

  from django.db import models
  # Import path as in the original django-sphinx; it may differ in the fork.
  from djangosphinx.models import SphinxSearch


  class Related(models.Model):
      name = models.CharField(max_length=10)

      def __unicode__(self):
          return self.name


  class M2M(models.Model):
      name = models.CharField(max_length=10)

      def __unicode__(self):
          return self.name


  class Search(models.Model):
      name = models.CharField(max_length=10)
      text = models.TextField()
      stored_string = models.CharField(max_length=100)
      datetime = models.DateTimeField()
      date = models.DateField()
      bool = models.BooleanField()
      uint = models.IntegerField()
      float = models.FloatField(default=1.0)
      related = models.ForeignKey(Related)
      m2m = models.ManyToManyField(M2M)

      search = SphinxSearch(
          index='test_index',
          options={
              'included_fields': ['text', 'datetime', 'bool', 'uint'],
              'stored_attributes': ['stored_string'],
              'stored_fields': ['name'],
              'related_fields': ['related'],
              'mva_fields': ['m2m'],
          },
      )


First of all, a config is generated from the options dictionary passed to SphinxSearch, in which:
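I do not reproduce the fork's exact output here. The sketch below is a hypothetical illustration, not the fork's actual generator. It shows one plausible way such options could map to sphinx.conf source directives. The field-type table and the name `build_source_directives` are my own assumptions. The directive names (`sql_attr_uint`, `sql_attr_string`, `sql_field_string`, `sql_attr_multi`, and so on) are standard Sphinx 2.x ones.

```python
# Hypothetical sketch, NOT the fork's real config generator: one plausible
# mapping from the SphinxSearch options to sphinx.conf source directives.

# Assumed mapping from Django field class names to Sphinx attribute types.
ATTR_DIRECTIVES = {
    'DateTimeField': 'sql_attr_timestamp',
    'DateField': 'sql_attr_timestamp',
    'BooleanField': 'sql_attr_bool',
    'IntegerField': 'sql_attr_uint',
    'FloatField': 'sql_attr_float',
}


def build_source_directives(field_types, options):
    """Return sphinx.conf lines for the given options.

    field_types maps field names to Django field class names, for example
    {'uint': 'IntegerField'}. Text fields get no attribute line: Sphinx
    treats every selected column without a directive as a full-text field.
    """
    lines = []
    for name in options.get('included_fields', []):
        directive = ATTR_DIRECTIVES.get(field_types[name])
        if directive:
            lines.append('%s = %s' % (directive, name))
    for name in options.get('stored_attributes', []):
        # string attribute: returned with results, not full-text searchable
        lines.append('sql_attr_string = %s' % name)
    for name in options.get('stored_fields', []):
        # full-text field whose value is also stored as a string attribute
        lines.append('sql_field_string = %s' % name)
    for name in options.get('related_fields', []):
        # a foreign key becomes the integer id of the related row
        lines.append('sql_attr_uint = %s' % name)
    for name in options.get('mva_fields', []):
        # a complete directive continues with '; SELECT <doc id>, <value>
        # FROM <m2m table>'; that query is omitted from this sketch
        lines.append('sql_attr_multi = uint %s from query' % name)
    return lines


if __name__ == '__main__':
    types = {'text': 'TextField', 'datetime': 'DateTimeField',
             'bool': 'BooleanField', 'uint': 'IntegerField'}
    opts = {'included_fields': ['text', 'datetime', 'bool', 'uint'],
            'stored_attributes': ['stored_string'],
            'stored_fields': ['name'],
            'related_fields': ['related'],
            'mva_fields': ['m2m']}
    print('\n'.join(build_source_directives(types, opts)))
```

Note that `text` produces no line at all: in Sphinx, a plain full-text field needs no declaration.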



What does all this give us? Fairly broad search capabilities.

First, get a QuerySet for our model. There are two ways to do this:

  qs = Search.search.query('query') 


or:

  qs = SphinxQuerySet(model=Search).query('query') 


Both give a similar result. In the second case, however, the parameters passed to SphinxSearch in the model definition are ignored, except for the field lists.

Now we can search for something:

  qs1 = qs.filter(bool=True, uint__gt=100, float__range=(1.0, 15.4)).group_by('date').order_by('-pk').group_order_by('-datetime') 


Let me explain what this query does:
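In SphinxQL terms, the query works as follows:

- The filters keep only documents where bool is true, uint is above 100 and float lies between 1.0 and 15.4.
- GROUP BY date keeps one document per date.
- WITHIN GROUP ORDER BY datetime DESC decides which document represents each group, presumably the latest one.
- ORDER BY id DESC sorts the resulting groups.

The sketch below is a hypothetical renderer, not the fork's real code. It shows how the chain could be turned into such a statement. The function names and the exact SQL text are my assumptions, and I have not checked them against a live searchd.

```python
# Hypothetical sketch: rendering filter()/group_by()/group_order_by()/
# order_by() arguments as one SphinxQL SELECT. Not the fork's real code.

OPERATORS = {'gt': '>', 'gte': '>=', 'lt': '<', 'lte': '<='}


def render_value(value):
    """Format a Python value as a SphinxQL literal (bools become 1/0)."""
    if isinstance(value, bool):  # check before int: bool is an int subclass
        return '1' if value else '0'
    if isinstance(value, str):
        return "'%s'" % value.replace('\\', '\\\\').replace("'", "\\'")
    return repr(value)


def render_condition(key, value):
    """Translate a Django-style lookup such as 'uint__gt' into a condition."""
    field, _, lookup = key.partition('__')
    if lookup in OPERATORS:
        return '%s %s %s' % (field, OPERATORS[lookup], render_value(value))
    if lookup == 'in':
        return '%s IN (%s)' % (field, ', '.join(render_value(v) for v in value))
    if lookup == 'range':
        low, high = value
        return '%s BETWEEN %s AND %s' % (field, render_value(low),
                                         render_value(high))
    if lookup in ('', 'exact'):
        return '%s = %s' % (field, render_value(value))
    raise ValueError('unsupported lookup: %s' % key)


def render_order(spec):
    """'-pk' -> 'id DESC'; Sphinx names the document id column 'id'."""
    name = spec.lstrip('-')
    if name == 'pk':
        name = 'id'
    return '%s %s' % (name, 'DESC' if spec.startswith('-') else 'ASC')


def build_select(index, query, filters, group_by=None,
                 group_order_by=None, order_by=None):
    """Assemble the clauses in the order the SphinxQL grammar expects."""
    where = ['MATCH(%s)' % render_value(query)]
    where += [render_condition(k, v) for k, v in filters.items()]
    sql = 'SELECT * FROM %s WHERE %s' % (index, ' AND '.join(where))
    if group_by:
        sql += ' GROUP BY %s' % group_by
    if group_order_by:
        sql += ' WITHIN GROUP ORDER BY %s' % render_order(group_order_by)
    if order_by:
        sql += ' ORDER BY %s' % render_order(order_by)
    return sql


if __name__ == '__main__':
    print(build_select('test_index', 'query',
                       {'bool': True, 'uint__gt': 100,
                        'float__range': (1.0, 15.4)},
                       group_by='date', group_order_by='-datetime',
                       order_by='-pk'))
```

For the query above, this sketch prints `SELECT * FROM test_index WHERE MATCH('query') AND bool = 1 AND uint > 100 AND float BETWEEN 1.0 AND 15.4 GROUP BY date WITHIN GROUP ORDER BY datetime DESC ORDER BY id DESC`.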


What else can you do?

For example, suppose the variable r holds a QuerySet of several Related objects, and m holds a QuerySet of M2M objects (see the models above). Then you can do something like this:

  qs2 = qs.filter(related__in=r, m2m__in=m)
  # a single object can be passed as well:
  qs3 = qs.filter(related=r[0])


In other words, you do not need to build lists of IDs yourself: django-sphinx does it for you!
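Here is a hypothetical sketch of that conversion. `to_ids` and `render_id_filter` are illustrative names, not the fork's API, and `FakeObj` stands in for a model instance. For an MVA attribute such as m2m, a Sphinx IN condition matches a document when any of its values is in the list.

```python
# Hypothetical sketch: turning model instances (or a QuerySet of them)
# into the integer ids that Sphinx attributes actually store.

def to_ids(value):
    """Return a list of primary keys for one object or an iterable of them."""
    if hasattr(value, 'pk'):  # a single model instance
        return [value.pk]
    return [item.pk if hasattr(item, 'pk') else int(item) for item in value]


def render_id_filter(field, value):
    """Render 'field = id' for one id, 'field IN (...)' for several."""
    ids = to_ids(value)
    if not ids:
        raise ValueError('empty id list for %s' % field)  # IN () is invalid
    if len(ids) == 1:
        return '%s = %d' % (field, ids[0])
    return '%s IN (%s)' % (field, ', '.join(str(i) for i in ids))


class FakeObj(object):
    """Stand-in for a Django model instance; only .pk is used."""
    def __init__(self, pk):
        self.pk = pk


if __name__ == '__main__':
    r = [FakeObj(3), FakeObj(7)]
    print(render_id_filter('related', r))     # related IN (3, 7)
    print(render_id_filter('related', r[0]))  # related = 3
```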

SphinxQuerySet also supports indexing and slicing, like a list:

  # a single document:
  doc = qs[5]
  # slices:
  docs = qs[3:20]
  docs = qs[:50]
  docs = qs[100:]
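Under the hood, indexing and slicing presumably map to SphinxQL's `LIMIT offset, count`. The following is a hypothetical sketch. It caps open-ended slices such as `qs[100:]` at 1000 rows, which is an assumption based on searchd's default max_matches, not a documented detail of the fork. `LimitDemo` only exists to show the `qs[...]` syntax.

```python
# Hypothetical sketch: translating qs[...] into a SphinxQL LIMIT clause.
# DEFAULT_MAX is an assumed cap: searchd returns at most max_matches rows
# (1000 by default), so an open-ended slice needs some upper bound.
DEFAULT_MAX = 1000


def limit_clause(key):
    """Map an int index or a step-less slice to 'LIMIT offset, count'."""
    if isinstance(key, int):
        if key < 0:
            raise IndexError('negative indexing is not supported')
        return 'LIMIT %d, 1' % key
    if not isinstance(key, slice) or key.step is not None:
        raise TypeError('only integers and step-less slices are supported')
    start = key.start or 0
    stop = key.stop if key.stop is not None else DEFAULT_MAX
    if start < 0 or stop < start:
        raise IndexError('unsupported slice bounds')
    return 'LIMIT %d, %d' % (start, stop - start)


class LimitDemo(object):
    """Shows the clause each qs[...] expression would produce."""
    def __getitem__(self, key):
        return limit_clause(key)


if __name__ == '__main__':
    qs = LimitDemo()
    for clause in (qs[5], qs[3:20], qs[:50], qs[100:]):
        print(clause)
```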


Finally, to get the values of stored attributes (if you need them for some reason) or of computed expressions, access the sphinx attribute of an object obtained from the SphinxQuerySet.
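How the fork fills that attribute is not shown here. Below is a hypothetical sketch of the idea. Result rows come back from searchd as dicts keyed by column name, with `id` as the document id. Each matching model instance gets the remaining columns as a `sphinx` dict, in Sphinx's ranking order. The row layout, the function name and the `Doc` stand-in class are all assumptions.

```python
# Hypothetical sketch: attaching extra values from SphinxQL result rows to
# model objects as a `sphinx` dict. Row layout and names are assumptions.

class Doc(object):
    """Stand-in for a model instance loaded from the database."""
    def __init__(self, pk):
        self.pk = pk


def attach_sphinx_values(objects, rows):
    """objects: {pk: instance}; rows: result dicts where 'id' is the doc id.

    Returns the instances in the order Sphinx ranked them, each carrying a
    `sphinx` dict with every other column of its row.
    """
    result = []
    for row in rows:
        obj = objects.get(row['id'])
        if obj is None:  # indexed document whose DB row no longer exists
            continue
        obj.sphinx = {k: v for k, v in row.items() if k != 'id'}
        result.append(obj)
    return result


if __name__ == '__main__':
    objs = {1: Doc(1), 2: Doc(2)}
    rows = [{'id': 2, 'stored_string': 'b'}, {'id': 1, 'stored_string': 'a'}]
    for doc in attach_sphinx_values(objs, rows):
        print(doc.pk, doc.sphinx)
```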

Now a little about expressions.
Sphinx can compute various formulas on the fly for each document (ranking works on the same principle), and it lets you define your own:

  qs4 = qs.fields(expr1='uint*(float+100)') 


The computed value is available in the sphinx attribute of the returned objects.
Sphinx can also sort results by these expressions, not only by a specific field, so code like this works too:

  qs4 = qs.fields(expr1='uint*(float+100)').order_by('expr1') 
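As a hypothetical sketch, not the fork's implementation, `fields(expr1=...)` presumably becomes a computed column in the SELECT list, and `order_by('expr1')` sorts by that alias. The second half of the block emulates in plain Python what searchd does per document. The two sample documents are made up for illustration.

```python
# Hypothetical sketch: rendering fields(expr1='...') as a computed column
# and emulating the per-document evaluation plus ORDER BY in plain Python.

def select_with_expressions(index, query, order_by=None, **expressions):
    """Build a SELECT with 'expr AS alias' columns and an optional ORDER BY."""
    columns = ['*'] + ['%s AS %s' % (expr, alias)
                       for alias, expr in expressions.items()]
    escaped = query.replace('\\', '\\\\').replace("'", "\\'")
    sql = "SELECT %s FROM %s WHERE MATCH('%s')" % (', '.join(columns),
                                                   index, escaped)
    if order_by:
        direction = 'DESC' if order_by.startswith('-') else 'ASC'
        sql += ' ORDER BY %s %s' % (order_by.lstrip('-'), direction)
    return sql


# Made-up documents; searchd would evaluate the expression for each match.
docs = [{'id': 1, 'uint': 3, 'float': 1.5},
        {'id': 2, 'uint': 1, 'float': 50.0}]
for d in docs:
    d['expr1'] = d['uint'] * (d['float'] + 100)  # uint*(float+100)
docs.sort(key=lambda d: d['expr1'])              # ORDER BY expr1 ASC

if __name__ == '__main__':
    print(select_with_expressions('test_index', 'query', order_by='expr1',
                                  expr1='uint*(float+100)'))
    print([(d['id'], d['expr1']) for d in docs])
```

With these numbers, document 2 scores 1 * 150.0 = 150.0 and document 1 scores 3 * 101.5 = 304.5, so ascending order puts document 2 first.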


So what am I getting at?



I hope Habr readers will give me useful advice (or throw tomatoes, if I deserve it...) and point out where django-sphinx should go next.

Thank you all for your attention! I meant to write a short article, but it turned out... the way it did.

Source: https://habr.com/ru/post/164869/

