Background
I needed to add a search function to my site. My first thought was to use the capabilities of the SQL server, but the search had to cover several tables, handle both words and phrases, and support stemming on top of that. I realized that reinventing the wheel would be expensive.
So I went looking for ready-made solutions. Frankly, there turned out to be not many: django-haystack and django-sphinx. The advantages and disadvantages of both have already been listed in an earlier post, so I will not repeat them.
After spending some time reading blogs and forums, I decided to try django-sphinx, because, as far as I know, Sphinx support in django-haystack is still not very good.
The author of django-sphinx abandoned the project long ago, but there are many forks, and people say it is quite usable. I picked the one that looked, hmm, freshest and tried to plug it into my project.
Story
It turned out that things were in very bad shape: lots of errors, missing features, and problems with the Sphinx Python API.
At first I simply tried to fix the errors in the code and make it work. I even managed it, sort of: I could search for a single word. (Experts will rightly point out that SPH_MATCH_ANY would have solved this, but I learned about that flag a little later.) And I learned a lot more besides.
In the comments to the post I referred to earlier, people scolded django-sphinx, saying that it cannot do this and does not support that. I decided to add the missing features, and as a result a fork was born. After a while it could already index MVA and fields from related models (I found the Sphinx documentation confusing, so it took me a long time to figure out what was going on). Many bugs were fixed, and no fewer were added... how could it be otherwise?
And then I finally decided to read the section on SphinxQL. And rewrote django-sphinx almost completely.
At the moment, my fork talks to Sphinx through its SphinxQL dialect and boasts:
- support for Sphinx 2.0.1-beta and above
- quite a lot of flexibility in configuration
- automatic generation of the Sphinx configuration
- the ability to search in a single index or in several at once
- the ability to index MVA and fields from related one-to-one models in one index
- support for creating snippets
- binding of documents from the index to objects of the corresponding models
- Django ORM-style filtering of search results (including method chaining)
RealTime indexes are not supported yet, so there are no functions for working with them (INSERT, UPDATE, DELETE).
Searching across related models is not supported either, and I am not sure it is needed at all. Commenters, if you know, please give examples of where and how this could be used.
Part of the code is already covered by tests (yes, I am also learning to write unit tests along the way; I tried to start several times before, but never understood how to approach it).
I have also started writing documentation. So far it is only an outline, but I hope that on the whole everything is clear.
Now for a few examples that, in my opinion, may be interesting.
I will take the following models as a basis:
from django.db import models
# Import path as in upstream django-sphinx; it may differ in the fork.
from djangosphinx.models import SphinxSearch


class Related(models.Model):
    name = models.CharField(max_length=10)

    def __unicode__(self):
        return self.name


class M2M(models.Model):
    name = models.CharField(max_length=10)

    def __unicode__(self):
        return self.name


class Search(models.Model):
    name = models.CharField(max_length=10)
    text = models.TextField()
    stored_string = models.CharField(max_length=100)

    datetime = models.DateTimeField()
    date = models.DateField()
    bool = models.BooleanField()
    uint = models.IntegerField()
    float = models.FloatField(default=1.0)

    related = models.ForeignKey(Related)
    m2m = models.ManyToManyField(M2M)

    search = SphinxSearch(
        index='test_index',
        options={
            'included_fields': [
                'text',
                'datetime',
                'bool',
                'uint',
            ],
            'stored_attributes': [
                'stored_string',
            ],
            'stored_fields': [
                'name',
            ],
            'related_fields': [
                'related',
            ],
            'mva_fields': [
                'm2m',
            ]
        },
    )
First of all, a config will be generated from the options dictionary passed to SphinxSearch as an argument. In this config:
- all fields from included_fields will be placed in the index, with non-string fields becoming stored attributes
- all fields from stored_attributes, as you would expect, will also be stored. This list is useful if you need a stored text field.
- fields from stored_fields will be stored, but will also remain available for full-text search
- fields from related_fields, as you may have guessed, will likewise be declared as stored. They hold the keys of the related models (I will explain why a little further down).
- finally, the purpose of mva_fields should be clear by now. Only the names of ManyToMany fields can go into this list.
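To make the rules above more concrete, here is a minimal sketch of how each option list could map to a Sphinx source directive. This is not the fork's actual code: the function `classify`, the `ATTR_DIRECTIVES` table and the choice of directive per Django field type are my assumptions for illustration. Only the directive names themselves (sql_attr_uint, sql_attr_string, sql_field_string, sql_attr_multi and so on) come from Sphinx.

```python
# Hypothetical illustration, not the fork's real config generator.
STRING_TYPES = {"CharField", "TextField"}

# Assumed mapping: Django field type -> Sphinx attribute directive.
ATTR_DIRECTIVES = {
    "BooleanField": "sql_attr_bool",
    "IntegerField": "sql_attr_uint",
    "FloatField": "sql_attr_float",
    "DateTimeField": "sql_attr_timestamp",
    "DateField": "sql_attr_timestamp",
}


def classify(options, field_types):
    """Return {field name: role in the generated Sphinx source}."""
    roles = {}
    for name in options.get("included_fields", []):
        ftype = field_types[name]
        # String fields are only full-text indexed; the rest become attributes.
        if ftype in STRING_TYPES:
            roles[name] = "full-text field"
        else:
            roles[name] = ATTR_DIRECTIVES[ftype]
    for name in options.get("stored_attributes", []):
        ftype = field_types[name]
        if ftype in STRING_TYPES:
            roles[name] = "sql_attr_string"
        else:
            roles[name] = ATTR_DIRECTIVES[ftype]
    for name in options.get("stored_fields", []):
        roles[name] = "sql_field_string"  # stored and full-text searchable
    for name in options.get("related_fields", []):
        roles[name] = "sql_attr_uint"  # key of the related object
    for name in options.get("mva_fields", []):
        roles[name] = "sql_attr_multi"
    return roles


options = {
    "included_fields": ["text", "datetime", "bool", "uint"],
    "stored_attributes": ["stored_string"],
    "stored_fields": ["name"],
    "related_fields": ["related"],
    "mva_fields": ["m2m"],
}
field_types = {
    "name": "CharField", "text": "TextField", "stored_string": "CharField",
    "datetime": "DateTimeField", "bool": "BooleanField",
    "uint": "IntegerField", "related": "ForeignKey", "m2m": "ManyToManyField",
}
roles = classify(options, field_types)
for name in sorted(roles):
    print(name, "->", roles[name])
```

The real generated config will of course contain much more (the SQL query, index settings and so on); the sketch only shows which bucket each field falls into.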
What does all this give us? Fairly broad search capabilities.
First, get a QuerySet for our model. This can be done in two ways:
qs = Search.search.query('query')
or:
qs = SphinxQuerySet(model=Search).query('query')
Both ways give a similar result, but in the second case the parameters passed to SphinxSearch in the model definition (except for the field lists) are ignored.
Now we can search for something:
qs1 = qs.filter(bool=True, uint__gt=100, float__range=(1.0, 15.4)).group_by('date').order_by('-pk').group_order_by('-datetime')
Let me explain what this query does:
- searches the Search model index for the word 'query'
- the output will include only results in which the bool field contains True, the uint field is greater than 100, and the contents of the float field are in the range from 1.0 to 15.4
- groups all results by date
- sorts them by document ID in reverse order ('pk' is converted to 'id' automatically)
- inside each group sorts the results by the datetime field also in the reverse order
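The steps above can be pictured as a translation of Django-style lookups into a SphinxQL statement. Below is a toy translator written only for illustration. The helpers `render`, `condition`, `ordering` and `build_query` are hypothetical, and the exact SQL that the fork sends to Sphinx may well differ. The GROUP BY ... WITHIN GROUP ORDER BY ... ORDER BY clause order is SphinxQL's own.

```python
# Toy translator from Django-style lookups to a SphinxQL string.
# Hypothetical sketch: it only handles the lookups used in the example.
OPERATORS = {"gt": ">", "gte": ">=", "lt": "<", "lte": "<="}


def render(value):
    """Render a number or boolean as a SphinxQL literal."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "1" if value else "0"
    return repr(value)


def condition(lookup, value):
    """Turn one 'field__op' lookup into a WHERE condition."""
    field, _, op = lookup.partition("__")
    if op == "":
        return "{} = {}".format(field, render(value))
    if op in OPERATORS:
        return "{} {} {}".format(field, OPERATORS[op], render(value))
    if op == "range":
        low, high = value
        return "{} BETWEEN {} AND {}".format(field, render(low), render(high))
    raise ValueError("unsupported lookup: " + lookup)


def ordering(name):
    """'-pk' -> 'id DESC', 'datetime' -> 'datetime ASC'."""
    direction = "ASC"
    if name.startswith("-"):
        name, direction = name[1:], "DESC"
    if name == "pk":
        name = "id"  # the document ID is called 'id' in Sphinx
    return "{} {}".format(name, direction)


def build_query(index, text, filters, group_by, group_order_by, order_by):
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    where = ["MATCH('{}')".format(escaped)]
    where += [condition(key, value) for key, value in filters.items()]
    return (
        "SELECT * FROM {} WHERE {} GROUP BY {} "
        "WITHIN GROUP ORDER BY {} ORDER BY {}"
    ).format(index, " AND ".join(where), group_by,
             ordering(group_order_by), ordering(order_by))


sql = build_query(
    "test_index", "query",
    {"bool": True, "uint__gt": 100, "float__range": (1.0, 15.4)},
    group_by="date", group_order_by="-datetime", order_by="-pk",
)
print(sql)
```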
What else can you do?
For example, suppose the variable r holds a QuerySet with several Related objects, and m holds one with M2M objects (see the models above). Then you can do something like this:
qs2 = qs.filter(related__in=r, m2m__in=m)
That is, you do not need to prepare lists of identifiers yourself - django-sphinx will do it for you!
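Conceptually, this convenience boils down to pulling primary keys out of whatever was passed in. A minimal sketch of the idea follows, using a stand-in class instead of real Django models. The names `to_ids` and `FakeRelated` are invented for the example, and the fork's own implementation may look different.

```python
def to_ids(values):
    """Accept model-like objects (anything with .pk) or raw integers."""
    return [getattr(value, "pk", value) for value in values]


class FakeRelated:
    """Stand-in for a Django model instance; only .pk matters here."""
    def __init__(self, pk):
        self.pk = pk


r = [FakeRelated(3), FakeRelated(7)]
ids = to_ids(r)
clause = "related IN ({})".format(", ".join(str(i) for i in ids))
print(clause)
```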
I should also mention that SphinxQuerySet behaves like an array.
And to get the values of stored attributes (if you need them for some reason) or of calculated expressions, use the sphinx attribute of an object obtained from the SphinxQuerySet.
Oh yes, a few words about expressions.
Sphinx can evaluate various formulas on the fly for each document (ranking works on the same principle) and lets you define your own:
qs4 = qs.fields(expr1='uint*(float+100)')
The result of the calculation can be found in the sphinx attribute of the returned objects.
In addition, Sphinx lets you sort the output not only by a specific field but also by these expressions, so code like this is also possible:
qs4 = qs.fields(expr1='uint*(float+100)').order_by('expr1')
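To show what such an expression means for the result set, here is a plain-Python imitation: the formula is evaluated for every document, and the output is sorted by the computed value. The documents and their values are made up for the example. In reality Sphinx does this work on its side, not Python.

```python
# Made-up documents standing in for rows of the Sphinx index.
docs = [
    {"id": 1, "uint": 3, "float": 1.5},
    {"id": 2, "uint": 1, "float": 20.0},
    {"id": 3, "uint": 2, "float": 0.0},
]


def expr1(doc):
    """Same formula as in the example: uint*(float+100)."""
    return doc["uint"] * (doc["float"] + 100)


# Evaluate the expression per document, as Sphinx would do on the fly.
for doc in docs:
    doc["expr1"] = expr1(doc)

# Equivalent of order_by('expr1'): ascending by the computed value.
ranked = sorted(docs, key=lambda doc: doc["expr1"])
for doc in ranked:
    print(doc["id"], doc["expr1"])
```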
So what is all this leading to?
I hope the inhabitants of Habr will give me useful advice (or throw rotten tomatoes at me if I deserve it...) and point out where django-sphinx should be developed further.
Thank you all for your attention! I meant to write a short article, but it turned out... the way it turned out.