
About Sphinx 3.0

Here you all sit, suspecting nothing, while we, meanwhile, have been quietly sawing away at the mega-release of the Sphinx search engine, number 3.0. A series of large changes is coming. Some of them, admittedly, have not even properly begun. Most, however, are more ready than not, and a few individual changes have even leaked into the public 2.3 branch. So perhaps it is time to start briefly describing what to expect in a bright and, I hope, not-so-distant future. If you prefer reading, it's all below; if you prefer listening, come to the meetup this Saturday. In brief: farewell to the concept of an engine that merely complements the main database; hello document storage, total RT, replication, REST, and a number of other well-known keywords.

Let's start with the main thing, a short list of the planned big changes:

- a new index format, in both the full-text and the attribute parts;
- a forced transition to a strictly threaded model and strictly RT indexes;
- replication;
- a document store (docstore);
- indexes on attributes;
- HTTP/REST access.

Two large internal changes underpin everything else: a new index format (covering both the full-text and the attribute parts), plus a forced transition to a strictly threaded model of operation and strictly RT indexes. Why and what for?

The old FT index format has, scary to think, been dragged along conceptually since 2001. Of course, various changes and optimizations were made to it regularly, so the old man is still going strong. But there have been no radical, conceptual changes since 2001: we stored a pack of varint-encoded deltas keyed by external docid back then, and we still do. Moreover, being a rather low-level building block, the index format affects not only the actual storage of data but also a number of details in the internal architecture, which then leak outward: the requirement to provide an external DocID, headaches with duplicate documents, a number of odd performance quirks, and so on.
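For readers unfamiliar with that classic layout, here is a minimal sketch (my own illustration, not Sphinx's actual code) of what "a pack of varint-encoded deltas" means: sorted doc IDs are stored as gaps, and each gap is encoded in as few bytes as its magnitude allows.

```python
def encode_varint(n):
    """Encode a non-negative integer into a variable number of bytes.
    Each byte carries 7 payload bits; the high bit says "more follows"."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def encode_posting_list(doc_ids):
    """Store a sorted list of doc IDs as varint-encoded deltas (gaps)."""
    out = bytearray()
    prev = 0
    for doc_id in doc_ids:
        out += encode_varint(doc_id - prev)
        prev = doc_id
    return bytes(out)

# Large but clustered IDs shrink to small gaps: 4 docs fit in 6 bytes
# instead of 32 bytes of raw 64-bit IDs.
compressed = encode_posting_list([1000000, 1000003, 1000007, 1000019])
```

The point of the sketch is only the encoding idea; the real format obviously adds frequencies, positions, skip lists and so on on top.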
Today we can do better.

In the new format there is no "magic" external unique DocID attribute; the internal numbering of documents is decoupled from the external one. This one change makes a whole bunch of things possible: suddenly, duplicate IDs can be poured into the index, or no numeric ID is required at all; suddenly, when needed, you can build compact bitmasks over a subset of index documents instead of hefty ID lists; suddenly, the actual full-text index data can be compressed much more efficiently with all sorts of group codecs such as PFOR-Delta, Group Varint and others; suddenly, the tedious sorting stage of indexing is no longer needed and the index can be built almost incrementally; and so on. The index becomes up to 1.5 times smaller, builds 2-3 times faster, and in theory can also be searched up to 2-3 times faster, at least on some queries. Profit all around.
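To illustrate one of those wins, here is a back-of-the-envelope sketch (purely my own illustration, not Sphinx internals) of why dense internal numbering makes bitmasks practical: a filter result over N documents costs N/8 bytes as a bitmask, versus 8 bytes per match as a list of 64-bit external IDs.

```python
def id_list_bytes(matching_count, id_width=8):
    """Memory for a plain list of 64-bit external DocIDs."""
    return matching_count * id_width

def bitmask_bytes(total_docs):
    """One bit per internal doc number; this only works because the
    internal numbering is dense, i.e. 0..total_docs-1 with no holes."""
    return (total_docs + 7) // 8

total_docs = 10_000_000
matching = 4_000_000  # a non-selective filter matching 40% of documents

list_cost = id_list_bytes(matching)      # 32,000,000 bytes
mask_cost = bitmask_bytes(total_docs)    # 1,250,000 bytes
```

With sparse external 64-bit IDs a bitmask over the whole ID space would be absurdly large, which is exactly why the old format could not do this.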

The latter provokes the next minor revolution. Since the new incremental indexing algorithm (which, roughly speaking, appends one document at a time to the index being built in memory) turned out to be 2-3 times faster than the current batch indexing of disk indexes, why bother distinguishing between disk indexes and RT indexes at all? Great: down with disk indexes, down with the type directive. If we can now make all indexes RT without losing performance, and indeed with a speedup, then we should. (Internally there are still disk-based and RAM-based implementations of individual index segments, of course. Disk and memory differ in their performance characteristics, and there is no forgetting that. But that is a "how to do it efficiently" headache for us developers; users should see no noticeable functional differences.)
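A toy model of the difference (mine, not the real engine): batch building has to collect and globally sort all (term, document) pairs, while incremental building simply appends to per-term lists. Because internal doc numbers are handed out monotonically, every posting list stays sorted for free, with no sort stage at all.

```python
from collections import defaultdict

class ToyRtSegment:
    """Append-only in-memory index segment: no global sort stage."""

    def __init__(self):
        self.postings = defaultdict(list)  # term -> internal doc numbers
        self.next_doc = 0

    def insert(self, terms):
        doc = self.next_doc      # internal numbers grow monotonically,
        self.next_doc += 1       # so each posting list stays sorted
        for term in sorted(set(terms)):
            self.postings[term].append(doc)
        return doc

seg = ToyRtSegment()
seg.insert(["hello", "world"])   # becomes internal doc 0
seg.insert(["hello", "sphinx"])  # becomes internal doc 1
# seg.postings["hello"] is now [0, 1], already sorted
```

The real RT machinery then merges such in-memory segments into disk segments in the background, but the append-only insertion path is the part that kills the sort stage.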

Interestingly, the indexer utility does not go away at all. An ETL tool that pulls data out of a database and loads it into the search engine is still useful to have. It used to build a disk index; now it will build an RT index (whether for loading directly into the daemon, or for attaching a freshly indexed batch of data to an existing index, no longer matters).

Since the format of the full-text part is changing so much anyway, it is the right moment to change the attribute storage format too: if we're breaking compatibility, let's break it properly!!! Now all variable-length columns (strings, MVA, JSON, and any other tricky types, should they be added in the future) for a given document are stored in one large blob, and only a single pointer to that blob is stored per document. (Previously, each type had its own file and its own column, ugh.) This also eliminates the stupid youthful mistake of "4 GB should be enough for everybody", plus everything can now be uniformly updated, deleted, and so on. Plain fixed-length attributes are stored as before (though thanks to the switch to internal numbers, access to them is much faster); variable-length attributes eat somewhat less memory (8 bytes for all of them at once, rather than 4 bytes for each), and are no longer limited in size. Plus other nice improvements: for example, the format now supports honest NULLs, and key compression is now done inside the JSON data.
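A minimal sketch of the single-blob idea (again my own illustration, with an invented chunk layout): all variable-length values of one document are concatenated into one blob of length-prefixed chunks, and the fixed-size per-document row keeps just one pointer into it.

```python
import json
import struct

def pack_row_blob(string_val, mva_val, json_val):
    """Pack all variable-length attributes of one document into one blob.
    Illustrative layout: three length-prefixed chunks, back to back."""
    chunks = [
        string_val.encode("utf-8"),
        struct.pack(f"<{len(mva_val)}I", *mva_val),
        json.dumps(json_val).encode("utf-8"),
    ]
    blob = bytearray()
    for chunk in chunks:
        blob += struct.pack("<I", len(chunk)) + chunk  # length prefix
    return bytes(blob)

def unpack_row_blob(blob):
    """Walk the length-prefixed chunks back out of the blob."""
    chunks, pos = [], 0
    while pos < len(blob):
        (size,) = struct.unpack_from("<I", blob, pos)
        pos += 4
        chunks.append(blob[pos:pos + size])
        pos += size
    return chunks

blob = pack_row_blob("title", [1, 2, 3], {"k": "v"})
s, mva, js = unpack_row_blob(blob)
```

One blob means one update path, one deletion path, and one 8-byte pointer per document regardless of how many variable-length columns the schema has.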

Switching to RT, in turn, means that fork/prefork is no longer merely hard to maintain, but outright impossible. Fortunately, here there is a once-and-for-all solution that is "both fast and good", without those damned compromises: a thread pool (already available in 2.3) is as fast as it gets in terms of search speed (that's the "fast"), and the basic model of parallel processing stays single: multi-threaded, not multi-process (that's the "good"). If a crash could somehow be contained to just one thread it would be outright ideal, but there are no ideals in life. So, to ease the pain of crashes, the overwhelming bulk of the data needed at startup is now mmap()ed rather than copied, which makes startup (and restart) quite fast even with large indexes.
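The mmap() trick can be sketched in a few lines (illustrative only, not the daemon's actual code): mapping a file gives immediate access to its contents without copying them into the process heap, so a restart does not pay up front for reading gigabytes of attribute data.

```python
import mmap
import os
import struct
import tempfile
from array import array

# Create a stand-in "attribute file": one million 32-bit values.
path = os.path.join(tempfile.mkdtemp(), "attrs.bin")
with open(path, "wb") as f:
    f.write(array("I", range(1_000_000)).tobytes())

with open(path, "rb") as f:
    # mmap instead of read(): startup cost is O(1), and pages are
    # faulted in lazily by the OS only when actually touched.
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    (value,) = struct.unpack_from("<I", mm, 4 * 123456)  # random access
    mm.close()
```

A side benefit the article alludes to: after a crash and restart, pages that are still in the OS page cache are reused, so recovery is fast too.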

In addition to threads, forcing RT everywhere makes replication very, very necessary. Once we declare the rejection of disk indexes, which could at least somehow be copied around as files, we need at least some tooling for RT indexes. We could, of course, just do hot backup/restore, but that's boring. Online replication of RT indexes is much more interesting, so that's what we're doing: 1 master, N dynamic replicas, automatic transfer of the initial snapshot, replication of incoming changes, master re-election, all that stuff. The next step is all sorts of tools for building and managing a cluster, but first let replication fully heal.

Replicating "just full-text indexes" feels too small. Besides, people regularly ask us to store the original documents; plus, to further speed up snippets we need to be able to store a blob with arbitrary strange stuffing; plus, an index format untied from docid makes all of this straightforward to implement. So we are also doing a docstore, i.e. on-disk storage of not just attributes but the fields of the indexed documents as well, along with any associated meta information (for now, only document-level indexes for snippets). A small step for the code, a big one for functionality: now you can put into Sphinx not only a full-text index of your data, but the data itself, and use it entirely as a database. A weird database, but a database.

It would be nice for that database to have indexes not only on keywords, so at the same time we need to take care of indexes on attributes. The point is for WHERE MATCH('the who') AND author=1234 to be driven "from the attribute index", for WHERE MATCH('biggest rarity') AND gender='m' to be driven "from the full-text search", and for both plans to be chosen optimally and automatically, rather than the calling PHP script having to implement its own analogue of a SQL optimizer and generate queries like WHERE MATCH('the who _author1234'). Stupidly emulating attributes with a cloud of synthetic keywords, as is (reportedly) still done in certain well-known places, works too, of course, but if we're doing this, let's do it properly!!!
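A toy sketch of the intended behaviour (mine, not the real optimizer): with a genuine attribute index, the engine can merge-intersect the full-text posting list with the attribute posting list directly, instead of the application faking it with synthetic keywords like _author1234.

```python
def intersect_sorted(a, b):
    """Merge-intersect two sorted posting lists of internal doc numbers."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i]); i += 1; j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out

# Hypothetical posting lists for the example query from the article.
fulltext_postings = {"the who": [2, 5, 9, 14, 21]}   # MATCH('the who')
attr_index = {("author", 1234): [5, 6, 14, 30]}      # author = 1234

docs = intersect_sorted(fulltext_postings["the who"],
                        attr_index[("author", 1234)])
# docs == [5, 14]
```

An optimizer would additionally pick which list to drive the intersection from, e.g. starting from the shorter one when the attribute filter is highly selective.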

And finally, SQL is out of fashion and inconvenient to call from everywhere, so HTTP/REST access has to be added. A very basic initial implementation is already in 2.3; we will expand and deepen from there. Nothing interesting, frankly even boring.

With all of this on board, Sphinx 3.0 will try to take off.

Everything described is, of course, still under development, and some of it may not make it into the initial alphas. But the general plan is exactly this, and, let's say, more than half of it is already done. Plenty of work remains, but it's too late to turn back!

Clearly, each individual big item deserves a separate proper article with a pile of technical details, but one has to start somewhere, so here I have tried to give a small overview of the coming changes. I will tell the same story in more detail in 3 days, this Saturday, at the meetup of Sphinx enthusiasts (in Moscow, of course); if you immediately have a million clarifying questions and no way to fit them all into the comments, come by in person and I'll try to answer. As time permits, I will also write more articles here (and/or on the blog on the site) about the coming mega-features.

In general: Sphinx 3.0 is coming, a lot of reworking is coming with it, and I think it will be interesting.

After all, there can't be exactly ONE full-text search library in the world, and that one written in godless Java, of all things?!! :-)

Source: https://habr.com/ru/post/257789/

