
Some time ago our search became noticeably faster. This is especially visible on heavy engine queries that use few filters and high-frequency terms, which require building facets over the results and sorting huge volumes of documents. But medium-complexity queries, which return only a handful of documents, also started to be processed much faster. Why did search need accelerating in the first place, and how did we do it?
Search on hh.ru is:
- 400 requests per second;
- 26 gigabytes of index, updated in real time;
- 3x replication (failover redundancy);
- 5x performance margin.
And yet, with total system load at just 15%, all this beauty handled some queries inexcusably slowly. Since the index of “active” resumes is significantly larger than the rest, this was especially painful for employers.
Search at hh.ru is built on Lucene, on top of which we have written quite a lot of code over the years. While solving earlier, more localized optimization problems, we came to understand that search performance comes down to Lucene performance, or, more precisely, to how we use it.
It is well known that what cannot be sped up head-on can often be parallelized. Since version 3.1.0, Lucene has been able to split each request into several threads, one per segment. But the code carried (and in version 4.3 still carries) the comment “TODO: should we make this threaded...? the Collector could be sync'd? always use single thread”.
And collectors (the mechanism that receives every document found in a segment, one by one) are used everywhere in our code: our faceting and sorting logic is built on them.
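To make this concrete, here is a rough sketch of the kind of collector faceting can be built on, using the Lucene 3.x API. It is not our production code: the field name “company_id” and the use of FieldCache are illustrative.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.Scorer;

// Illustrative facet collector: counts matching documents per company_id.
public class FacetCountCollector extends Collector {

    private final Map<Integer, Integer> counts = new HashMap<Integer, Integer>();
    private int[] companyIds; // per-segment column of the faceted field

    @Override
    public void setScorer(Scorer scorer) {
        // scores are irrelevant for facet counting
    }

    @Override
    public void setNextReader(IndexReader reader, int docBase) throws IOException {
        // called once per segment: load the field values for this segment
        companyIds = FieldCache.DEFAULT.getInts(reader, "company_id");
    }

    @Override
    public void collect(int doc) {
        // doc is segment-local here, matching the array loaded in setNextReader
        int key = companyIds[doc];
        Integer current = counts.get(key);
        counts.put(key, current == null ? 1 : current + 1);
    }

    @Override
    public boolean acceptsDocsOutOfOrder() {
        return true;
    }

    public Map<Integer, Integer> getCounts() {
        return counts;
    }
}
```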
As a side experiment, we isolated the code tied to collectors, split the data into a number of segments, and compared sequential and parallel search. The experiment confirmed that acceleration was possible, so we planned the task and the work got into full swing.
The general plan looked like this:
- add to collectors the ability to collect and merge results from several segments;
- implement / enable parallel segment traversal;
- split the index into N segments of roughly equal size;
- PROFIT.
Item 1. Collectors
It turned out that the first item of the plan had already been implemented as a side effect of refactoring done for a task that had no direct relation to parallelization. As a result, we have a mechanism that lets us merge the tree of search results at any number of levels (we currently have four: indices by document type, index, replica, segments).
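For illustration only (the interface and class names below are mine, not what this mechanism is actually called), the idea boils down to partial results that know how to merge with partial results of the same kind, so they can be folded together level by level:

```java
// Sketch of the “merge results at any number of levels” idea; names are illustrative.
public interface PartialResult<T extends PartialResult<T>> {
    /** Fold another partial result of the same level (segment, replica, index...) into a combined one. */
    T merge(T other);
}

/** Example: a hit count that merges the same way at the segment, replica, index and document-type levels. */
class HitCount implements PartialResult<HitCount> {
    final long hits;

    HitCount(long hits) {
        this.hits = hits;
    }

    @Override
    public HitCount merge(HitCount other) {
        return new HitCount(this.hits + other.hits);
    }
}
```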

Item 2. Parallel Segment Traversal
Let me dwell on this one in more detail.
The following IndexSearcher method is extremely important to us:
```java
public void search(Weight weight, Filter filter, Collector collector) throws IOException {
```
Inside, in a loop over segments, the collector is switched to the next segment via collector.setNextReader(...), after which the scorer “feeds” the found documents to the collector. It is precisely this switching between segments that deprives us of all the joys of multithreading: with a parallel search, a single collector would not know which segment a given document belongs to. The solution turned out to be quite simple: make a super-collector that creates a worker collector for each segment:
```java
public interface ParallelCollector {
    Collector createPartialCollector();
}
```
With this approach, the modification of IndexSearcher turned out to be simple:
```java
public void search(final Weight weight, final Filter filter, final ParallelCollector parallelCollector) throws IOException {
    final CountDownLatch latch = new CountDownLatch(subReaders.length);
    final AtomicReference<Throwable> exceptionReference = new AtomicReference<Throwable>();

    for (int i = 0; i < subReaders.length; i++) {
        final int subreaderDocBase = docStarts[i];
        final IndexReader subReader = subReaders[i];
        executor.submit(new Runnable() {
            @Override
            public void run() {
                try {
                    Collector collector = parallelCollector.createPartialCollector();
                    collector.setNextReader(subReader, subreaderDocBase);
                    Scorer scorer = (filter == null)
                            ? weight.scorer(subReader, !collector.acceptsDocsOutOfOrder(), true)
                            : FilteredQuery.getFilteredScorer(subReader, getSimilarity(), weight, weight, filter);
                    if (scorer != null) {
                        scorer.score(collector);
                    }
                } catch (Throwable t) {
                    exceptionReference.compareAndSet(null, t);
                    throw new RuntimeException(t);
                } finally {
                    latch.countDown();
                }
            }
        });
    }

    try {
        latch.await();
    } catch (InterruptedException e) {
        throw new RuntimeException(e);
    }

    Throwable possibleException = exceptionReference.get();
    if (possibleException != null) {
        if (possibleException instanceof RuntimeException) {
            throw (RuntimeException) possibleException;
        } else if (possibleException instanceof IOException) {
            throw (IOException) possibleException;
        } else {
            throw new RuntimeException(possibleException);
        }
    }
}
```
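As an illustration of how a ParallelCollector can be implemented (a simplified example of my own, not our faceting or sorting code), here is one that simply counts the total number of hits. Each segment gets its own partial Collector, so the hot path needs no locking:

```java
import java.util.concurrent.atomic.AtomicLong;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

// Simplified example: counts total hits across all segments searched in parallel.
public class CountingParallelCollector implements ParallelCollector {

    private final AtomicLong totalHits = new AtomicLong();

    @Override
    public Collector createPartialCollector() {
        // A fresh instance per segment, used by a single worker thread
        return new Collector() {
            @Override
            public void setScorer(Scorer scorer) {
                // scores are not needed for counting
            }

            @Override
            public void collect(int doc) {
                // a single shared atomic counter is enough for this toy example; a real
                // collector would accumulate per segment and merge the partials afterwards
                totalHits.incrementAndGet();
            }

            @Override
            public void setNextReader(IndexReader reader, int docBase) {
                // nothing per-segment to prepare for plain counting
            }

            @Override
            public boolean acceptsDocsOutOfOrder() {
                return true;
            }
        };
    }

    public long getTotalHits() {
        return totalHits.get();
    }
}
```

After search(...) returns, getTotalHits() holds the merged count from all segments.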
Item 3. Splitting the index into segments
By default, Lucene assumes there should be a single segment; more precisely, that is what it strives for. In reality, every flush of data to disk creates a new small segment, which later, according to the MergePolicy, is automatically merged with other small segments into larger ones, and so on. With the index being updated continuously, there is always a “tail” of small segments.
But the Lucene developers did a great job: they provided a way to limit the maximum segment size, which is what we used; setMaxMergeMB + setMaxMergeMBForForcedMerge solved the problem with a bang.
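A minimal configuration sketch of this approach, assuming a Lucene 3.6-era API (the 512 MB cap and the analyzer are illustrative, not our actual settings):

```java
import java.io.File;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.LogByteSizeMergePolicy;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class SegmentedWriterFactory {

    public static IndexWriter open(File path) throws Exception {
        LogByteSizeMergePolicy mergePolicy = new LogByteSizeMergePolicy();
        // Segments are never merged beyond ~512 MB during normal merging...
        mergePolicy.setMaxMergeMB(512);
        // ...nor during an explicit forceMerge, so the index stays split into many segments
        mergePolicy.setMaxMergeMBForForcedMerge(512);

        IndexWriterConfig config = new IndexWriterConfig(Version.LUCENE_36,
                new StandardAnalyzer(Version.LUCENE_36));
        config.setMergePolicy(mergePolicy);

        Directory directory = FSDirectory.open(path);
        return new IndexWriter(directory, config);
    }
}
```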
A bonus of solving the third item was getting rid of the index optimization mechanism. In Lucene, documents are only appended to the index: if a document needs to be reindexed, the old version is marked as deleted and the new version is added to the end of the index. As a result, many “holes” accumulate over time and the index bloats, which hurts performance considerably.
You can fight this with periodic forceMergeDeletes (previously expungeDeletes) and forceMerge (previously optimize), which copy the “live” documents from one or more old segments into a new one. This operation is quite expensive in terms of disk I/O and memory consumption.
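For reference, that periodic maintenance looks roughly like this with the post-3.5 method names (a sketch; writer is an already open IndexWriter, and the single-segment target is illustrative):

```java
import java.io.IOException;

import org.apache.lucene.index.IndexWriter;

public class IndexMaintenance {

    // The kind of expensive housekeeping such an optimization mechanism runs periodically.
    public static void compact(IndexWriter writer) throws IOException {
        // Rewrites segments that contain deletions, dropping the “holes”
        writer.forceMergeDeletes();
        // Merges the whole index down to a single large segment
        writer.forceMerge(1);
    }
}
```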
Small segments significantly reduce this cost, because updating or deleting a document means rewriting far fewer neighboring documents.
Result
So, in almost a month of development, we got parallel search and many small segments. What did that give us?
- Faster search. 95% of vacancy search requests are now answered within 10 milliseconds, and resume searches within 70 milliseconds. For comparison, a few months ago the figures were 30 and 270, respectively.
- The opportunity to submit a patch to Lucene (almost there, but I want to tidy up the code first).
- Getting rid of the extra index optimization mechanism.
Visual result
The interval shown is 2 weeks.
The red line is “before”, the blue line is “after”, and the Y axis is response time.
The 50th percentile, medium-complexity queries:

And the 95th percentile, heavy queries with the maximum number of results:
