Highlights $ in'y: MongoDB performance in ranges

Translation of this article is already on Habré, but it is terrible and contains false information.

Greetings, adventurers! Traveling around the MongoDB indexing area for at least some time, you may have learned the following rule: if your request contains sorting / order (orderby), add the field to be sorted to the end of the index that is used for the request.

In many cases, when a query contains equality (that is, the search for a specific value, for example, {“name”: “Charlie”}), this mantra is very useful.
')
Request

db.drivers.find({"country": {"$in": ["A", "G"]}}).sort({"carsOwned": 1})

Index

{"country": 1, "carsOwned": 1}

Such a combination will not be as effective as it may seem, despite the fact that the index conforms to the rule. There is a trap in this query that you can easily fall into by following generally accepted opinions.

Let's see what is wrong here and what to do in such cases. But for a start, despite the fact that this post is not about the basics of indexing, let's refresh the memory on this topic using the MongoDB documentation :

“Index in advance”
Indexes deserve special attention in the design process. Historically, the question of responsibility for the efficiency of data access rests with the database administrator. This triggers the creation of optimization layers after the design, which document-oriented base can avoid.

“Index often”
Indexed queries are much more efficient than usual queries, even with a small amount of data. If a non-indexed query lasts 10 seconds, the same query can be executed in 0 milliseconds using the correct index.

“Index completely"
Queries use indices from left to right. The index can only be used as long as the query uses all fields and does not skip any.

“Index sorting”
If your request contains a sort or orderby operator, add the field to be sorted to the index.

Teams
.explain() indicates whether the index was used (along with other useful information - approx. lane),
.ensureIndex() creates an index,
.getIndexes() or .getIndexKeys() lists the indexes for the collection.

Let's return to our problem. With this basic knowledge in mind, the conventional wisdom says that for the following query:

db.collection.find({“country”: “A”}).sort({“carsOwned”: 1})

You must create the following index:

db.collection.ensureIndex({"country": 1, "carsOwned": 1})

What if most queries use a range instead of an equality? As here:

db.collection.find({"country": {"$in": ["A", "G"]}}).sort({"carsOwned": 1})

Here we use $in , but the same works for $gt , $lt , etc.

If you notice that such queries are slow, and you remember the basics we talked about, then run .explain() and see which index was used. But you will also see {scanAndOrder : true} , which means that MongoDB has sorted it. The reason for the slow execution of the query is precisely this: scanAndOrder is an expensive operation, because it sorts the documents in memory. It should be avoided when working with large data arrays; it is slow and heavily loads the processor.

But forget about the reasons for the slow work of scanAndOrder; Why does MongoDB sort the results if we have already added this field to the index? The answer is simple: we have not done so.

Why? The answer lies in the structure of the index that we created. In our example, the document with {“country”: “A”} and the document with {“country”: “G”} sorted in the index by the key {“carsOwned”: 1} , but they are sorted independently of each other. They are not sorted together!

The diagram on the left illustrates the order of visiting documents in accordance with the index we created. When all documents matching the query are found, they should be sorted. The diagram on the right shows the work of the query with an alternative index: { “carsOwned”: 1, “country”: 1} . By moving the sort field closer to the beginning (left) of the index, we created a script in which MongoDB visits the documents in the order we need. From this effective trick follows a simple set of rules for indexing:

The order of the fields in the index should be:

1. First - the fields that are used in the request for exact values.
2. Then - sorted fields.
3. And at the end - the fields that are used to search in the ranges.

Are we making a compromise? Yes. The request will visit more documents than is technically necessary, because visits to the index part to be sorted will occur before the operator uses the ranges. Remember these simple rules, but keep in mind that the size of your data may lead to different results.

I hope this guide will be useful for you. Good luck, adventurers!

Yours sincerely,
Eric from MongoLab.

Source: https://habr.com/ru/post/171919/

All Articles

Highlights $ in'y: MongoDB performance in ranges

More articles: