📜 ⬆️ ⬇️

New aggregation framework in MongoDB 2.1

In release 2.1 , implementation of such functionality as a new data aggregation framework was announced . I would like to tell you about the first impressions of this very interesting thing. This functionality should allow in some places to abandon Map / Reduce and writing JavaScript code in favor of fairly simple constructions designed to group the fields almost like in SQL.



Documentation on innovations is located in the appropriate section of the official site. First, let's look at how this works and what MongoDB constructs will help us.
')
So, the main difficulty in fetching data from MongoDB is working with arrays and data contained within some individual elements. Yes, we can select them as in SQL, but we cannot aggregate by them directly during the selection. The new framework is a declarative way to work with such data, based on a chain of special operators (there are only 7 of them). The sample data is transferred from the output of one operator to the input of another, just like in unix. Partly with the help of new operators you can repeat existing ones. Let the test collection be a collection for storing data about people. Standard sample:

db.test.find({name: "Ivan"}); 

will be similar

 db.test.aggregate({$match: {name: "Ivan"}}); 

But everything is a little more interesting, because in the second example we can build a data processing chain, listing the operators separated by commas. The sorting operator is $ sort, for example:

 db.test.aggregate({$match: {name: "Ivan"}}, {$sort: {age: 1}}); 

So we all those people with the name “Ivan” and sort the sample by age. And in order to choose the eldest Ivan, we need to cut the sample with one element:

 db.test.aggregate({$match: {name: "Ivan"}}, {$sort: {age: -1}}, {$limit: 1}); 

You will say that this is a repetition of the existing functionality. To some extent, yes, but we have not considered the new operators, allowing more flexibility to work with samples. Let us examine them in more detail.

$ Project operator


It is intended for manipulating fields, it can add new ones, delete them and rename them in the documents received by him at the entrance. The following structure will include in the document flow (filter) only the names and age of users:

 {$project: {name: 1, age: 1}} 

All documents with only two fields will get to the input of the next operator, there will be no other fields in the stream (with the exception of the _id field, in order to exclude it, you must specifically specify _id: 0). The number 1 includes, the number 0 excludes the transfer of the field. In addition, this operator allows you to rename fields, “get” fields from an embedded object of a field, or add new fields based on some calculations .

$ Unwind operator


In my opinion this is the most interesting operator. It allows you to "expand" the nested arrays on each element of the sample documents. For example, let us have the following base of people:

 db.test.insert({name: "Ivan", likes: ["Maria", "Anna"]}); db.test.insert({name: "Serge", likes: ["Anna"]}); 

Let field likes means which girls like which boy. Apply the $ unwind operator:

 db.test.aggregate({$unwind: "$likes"}); 

 { "result" : [ { "_id" : ObjectId("4f598de76a8f8bc74573e9fd"), "name" : "Ivan", "likes" : "Maria" }, { "_id" : ObjectId("4f598de76a8f8bc74573e9fd"), "name" : "Ivan", "likes" : "Anna" }, { "_id" : ObjectId("4f598e086a8f8bc74573e9fe"), "name" : "Serge", "likes" : "Anna" } ], "ok" : 1 } 

We see that the likes array has expanded and every document now has a likes field with every value of the array it had before. If we want to find the most popular girl, just group the sample by the likes field. For grouping is the next statement.

$ Group operator


For convenience, we will supplement the sample with another field filled with the number 1 (it will be easier to summarize this way):

 db.test.aggregate({$unwind: "$likes"}, {$project: {name:1, likes:1, count: {$add: [1]}}}); 

 { "result" : [ { "_id" : ObjectId("4f598de76a8f8bc74573e9fd"), "name" : "Ivan", "likes" : "Maria", "count" : 1 }, { "_id" : ObjectId("4f598de76a8f8bc74573e9fd"), "name" : "Ivan", "likes" : "Anna", "count" : 1 }, { "_id" : ObjectId("4f598e086a8f8bc74573e9fe"), "name" : "Serge", "likes" : "Anna", "count" : 1 } ], "ok" : 1 } 

This will allow us to use the aggregation operator $ sum. That is, now we simply add the value of the count field to the number field each time and group the whole selection by the likes field containing the name of the girl.

 db.test.aggregate({$unwind: "$likes"}, {$project: {name:1, likes:1, count: {$add: [1]}}}, {$group: {_id: "$likes", number: {$sum: "$count"}}}); 

 { "result" : [ { "_id" : "Anna", "number" : 2 }, { "_id" : "Maria", "number" : 1 } ], "ok" : 1 } 

It remains to sort and limit the output to only one document:

 db.test.aggregate({$unwind: "$likes"}, {$project: {name:1, likes:1, count: {$add: [1]}}}, {$group: {_id: "$likes", number: {$sum: "$count"}}}, {$sort: {number: -1}}, {$limit: 1}); 


 { "result" : [ { "_id" : "Anna", "number" : 2 } ], "ok" : 1 } 


Our most popular girl is Anna.

And now a concrete example.



In order to penetrate purely concretely with new possibilities, suppose that we have a collection that stores data about animals in a zoo and solve several problems of data aggregation. Here are our paws and tails:

 db.zoo.insert({name: "Lion", ration: [{meat: 20}, {fish: 1}, {water: 30}], holidays: [1,4], staff: {like: ["Petrovich", "Mihalich"], dislike: "Maria"}}); db.zoo.insert({name: "Tiger", ration: [{meat: 15}, {water: 25}], holidays: [6], staff: {like: ["Petrovich", "Maria"]}}); db.zoo.insert({name: "Monkey", ration: [{banana: 15}, {water: 10}, {nuts: 1}], holidays: [2], staff: {like: ["Anna"], dislike: "Petrovich"}}); db.zoo.insert({name: "Panda", ration: [{bamboo: 15}, {dumplings: 50}, {water: 3}], staff: {like: ["Petrovich", "Mihalich", "Maria", "Anna"]}}); 

The name field stores the name, the ration field is an array of objects that store how much and what kind of food the beast needs daily, holidays are the days on which the beast rests and is not shown to visitors, staff.like - caretakers that he likes (pandas, enchantments, like everything else ), staff.dislike - do not like.

Let's start with a simple sample - only the names of the animals, so that the director of the zoo would not forget who their names are. Everything is simple:

 db.zoo.aggregate({$project: {name: 1}}); 

 { "result" : [ { "_id" : ObjectId("4f58b7f627f86b11258dc70c"), "name" : "Lion" }, { "_id" : ObjectId("4f58b86027f86b11258dc70d"), "name" : "Tiger" }, { "_id" : ObjectId("4f58b90c27f86b11258dc70e"), "name" : "Monkey" }, { "_id" : ObjectId("4f58b98727f86b11258dc70f"), "name" : "Panda" } ], "ok" : 1 } 


What kind of animals do you need boatstsa?


You have to be afraid of predators. A predator is one who has meat in the diet. Let's find them. To begin with, we will filter the stream and select only two fields in the documents - the name and the diet.

 db.zoo.aggregate({$project: {name: 1, _id: 0, ration: 1}}); 

 { "result" : [ { "name" : "Lion", "ration" : [ { "meat" : 20 }, { "fish" : 1 }, { "water" : 30 } ] }, { "name" : "Tiger", "ration" : [ { "meat" : 15 }, { "water" : 25 } ] }, { "name" : "Monkey", "ration" : [ { "banana" : 15 }, { "water" : 10 }, { "nuts" : 1 } ] }, { "name" : "Panda", "ration" : [ { "bamboo" : 15 }, { "dumplings" : 50 }, { "water" : 3 } ] } ], "ok" : 1 } 


Then we expand the ration array on the elements of the main array:

 db.zoo.aggregate({$project: {name: 1, _id: 0, ration: 1}}, {$unwind: "$ration"}); 

 { "result" : [ { "name" : "Lion", "ration" : { "meat" : 20 } }, { "name" : "Lion", "ration" : { "fish" : 1 } }, { "name" : "Lion", "ration" : { "water" : 30 } }, { "name" : "Tiger", "ration" : { "meat" : 15 } }, { "name" : "Tiger", "ration" : { "water" : 25 } }, { "name" : "Monkey", "ration" : { "banana" : 15 } }, { "name" : "Monkey", "ration" : { "water" : 10 } }, { "name" : "Monkey", "ration" : { "nuts" : 1 } }, { "name" : "Panda", "ration" : { "bamboo" : 15 } }, { "name" : "Panda", "ration" : { "dumplings" : 50 } }, { "name" : "Panda", "ration" : { "water" : 3 } } ], "ok" : 1 } 


Next, we filter the sample only for those fields where there is a field ration.meat

 db.zoo.aggregate({$project: {name: 1, _id: 0, ration: 1}}, {$unwind: "$ration"}, {$match: {"ration.meat": {$exists: true}}}); 

 { "result" : [ { "name" : "Lion", "ration" : { "meat" : 20 } }, { "name" : "Tiger", "ration" : { "meat" : 15 } } ], "ok" : 1 } 


And the final conclusion is only the name of the predator.

 db.zoo.aggregate({$project: {name: 1, _id: 0, ration: 1}}, {$unwind: "$ration"}, {$match: {"ration.meat": {$exists: true}}}, {$project: {name: 1, _id: 0}}); 

 { "result" : [ { "name" : "Lion" }, { "name" : "Tiger" } ], "ok" : 1 } 


On which days at least one beast rests?


To do this, we “break” the holidays array into the whole array of animals (the panda is usually available to everyone and always).

 db.zoo.aggregate({$project: {name: 1, holidays: 1}}, {$unwind: "$holidays"}); 

 { "result" : [ { "_id" : ObjectId("4f58b7f627f86b11258dc70c"), "name" : "Lion", "holidays" : 1 }, { "_id" : ObjectId("4f58b7f627f86b11258dc70c"), "name" : "Lion", "holidays" : 4 }, { "_id" : ObjectId("4f58b86027f86b11258dc70d"), "name" : "Tiger", "holidays" : 6 }, { "_id" : ObjectId("4f58b90c27f86b11258dc70e"), "name" : "Monkey", "holidays" : 2 }, { "_id" : ObjectId("4f58b98727f86b11258dc70f"), "name" : "Panda" } ], "ok" : 1 } 


And filter only those where the holidays field is a number greater than -1 (well, or 0, to whom it is more convenient)

 db.zoo.aggregate({$project: {name: 1, holidays: 1}}, {$unwind: "$holidays"},{$match: {holidays : {$gt: -1}}}); 

 { "result" : [ { "_id" : ObjectId("4f58b7f627f86b11258dc70c"), "name" : "Lion", "holidays" : 1 }, { "_id" : ObjectId("4f58b7f627f86b11258dc70c"), "name" : "Lion", "holidays" : 4 }, { "_id" : ObjectId("4f58b86027f86b11258dc70d"), "name" : "Tiger", "holidays" : 6 }, { "_id" : ObjectId("4f58b90c27f86b11258dc70e"), "name" : "Monkey", "holidays" : 2 } ], "ok" : 1 } 


Remove all unnecessary.

 db.zoo.aggregate({$project: {name: 1, holidays: 1}}, {$unwind: "$holidays"},{$match: {holidays : {$gt: -1}}}, {$project: {holidays: 1, _id: 0}}); 

 { "result" : [ { "holidays" : 1 }, { "holidays" : 4 }, { "holidays" : 6 }, { "holidays" : 2 } ], "ok" : 1 } 


How many products a day must be purchased.


The most interesting, in my opinion, task. To implement it, remember that $ project can create fields and create a meat field with the value of the meat property.

 db.zoo.aggregate({$project: {ration: 1, _id: 0}}, {$unwind: "$ration"}, {$project: {ration: 1, meat: "$ration.meat", _id: 0}}); 


If this field in the properties of the diet of the animal is not, then it will not be created. Here is an example of a sample part:

 { "result" : [ { "ration" : { "meat" : 20 }, "meat" : 20 }, { "ration" : { "fish" : 1 } }, { "ration" : { "water" : 30 } }, ... } 


Let's do this for all types of food and remove the output of the object itself ration:

 db.zoo.aggregate({$project: {ration: 1}}, {$unwind: "$ration"}, {$project: {ration: 0, _id: 0, meat: "$ration.meat", fish: "$ration.fish", water: "$ration.water", banana: "$ration.banana", bamboo: "$ration.bamboo", nuts: "$ration.nuts", dumplings: "$ration.dumplings", _id: 0}}); 


as a result we get

 { "result" : [ { "_id" : ObjectId("4f58e58227f86b11258dc713"), "meat" : 20 }, { "_id" : ObjectId("4f58e58227f86b11258dc713"), "fish" : 1 }, { "_id" : ObjectId("4f58e58227f86b11258dc713"), "water" : 30 }, { "_id" : ObjectId("4f58e5e127f86b11258dc714"), "meat" : 15 }, { "_id" : ObjectId("4f58e5e127f86b11258dc714"), "water" : 25 }, { "_id" : ObjectId("4f58e60027f86b11258dc715"), "banana" : 15 }, { "_id" : ObjectId("4f58e60027f86b11258dc715"), "water" : 10 }, { "_id" : ObjectId("4f58e60027f86b11258dc715"), "nuts" : 1 }, { "_id" : ObjectId("4f58e64a27f86b11258dc716"), "bamboo" : 15 }, { "_id" : ObjectId("4f58e64a27f86b11258dc716"), "dumplings" : 50 }, { "_id" : ObjectId("4f58e64a27f86b11258dc716"), "water" : 3 } ], "ok" : 1 } 


It remains only to add / group the whole thing with the function $ group. Specifying the _id field in the grouping here is mandatory, but we don’t need it in principle, so let it be some nonsense. For each type of food, we create an appropriate field for summing up the individual rations of each animal:

 db.zoo.aggregate({$project: {ration: 1}}, {$unwind: "$ration"}, {$project: {ration: 0, _id: 0, meat: "$ration.meat", fish: "$ration.fish", water: "$ration.water", banana: "$ration.banana", bamboo: "$ration.bamboo", nuts: "$ration.nuts", dumplings: "$ration.dumplings"}}, {$group: {_id: "s", sum_meat: {$sum: "$meat"}, sum_fish: {$sum: "$fish"}, sum_water: {$sum: "$water"}, sum_banana: {$sum: "$banana"}, sum_nuts: {$sum: "$nuts"}, sum_bamboo: {$sum: "$bamboo"}, sum_dumplings: {$sum: "$dumplings"}}}); 

 { "result" : [ { "_id" : "s", "sum_meat" : 35, "sum_fish" : 1, "sum_water" : 68, "sum_banana" : 15, "sum_nuts" : 1, "sum_bamboo" : 15, "sum_dumplings" : 50 } ], "ok" : 1 } 


Favorite Caretaker


Filter by fields and unwind the staff.like array:

 db.zoo.aggregate({$project: {name: 1, _id: 0, "staff.like": 1}}, {$unwind: "$staff.like"}); 

Recall that $ project can raise the field to the next level:

 db.zoo.aggregate({$project: {name: 1, _id: 0, "staff.like": 1}}, {$unwind: "$staff.like"}, {$project: {_id: 0, name: "$staff.like"}}); 

Since we chose all caretakers who at least like someone and someone like two animals, he is present in the sample two times.

 { "result" : [ { "name" : "Petrovich" }, { "name" : "Mihalich" }, { "name" : "Petrovich" }, { "name" : "Maria" }, { "name" : "Anna" }, { "name" : "Petrovich" }, { "name" : "Mihalich" }, { "name" : "Maria" }, { "name" : "Anna" } ], "ok" : 1 } 


Now it is necessary to sum these fields. But this is not so easy to do, since we do not have a field for summation, so we create this field with the already known chip.

 db.zoo.aggregate({$project: {name: 1, _id: 0, "staff.like": 1}}, {$unwind: "$staff.like"}, {$project: {_id: 0, name: "$staff.like", count: {$add: [1]}}}); 

As a result, one more field count will be added to each object with a value of 1. Group and summarize:

 db.zoo.aggregate({$project: {name: 1, _id: 0, "staff.like": 1}}, {$unwind: "$staff.like"}, {$project: {_id: 0, name: "$staff.like", count: {$add: [1]}}}, {$group: {_id: "$name", num: {$sum: "$count"}}}); 

We sort and limit the output to the very first element.

 db.zoo.aggregate({$project: {name: 1, _id: 0, "staff.like": 1}}, {$unwind: "$staff.like"}, {$project: {_id: 0, name: "$staff.like", count: {$add: [1]}}}, {$group: {_id: "$name", num: {$sum: "$count"}}}, {$sort: {num: -1}}, {$limit: 1}); 

And we get the following:

 { "result" : [ { "_id" : "Petrovich", "num" : 3 } ], "ok" : 1 } 


That's all. For those interested, there are two simple reports in English on this topic: one and two .

To be honest, I really like MongoDB, although we used it only for a part of the project to store scattered data. The same Map / Reduce for me has always been something scary and incomprehensible, but a new piece of data aggregation allows you to partially eliminate JavaScript, because somehow it is an interpretable language, and therefore slow and replace it with ready-made, which means fast, elements of the language .

PS It is worth noting that version 2.1 is still quite raw. I constantly got any exceptions for assertion failed. But I think that in 2.2 it will finally be stable and cool.

Source: https://habr.com/ru/post/139643/


All Articles