MongoDB for Developers and DBAs

The MongoDB courses for developers and database administrators from 10gen, the company behind MongoDB, are coming to an end.
The final exam has been submitted for grading, so I would like to share my impressions of the courses and what I learned, and go over the pros and cons of MongoDB.

General impression of the course

I signed up for both courses at once, the one for DBAs and the one for developers. The workload is modest: at a very relaxed pace it took 3-4 hours a week to watch the videos and 1-2 hours to do the homework. If you want, I think that time can be cut in half.
Overall, my impressions are positive. Although most of the material can be found in the official documentation, learning it through the courses is much more engaging. I had heard of MongoDB before and had tried it out a couple of times, but the courses gave me a deeper understanding of what this database can do and where it fits. There were some rough spots at first. A few were caused by questions worded so that the answer was ambiguous, although that may have been my English. Then there were a couple of incidents where results were reset, and some disruptions due to Hurricane Sandy. There was also a funny question that I nicknamed “Mission Impossible”: you had to pick one of three possible answers, while a question normally allows three attempts.

Strengths

Replication (manual)

Replication is easy to set up: the servers are started with the name of the replica set, the configuration is applied, and the members elect a PRIMARY while the rest become SECONDARY. You can adjust the priority of each server, or prevent a server from ever becoming PRIMARY. The PRIMARY accepts both reads and writes; SECONDARY servers can only be read from.
When a server fails there are many nuances: whether it was a SECONDARY or the PRIMARY that went down, whether the old PRIMARY accepted writes after it became unreachable and a new PRIMARY was elected, and so on. In most cases, though, the replica set recovers automatically and needs no manual intervention other than bringing the failed server back up.

Sharding (manual)

If a collection holds more data than fits on one server, it can be split across several servers, and the application code that works with the collection does not need to change. A shard key is chosen, and key ranges are assigned to each shard. If I remember correctly, when the ranges change the data is rebalanced automatically, without taking the cluster offline. One caveat: once the collection has been distributed across servers, the shard key cannot be changed automatically.
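The idea of assigning key ranges to shards can be sketched in a few lines of plain Python. This is only an illustration of range-based routing, not MongoDB's actual implementation; the shard key `user_id`, the boundaries, and the shard names are all made up for the example.

```python
from bisect import bisect_right

# Hypothetical range boundaries on an assumed shard key "user_id".
# Three boundaries split the key space into four ranges (chunks):
# (-inf, 1000), [1000, 2000), [2000, 3000), [3000, +inf)
BOUNDS = [1000, 2000, 3000]

# Hypothetical assignment of those four chunks to shards.
CHUNK_TO_SHARD = ["shard_a", "shard_b", "shard_a", "shard_c"]


def route(shard_key_value):
    """Return the name of the shard whose key range contains the value."""
    chunk = bisect_right(BOUNDS, shard_key_value)
    return CHUNK_TO_SHARD[chunk]
```

In this model, rebalancing amounts to editing `CHUNK_TO_SHARD` (and moving the data), while changing the shard key would invalidate every boundary at once, which hints at why key selection matters so much.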

Geospatial indexes (manual)

Many startups and social networks now offer this kind of feature: find something within X km of the user. In MongoDB, geospatial indexes support exactly this kind of query.
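To make the "within X km" query concrete, here is a brute-force sketch in plain Python using the haversine formula. It is what a geospatial index lets the database avoid doing for every document; the `PLACES` data, the `loc` field layout ([longitude, latitude]), and the function names are assumptions made for the example.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Hypothetical documents; "loc" is assumed to be [longitude, latitude].
PLACES = [
    {"name": "cafe", "loc": [0.0, 0.01]},
    {"name": "airport", "loc": [0.0, 1.0]},
]


def distance_km(lon1, lat1, lon2, lat2):
    """Great-circle distance between two points (haversine formula)."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def near(places, lon, lat, max_km):
    """Names of places within max_km of (lon, lat), nearest first."""
    hits = [(distance_km(lon, lat, *p["loc"]), p["name"]) for p in places]
    return [name for dist, name in sorted(hits) if dist <= max_km]
```

Calling `near(PLACES, 0.0, 0.0, 5)` checks every place in turn; with an index the database can skip most candidates instead of scanning them all.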

Schemaless

In my opinion, the absence of a schema in MongoDB speeds up development. With a relational database you have to work out the structure in detail at the start and take care of how relations are implemented, and later, as the project grows, prepare data migration plans and so on. In MongoDB, by contrast, you can launch quickly at the “prototype” stage and tolerate “chaos and disorder” in the database; when the project starts to grow into something more tangible, you will need to bring the database into a more normalized form.

Capped collections (manual)

When creating such a collection you specify its maximum size, and optionally the maximum number of documents it can hold. When a document is inserted into a full collection, the oldest one is overwritten. It is like writing clockwise around a circle divided into a fixed number of segments. Such a collection is useful when you need to keep only the latest, current information and do not care about old data.
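The circular-buffer behaviour can be modelled with a bounded deque from the Python standard library. This is a sketch of the concept only: the class `CappedLog` is hypothetical, and for simplicity it caps the collection by document count.

```python
from collections import deque


class CappedLog:
    """Toy model of a capped collection limited to max_docs documents."""

    def __init__(self, max_docs):
        # A deque with maxlen silently drops the oldest item when full.
        self._docs = deque(maxlen=max_docs)

    def insert(self, doc):
        self._docs.append(doc)

    def find(self):
        """Return the stored documents in insertion order."""
        return list(self._docs)
```

After inserting five documents into `CappedLog(3)`, only the last three remain, in the order they were written.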

Aggregation framework (manual)

This framework lets you build result sets from the source data with grouping, summing, counting, and so on. In essence it implements the equivalents of SQL's GROUP BY, COUNT, HAVING and similar constructs. The source data passes through an array of stages, the so-called pipeline; each stage transforms the data and hands it to the next. It is very similar to shell commands of the form: "cat file | grep boobs | grep -v small".
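The pipeline idea, where each stage receives the output of the previous one, can be sketched in plain Python. This only mimics the concept; the stage helpers (`match`, `group_sum`, `sort_by`), the `aggregate` function, and the `ORDERS` data are hypothetical names for this example, not MongoDB's API.

```python
from collections import defaultdict

# Hypothetical source documents.
ORDERS = [
    {"customer": "ann", "amount": 30, "status": "paid"},
    {"customer": "bob", "amount": 20, "status": "paid"},
    {"customer": "ann", "amount": 15, "status": "paid"},
    {"customer": "bob", "amount": 99, "status": "cancelled"},
]


def match(predicate):
    """Stage that keeps only documents satisfying the predicate (like WHERE)."""
    return lambda docs: [d for d in docs if predicate(d)]


def group_sum(key, field):
    """Stage that groups by `key` and sums `field` (like GROUP BY + SUM)."""
    def stage(docs):
        totals = defaultdict(int)
        for d in docs:
            totals[d[key]] += d[field]
        return [{"_id": k, "total": v} for k, v in totals.items()]
    return stage


def sort_by(field, descending=False):
    """Stage that orders documents by `field` (like ORDER BY)."""
    return lambda docs: sorted(docs, key=lambda d: d[field], reverse=descending)


def aggregate(docs, pipeline):
    """Pass the documents through every stage in order."""
    for stage in pipeline:
        docs = stage(docs)
    return docs


PIPELINE = [
    match(lambda d: d["status"] == "paid"),
    group_sum("customer", "amount"),
    sort_by("total", descending=True),
]
```

`aggregate(ORDERS, PIPELINE)` filters out the cancelled order, totals the rest per customer, and sorts the totals, which is roughly what a filter, group, and sort sequence expresses in a real pipeline.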

Map-Reduce (manual)

If the Aggregation Framework is not enough, you can use Map-Reduce. Documents are fed to the map function, which transforms them, and its output is fed to the reduce function.
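The flow from map to reduce can be illustrated with a small in-memory sketch in Python. It shows the general map-reduce pattern under simplifying assumptions (everything fits in memory, reduce runs once per key); `map_reduce`, `map_tags`, `reduce_count`, and `POSTS` are hypothetical names, not MongoDB functions.

```python
from collections import defaultdict

# Hypothetical blog posts with tags.
POSTS = [
    {"title": "intro", "tags": ["mongodb", "nosql"]},
    {"title": "sharding", "tags": ["mongodb"]},
    {"title": "untagged", "tags": []},
]


def map_tags(doc):
    """Map step: emit a (key, value) pair for every tag in the document."""
    for tag in doc["tags"]:
        yield tag, 1


def reduce_count(key, values):
    """Reduce step: collapse all values emitted for one key into one result."""
    return sum(values)


def map_reduce(docs, map_fn, reduce_fn):
    """Group the emitted values by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}
```

Here `map_reduce(POSTS, map_tags, reduce_count)` counts how many posts carry each tag.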

Pitfalls

Compound index restriction

If you have a document of the form {a: [1, 2], b: [1, 2]}, creating the index {a: 1, b: 1} will fail. Likewise, if the index already exists, inserting such a document will fail. For details, look for “Compound Multikey Indexes May Only Include One Array Field” in the documentation.

Sparse index and uniqueness (manual, Sparse Indexes)

Suppose the collection contains these documents:
{"_id": ObjectId("50caeec479705c3852e9e61b"), "a": "1"}
{"_id": ObjectId("50caeeeb79705c3852e9e61d"), "a": "2", "b": 1}
{"_id": ObjectId("50caefb179705c3852e9e621"), "a": "3"}
and we want the field “b” to be unique. A regular unique index cannot be created: the first and third documents are treated as having b: null, which violates uniqueness.
We can, however, create a unique sparse index, and documents without “b” will simply not be included in it. Everything seems fine: the index is created and uniqueness is enforced. But if we then select all documents from the collection and sort them by b, MongoDB uses our sparse index, which has no entries for documents without “b”. As a result we get only one document back.
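The effect is easy to reproduce with a toy model in Python. This models only the behaviour described above (a sorted read served entirely from a sparse index), not how MongoDB is implemented; the helper names and the simplified integer `_id` values are assumptions for the example.

```python
# Simplified copies of the three documents (integer ids for brevity).
DOCS = [
    {"_id": 1, "a": "1"},
    {"_id": 2, "a": "2", "b": 1},
    {"_id": 3, "a": "3"},
]


def build_sparse_index(docs, field):
    """Index only the documents that actually contain the field."""
    entries = [(d[field], d["_id"]) for d in docs if field in d]
    return sorted(entries)


def sorted_scan_via_index(docs, index):
    """Return documents by walking the index in key order."""
    by_id = {d["_id"]: d for d in docs}
    return [by_id[doc_id] for _, doc_id in index]


INDEX_B = build_sparse_index(DOCS, "b")
```

`sorted_scan_via_index(DOCS, INDEX_B)` yields a single document, because the two documents without "b" never made it into the index.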

Dependence on the application interface

The course repeatedly pointed out that it is convenient to store documents in MongoDB in the shape in which they are displayed. Say you have a blog with comments, and next to each comment you show the author's name and email. It is then convenient to store the author's details in the same object as the comment. Consequently, if that part of the interface changes, you may have to change where the data is stored. Strictly speaking this is not much of a pitfall, and such a turn of events is unlikely, but something about this approach did not sit well with me.

The shard key cannot be changed

Once a collection has been sharded, the shard key cannot be changed automatically, so choosing the key is a very important decision. Learn more: docs.mongodb.org/manual/faq/sharding/#faq-change-shard-key

No multi-document transactions

An operation on a single document is atomic, but for multiple documents the suggested approach is to implement transactions at the application level: docs.mongodb.org/manual/tutorial/perform-two-phase-commits

Conclusion

Great courses. MongoDB impresses with its flexibility and simplicity, and there are many situations where using it is well justified.
If you would like to take the courses, they start again on January 21. Courses for Java developers also start on February 25. https://education.10gen.com/

Source: https://habr.com/ru/post/162603/
