I watched the videos from the Mongo Moscow conference. Here are my notes on MongoDB's capabilities, distilled from what I saw. The translation was not so hot, so I listened to the English track and may have gotten something wrong. I tried to be objective; if I failed somewhere, write me :)
The data in MongoDB is a list of JSON hashes, with nested arrays and hashes.
A document is one such hash.
Therefore, there is no ALTER TABLE - the record structures are independent of each other.
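For illustration, a minimal mongo-shell sketch (the posts collection and its fields are my own example):

    // Two documents in the same collection with different structures --
    // each document carries its own schema, so there is nothing to ALTER.
    db.posts.insert({title: "Hello", tags: ["mongo", "notes"]});
    db.posts.insert({title: "Bye", author: {name: "Ann", karma: 42}});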
Collations are poorly supported for now; UTF-8 itself works fine.
The syntax stings in some places: in MongoDB you write db.schema.find({s: {$gt: "a"}}) instead of where s > 'a'.
No transactions. But operations on a single document are guaranteed to be atomic.
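For example, incrementing a counter inside one document is atomic (the field names here are mine):

    // $inc is applied in place; no reader sees a half-applied update,
    // and two concurrent $inc's never lose each other's increment.
    db.posts.update({title: "Hello"}, {$inc: {views: 1}});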
Calling getLastError checks whether the preceding operation caused an error. In safe mode (the default in many drivers) it is called after every write. You can call it less often for less overhead, but then, in case of an error, you will have to replay a whole batch of writes at once.
getLastError can wait for acknowledgement not from all servers, but only from a given number of them.
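A sketch of calling it by hand instead of relying on safe mode; the w parameter asks for acknowledgement from that many servers:

    db.posts.insert({title: "replicated post"});
    // Wait until at least 2 servers have the write, or 500 ms pass.
    var res = db.runCommand({getlasterror: 1, w: 2, wtimeout: 500});
    if (res.err) print("write failed: " + res.err);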
They suggest designing rather elaborate documents (for example, one document is a whole blog with nested posts, each with a nested comment tree). This is a bad idea: the load on a single document can become critically high.
No JOIN. Just design your data structure so that you don't need one. Mice, become hedgehogs.
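A hedged sketch of the middle ground (all names are mine): embed what is read together and stays bounded, reference what grows without limit.

    // One document per post; a bounded comment list is embedded,
    // while the author is referenced by id instead of joined.
    db.posts.insert({
        title: "On MongoDB",
        author_id: ObjectId("4c8a4d1e0000000000000001"),
        comments: [{user: "Bob", text: "Nice!"}]
    });
    // The "JOIN" becomes a second query in the application:
    var post = db.posts.findOne({title: "On MongoDB"});
    var author = db.users.findOne({_id: post.author_id});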
There is replication, with a replication log (I don't know whether it is binary or statement-based).
If the replication master dies, a new one is elected from among the slaves within 10–20 s.
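A minimal replica-set setup sketch (host names are made up), assuming the replica sets introduced in version 1.6:

    // Run once against one member; the others join automatically.
    rs.initiate({
        _id: "rs0",
        members: [
            {_id: 0, host: "db1.example.com:27017"},
            {_id: 1, host: "db2.example.com:27017"},
            {_id: 2, host: "db3.example.com:27017"}
        ]
    });
    rs.status();  // shows which member is currently the master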
There is sharding. By ranges, not by hashes. The ranges for each shard are set in the config.
Resharding is done by adjusting these ranges; writes to the shards being changed are blocked for the duration of the transfer.
Within a shard, the data is divided into chunks of approximately 200 MB each.
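Roughly how a collection gets sharded, using the command form of that era (database and collection names are mine):

    // Executed against a mongos router, not a plain mongod.
    var admin = db.getSisterDB("admin");
    admin.runCommand({enablesharding: "blog"});
    // Range-shard posts by title; the range -> shard map lives in the
    // config servers, and the data is split into chunks.
    admin.runCommand({shardcollection: "blog.posts", key: {title: 1}});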
There is a MapReduce interface, but it was only mentioned in passing.
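A small sketch of counting posts per tag with it (the field and output names are mine):

    var map = function() {
        this.tags.forEach(function(tag) { emit(tag, 1); });
    };
    var reduce = function(key, values) {
        var sum = 0;
        values.forEach(function(v) { sum += v; });
        return sum;
    };
    // Results land in the tag_counts collection.
    db.posts.mapReduce(map, reduce, {out: "tag_counts"});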
There are spatial indexes.
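For example, a 2d index plus a proximity query (the collection and fields are my example):

    db.places.ensureIndex({loc: "2d"});
    db.places.insert({name: "cafe", loc: [37.62, 55.75]});
    // The ten places nearest to the given point.
    db.places.find({loc: {$near: [37.6, 55.7]}}).limit(10);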
There is mmapped storage with a configurable fsync frequency.
It sounded as if they keep only the indexes in memory, but perhaps that was not the case. I would not want it to be.
Indexes are trees of some kind, similar to a B-tree. It is better to rebuild them periodically, though since version 1.8 this is rarely needed.
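Rebuilding is one shell call per collection:

    // Drops and recreates all indexes on the collection.
    db.posts.reIndex();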
Interesting: in query filters you can specify a $where condition and write the filter in JS inside it.
True, indexes will not be used for it (so far).
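For example (the condition itself is mine):

    // The JS predicate runs for every candidate document -- flexible,
    // but slow, and no index helps with the $where part.
    db.posts.find({$where: "this.comments && this.comments.length > 10"});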
There are sparse indexes: if an attribute is present on only a small fraction of documents, only those documents get into the index.
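A sketch with the ensureIndex helper (the field name is mine):

    // Only documents that actually have twitter_id enter the index.
    db.users.ensureIndex({twitter_id: 1}, {sparse: true});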
Covered indexes are like IOT (index-organized tables): the index stores additional attributes, so a query can return its results using only the data in the index, without touching the chunks holding the documents.
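A covered-query sketch (the names are mine): project only indexed fields and drop _id, and the query is answered from the index alone.

    db.users.ensureIndex({name: 1, karma: 1});
    // _id is excluded because it is not in the index.
    db.users.find({name: "Ann"}, {_id: 0, name: 1, karma: 1});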
One entry in the index costs about 40 bytes of overhead.
The query optimizer is empirical, which is alarming.
It optimizes the ratio of index entries scanned to the number of documents returned.
In any case, the index to use can be forced with a hint.
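For example:

    db.users.ensureIndex({name: 1});
    // Force the {name: 1} index regardless of the optimizer's choice;
    // explain() shows which index was actually used.
    db.users.find({name: "Ann", karma: {$gt: 10}}).hint({name: 1}).explain();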