
Once I ran into a task: MongoDB was used as a cache/buffer between a Java backend and a Node.js frontend. Everything was fine until a business requirement appeared: push large volumes through MongoDB in a short time (up to 200 thousand records in no more than a couple of minutes). Why is not so important; what matters is that such a task appeared. And so I had to dig into Mongo's guts...
Round 0: just write to Mongo with Write Concern = Acknowledged. The most banal, head-on approach. In this mode Mongo guarantees that everything has been written without errors and that, in general, all is well. Everything is indeed written perfectly, but... 200 thousand records take twenty minutes or more. Does not fit. Crossed off.
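For reference, a minimal sketch of this head-on approach with the legacy 2.x Java driver; the host, database and collection names are made up for illustration:

```java
import com.mongodb.BasicDBObject;
import com.mongodb.DBCollection;
import com.mongodb.MongoClient;
import com.mongodb.WriteConcern;

public class Round0 {
    public static void main(String[] args) {
        // Hypothetical connection and collection names, for illustration only.
        MongoClient mongo = new MongoClient("localhost", 27017);
        DBCollection cache = mongo.getDB("buffer").getCollection("cache");
        cache.setWriteConcern(WriteConcern.ACKNOWLEDGED);
        // One acknowledged insert per record: every call blocks until the
        // server confirms the write, hence 20+ minutes for 200k records.
        for (int i = 0; i < 200_000; i++) {
            cache.insert(new BasicDBObject("_id", i).append("payload", "..."));
        }
        mongo.close();
    }
}
```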
Round 1: we try bulk write operations with the same Write Concern = Acknowledged. It got better, but not by much: the write takes ten to fifteen minutes. Strange; frankly, a much bigger speedup was expected. Okay, moving on.
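A sketch of the bulk variant under the same assumptions, using the bulk API that appeared in Java driver 2.12; `cache` is the same collection object as in the previous snippet:

```java
import com.mongodb.BasicDBObject;
import com.mongodb.BulkWriteOperation;
import com.mongodb.DBCollection;

public class Round1 {
    // Same acknowledged write concern, but batched through the bulk API.
    static void bulkLoad(DBCollection cache) {
        BulkWriteOperation bulk = cache.initializeOrderedBulkOperation();
        for (int i = 0; i < 200_000; i++) {
            bulk.insert(new BasicDBObject("_id", i).append("payload", "..."));
        }
        // execute() applies the collection's write concern (Acknowledged here)
        // and returns only when the server has confirmed the whole bulk.
        bulk.execute();
    }
}
```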
Round 2: we try changing Write Concern to Unacknowledged and, for good measure, keep using bulk write operations. In general this is not the best solution: if something goes wrong inside Mongo we will never know about it, because the server only confirms that the data reached it, not whether it was actually written to the database. On the other hand, per the business requirements the data are not banking transactions, the loss of a single record is not critical, and if Mongo is badly broken we will learn about it from monitoring anyway. We try. On the one hand, everything is written in just a minute, which is good (without bulk write operations it is a minute and a half, also quite acceptable); on the other hand, a problem appeared: immediately after the write Java gives Node.js the go-ahead, and when Node.js starts reading, the data arrives either in full, or not at all, or half is read and half is not. The culprit is asynchrony: with this Write Concern, Mongo is still writing while Node.js is already reading, so the client manages to read before the write is guaranteed to finish. Bad.
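The same sketch with the write concern relaxed; this is an illustration of the fire-and-forget mode, not the article's verbatim code:

```java
import com.mongodb.BasicDBObject;
import com.mongodb.BulkWriteOperation;
import com.mongodb.DBCollection;
import com.mongodb.WriteConcern;

public class Round2 {
    // Fire-and-forget: the driver does not wait for any server confirmation.
    static void bulkLoadUnacknowledged(DBCollection cache) {
        BulkWriteOperation bulk = cache.initializeOrderedBulkOperation();
        for (int i = 0; i < 200_000; i++) {
            bulk.insert(new BasicDBObject("_id", i).append("payload", "..."));
        }
        bulk.execute(WriteConcern.UNACKNOWLEDGED);
        // Danger: control returns here while mongod may still be writing,
        // so a reader signalled at this point can see partial data.
    }
}
```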
Round 3: we started to think. The idea of a Thread.sleep(60 seconds), or of writing some control object to Mongo to signal that all the data had been loaded, looked very crooked. We decided to find out why bulk write operations give so little speedup: in theory, during a bulk write the Write Concern should delay only the last batch, not everything. It is somehow illogical that waiting for the last portion to be written takes this long. We look at the Java driver code for MongoDB and stumble upon the fact that bulk operation packets are limited by a certain maxWriteBatchSize parameter. Debugging shows that this parameter is only 500; in other words, the whole bulk is in fact cut into requests of just 500 records each, and with Acknowledged the driver waits for each 500-record batch to be fully written before sending the next request. For 200 thousand records that is 400 sequential round trips at maximum volume, and that slows things down wildly.
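To make the arithmetic concrete, a simplified model of the observed behavior (an illustration, not the driver's actual code):

```java
public class Round3Model {
    // Simplified model of the observed behavior, NOT the driver's real code:
    // an ordered acknowledged bulk of 200,000 inserts is split into batches
    // of maxWriteBatchSize = 500, and each batch blocks until acknowledged.
    public static void main(String[] args) {
        int total = 200_000;
        int maxWriteBatchSize = 500;              // value seen in the debugger
        int batches = total / maxWriteBatchSize;  // = 400 sequential round trips
        for (int b = 0; b < batches; b++) {
            // send one 500-record insert message, then (with Acknowledged)
            // wait for the server's reply before sending the next batch
        }
        System.out.println(batches + " acknowledged round trips");
    }
}
```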
Round 4: trying to understand where this maxWriteBatchSize parameter comes from, we find that the Mongo driver queries the Mongo server via getMaxWriteBatchSize(). There was a thought of raising this parameter in the Mongo config and bypassing the restriction. Attempts to find such a parameter or query in the specification came up empty. All right, we search the internet and find the C++ source code. The parameter is a banal constant hard-wired into the source, i.e. it cannot be increased. Dead end.
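For what it's worth, the server-side limit can be observed directly: on MongoDB 2.6+ the isMaster reply carries a maxWriteBatchSize field. A sketch, same hypothetical connection as above:

```java
import com.mongodb.CommandResult;
import com.mongodb.DB;
import com.mongodb.MongoClient;

public class CheckBatchSize {
    public static void main(String[] args) {
        DB db = new MongoClient("localhost", 27017).getDB("admin");
        // On MongoDB 2.6+ the isMaster reply includes maxWriteBatchSize,
        // the server-side cap the driver honours; it is compiled into the
        // server, not something a config option can raise.
        CommandResult isMaster = db.command("isMaster");
        System.out.println("maxWriteBatchSize = " + isMaster.get("maxWriteBatchSize"));
    }
}
```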
Round 5: we look for more options on the internet. We decided against uploading through hundreds of parallel streams; that is essentially DDoS-ing your own Mongo server (especially since Mongo can parallelize incoming requests itself). And then we found a command called getLastError, whose essence is to wait until all operations are persisted to the database and return an error code or a successful status. The specification insistently tries to convince you that the method is outdated and should not be used, and in the Mongo driver it is marked as deprecated. But we try it: we send the requests with Write Concern = Unacknowledged and bulk write in ordered mode, then call getLastError(), and yes, in a minute and a half all the records are written, synchronously; now the client starts reading only after every object is fully written, because getLastError() waits for the last write to finish while the batches no longer block each other. Moreover, if an error occurs we will learn about it from getLastError(). In other words, we got exactly what we wanted: fast bulk write with Acknowledged-like semantics, but waiting only for the last batch. (Error handling is probably somewhat worse than in the real Acknowledged mode: this command will likely not report an error that occurred only in the early batches; on the other hand, the probability that the first batch fails while the last one succeeds is not that great.)
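A sketch of the final combination, under the same assumptions as the earlier snippets (`db` and `cache` as before). One caveat I am adding myself, not from the article: getLastError reports on the last operation of a particular connection, so the old driver's requestStart()/requestDone() is used here to pin one pooled connection:

```java
import com.mongodb.BasicDBObject;
import com.mongodb.BulkWriteOperation;
import com.mongodb.CommandResult;
import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.WriteConcern;

public class Round5 {
    static void bulkLoadAndSync(DB db, DBCollection cache) {
        // Pin one pooled connection: getLastError reports on the last
        // operation of the connection it runs on (my addition, not from
        // the article).
        db.requestStart();
        try {
            BulkWriteOperation bulk = cache.initializeOrderedBulkOperation();
            for (int i = 0; i < 200_000; i++) {
                bulk.insert(new BasicDBObject("_id", i).append("payload", "..."));
            }
            bulk.execute(WriteConcern.UNACKNOWLEDGED); // batches fly without waiting

            // Deprecated, but it blocks until the last write on this
            // connection finishes and carries an error code if one occurred.
            CommandResult lastError = db.getLastError();
            lastError.throwOnError();
        } finally {
            db.requestDone();
        }
        // Only here do we signal node.js that it is safe to read.
    }
}
```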

So, what the MongoDB specification keeps quiet about:
1. A bulk write operation is not all that bulk and is strictly capped at a ceiling of 500-1000 requests per batch. Update: in fact, as I have since discovered, the mention of the 1000-operation ceiling did eventually appear in the documentation; a little over a year ago, in version 2.4, when this analysis was done, there was no mention of the magic constant.
2. Alas, the old getLastError mechanism turned out to be somewhat more capable, and the new Write Concern mechanism does not fully replace it yet; or rather, you can use the deprecated command to speed things up, because the logical behavior of "wait for the successful write of only the last batch of a large ordered bulk request" is not implemented in Mongo.
3. The problem with Write Concern = Unacknowledged is not even that data can be lost with no error returned, but that the data is written asynchronously: a client that tries to access the data immediately can easily get nothing, or only part of it (this matters when a read command is issued right after the write).
4. Because of the limited bulk size, MongoDB's write performance suffers, and the Acknowledged Write Concern is implemented not quite correctly; the correct behavior would be to wait only for the end of the write of the last batch.
P.S. All in all, it turned out to be an interesting experience of optimizing by non-standard means, when the specification does not contain all the information.
P.P.S. I also suggest taking a look at my open source project [useful-java-links](https://github.com/Vedenin/useful-java-links/tree/master/link-rus), perhaps the most comprehensive collection of useful Java libraries, frameworks and Russian-language instructional videos. There is also an [English version](https://github.com/Vedenin/useful-java-links/) of this project, as well as the open source subproject [Hello world](https://github.com/Vedenin/useful-java-links/tree/master/helloworlds), a collection of simple examples for different Java libraries in one Maven project (I will be grateful for any help).