I recently learned that Red Hat is removing MongoDB support from Satellite (because of license changes, they say). It made me realize that over the past few years I have seen a pile of articles about how terrible MongoDB is and how no one should ever use it. Yet over the same period MongoDB has become a much more mature product. What happened? Is all the hate really due to early mistakes in marketing a new DBMS? Or do people simply use MongoDB in the wrong places?
If you get the impression that I am defending MongoDB, please read the disclaimer at the end of the article.
New trend
I have been working in the software industry for longer than I care to say, and even so I have personally experienced only a small share of the trends that have hit our industry. I witnessed the rise of 4GL, AOP, Agile, SOA, Web 2.0, AJAX, blockchain ... the list is endless. New trends appear every year. Some quickly fade away, while others fundamentally change the way software is developed.
A certain amount of excitement builds up around each new trend: people either jump on board themselves, or see the noise generated by others and follow the crowd. Gartner has codified this process in the hype cycle. Although it is controversial, the chart roughly describes what happens to technologies before they eventually become useful.
But from time to time an innovation appears (or makes a second coming, as in this case) that is driven by one specific implementation. In the case of NoSQL, the hype was strongly driven by the emergence and rapid rise of MongoDB. MongoDB did not start the trend: large Internet companies had begun to struggle with processing huge amounts of data, which led to the return of non-relational databases. The general movement started with projects such as Google's Bigtable and Facebook's Cassandra, but it was MongoDB that became the best-known NoSQL implementation and the one most developers actually had access to.
Note: you might think that I am mixing up document databases with column stores, key/value stores, and the many other types of data stores that fall under the general definition of NoSQL. And you are right. But at that time chaos reigned. Everyone was obsessed with NoSQL and it suddenly became a must-have, although many people did not see the differences between the various technologies. For many, MongoDB became synonymous with NoSQL.
And developers pounced on it. The idea of a schemaless database that magically scales to solve any problem was very tempting. Around 2014 it seemed that MongoDB was being deployed everywhere a relational database such as MySQL, Postgres or SQL Server had been used a year earlier. When asked why, people gave answers ranging from the banal “it's web scale” to the more thoughtful “my data is very loosely structured and fits well into a schemaless database”.
It is important to remember that MongoDB, and document databases in general, solve a number of problems that traditional relational databases have:
- Strict schema: with a relational database, if you have dynamically generated data, you have to either create a bunch of generic “miscellaneous” columns, cram blobs of data into them, or use an EAV (entity-attribute-value) layout ... all of which have significant drawbacks.
- Difficulty of scaling: if there is so much data that it no longer fits on a single server, MongoDB offered mechanisms for spreading it across several machines.
- Complex schema changes: no migrations! In a relational database, changing the structure of the database can be a huge problem (especially when there is a lot of data). MongoDB greatly simplified the process, to the point where you could just change the schema on the fly and keep moving very quickly.
- Write performance: MongoDB's write performance was good, especially when properly configured. Even the out-of-the-box configuration, for which it was often criticized, produced some impressive performance numbers.
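To make the strict-schema point concrete, here is a minimal sketch in Python. It is an illustration only, built on assumptions: the product records and the table names are hypothetical, SQLite stands in for "a relational database", and a plain dict of JSON strings stands in for a document store (no MongoDB is involved). It stores the same variable-shaped records once in an EAV layout and once as whole documents.

```python
import json
import sqlite3

# Hypothetical records whose attributes differ from item to item.
products = [
    {"id": 1, "name": "laptop", "attrs": {"ram_gb": 16, "screen_in": 14}},
    {"id": 2, "name": "t-shirt", "attrs": {"size": "M", "color": "blue"}},
]

conn = sqlite3.connect(":memory:")

# EAV layout: one row per (entity, attribute, value); every value becomes text.
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE product_attr (product_id INTEGER, attr TEXT, value TEXT)")
for p in products:
    conn.execute("INSERT INTO product VALUES (?, ?)", (p["id"], p["name"]))
    conn.executemany(
        "INSERT INTO product_attr VALUES (?, ?, ?)",
        [(p["id"], key, str(value)) for key, value in p["attrs"].items()],
    )


def load_eav(product_id):
    """Reassemble the attributes of one product from its EAV rows."""
    rows = conn.execute(
        "SELECT attr, value FROM product_attr WHERE product_id = ? ORDER BY attr",
        (product_id,),
    ).fetchall()
    return dict(rows)


# Document layout (stand-in): each record is stored and read back as one unit.
documents = {p["id"]: json.dumps(p) for p in products}


def load_document(product_id):
    """Return the whole record, with its original nesting and value types."""
    return json.loads(documents[product_id])


eav_laptop = load_eav(1)       # values come back as strings
doc_laptop = load_document(1)  # values keep their JSON types
```

In the EAV version the numeric attributes come back as text and the record has to be reassembled row by row, which is the kind of drawback the list above refers to. The document version round-trips the record unchanged, but nothing checks its shape.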
All risks are on you
The potential benefits of MongoDB were enormous, especially for certain classes of problems. If you read the list above without context or experience, you might get the impression that MongoDB is a truly revolutionary DBMS. The only problem was that these advantages came with a number of caveats, some of which are listed below.
To be fair, no one at 10gen / MongoDB Inc. would claim that the following is untrue; these are simply trade-offs.
- Loss of transactions: transactions are a core feature of many relational databases (not all, but most). Transactionality means that you can perform multiple operations atomically and be sure that the data stays consistent. Of course, with a NoSQL database you can have transactionality within a single document, or you can use two-phase commits to get transactional semantics. But you have to implement that functionality yourself ... which can be a difficult and time-consuming task. Often you are not aware of the problem until you find data in the database in an invalid state, because the atomicity of the operations could not be guaranteed. Note: many people have pointed out to me that transactions appeared last year in MongoDB 4.0, but with a number of restrictions. The conclusion of the article remains the same: evaluate how well the technology meets your needs.
- Loss of relational integrity (foreign keys): if your data has relationships, you have to enforce them in the application. Having the database enforce these relationships takes a significant amount of work off the application and, therefore, off your programmers.
- Inability to enforce data structure: strict schemas can sometimes be a big problem, but they are also a powerful mechanism for structuring data well when used properly. Document databases such as MongoDB provide incredible schema flexibility, but this flexibility shifts the responsibility for keeping the data clean onto you. If you do not take care of it, you will eventually have to write a lot of application code to account for data that is not stored in the form you expect. As we often say at Simple Thread: the application will be rewritten someday, but the data will live forever. Note: MongoDB supports schema validation. It is useful, but it does not provide the same guarantees as a relational database. Above all, adding or modifying a schema validation rule does not affect the data already in the collection. You have to make sure yourself that existing data is updated to match the new schema. Decide for yourself whether this is enough for your needs.
- Proprietary query language / loss of ecosystem tools: the arrival of SQL was an absolute revolution, and that has not changed since. It is an incredibly powerful language, but also quite a complex one. Having to build database queries in a new language made of JSON fragments feels like a big step backwards to people with SQL experience. There is a whole universe of tools that interact with SQL databases, from IDEs to reporting tools. Moving to a database that does not support SQL means that you cannot use most of these tools, or that you have to convert the data to SQL in order to use them, which may be harder than you think.
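The first two trade-offs in the list above are easier to appreciate once you see what a relational database does for you without any application code. The following is a minimal sketch, not a benchmark or a MongoDB comparison: it uses Python's built-in sqlite3 module as a stand-in relational database, and the account and payment tables and the transfer helper are hypothetical names invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute(
    "CREATE TABLE payment ("
    " id INTEGER PRIMARY KEY,"
    " account_id INTEGER NOT NULL REFERENCES account(id),"
    " amount INTEGER NOT NULL)"
)
conn.execute("INSERT INTO account VALUES (1, 100), (2, 0)")
conn.commit()


def transfer(src, dst, amount):
    """Credit dst and debit src as one atomic unit; return True on success."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances():
    """Return {account id: balance} for all accounts."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


def insert_orphan_payment():
    """Try to record a payment for an account that does not exist."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO payment (account_id, amount) VALUES (?, ?)", (999, 10)
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(1, 2, 30)
# The debit would make account 1 negative, so the CHECK constraint fails
# and the credit that was already applied to account 2 is rolled back too.
failed = transfer(1, 2, 500)
orphan_accepted = insert_orphan_payment()
```

After the failed transfer, both balances are exactly what they were before it started, and the orphan payment is rejected by the database itself. Without multi-document transactions and foreign keys, equivalent guarantees have to be written, tested and maintained in application code.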
Many developers who turned to MongoDB did not really understand the trade-offs and often dived in headfirst, adopting it as their primary data store. After that, it was often incredibly difficult to go back.
What could have been done differently?
Not everyone dived in headfirst and hit the bottom. But many projects adopted MongoDB where it simply did not fit, and they will have to live with it for many years to come. If these organizations had spent some time thinking methodically about their choice of technology, many would have chosen differently.
How do you choose the right technology? There have been several attempts to create a systematic framework for assessing technologies, such as the Framework for Implementing Technologies in Software Organizations and the Framework for Assessing Software Technologies, but it seems to me that this is unnecessary complexity.
Many technologies can be reasonably assessed by asking just two basic questions.
The problem is finding people who can answer them responsibly, taking the time to look for the answers and doing so without bias.
Question 1: What problems am I trying to solve?
If you do not have a problem, you do not need a new tool. Period. There is no need to find a solution first and then invent a problem for it. Unless you are facing a problem that the new technology solves much better than your existing technology, there is nothing to discuss. If you are considering a technology because you have seen others use it, think about the problems they face and ask yourself whether you have the same problems. It is easy to adopt a technology because others use it; the hard part is understanding whether you are facing the same problems.
Question 2: What am I losing?
This is certainly a harder question, because you have to dig in and understand both the old and the new technology well. Sometimes you cannot really understand a new technology until you have built something with it, or until you have a colleague with that experience.
If you have neither, it makes sense to think about the smallest possible investment that would let you determine the value of the tool. And once you have made that investment, how hard will it be to reverse the decision?
People always ruin everything
As you try to answer these questions as impartially as possible, remember one thing: you will be fighting human nature. There are a number of cognitive biases that must be overcome in order to evaluate a technology effectively. Here are just a few:
- The bandwagon effect: everyone knows about it, but it is still hard to fight. Just make sure the technology truly meets your real needs.
- The novelty effect: many developers tend to undervalue technology they have worked with for a long time and to overvalue the benefits of a new one. It is not only programmers; everyone is subject to this bias.
- The feature-positive effect: we tend to see what is there and overlook what is missing. Combined with the novelty effect this can cause havoc, because you not only inherently overvalue the new technology but also ignore its shortcomings.
Objective assessment is not easy, but understanding these basic cognitive biases will help you make more rational decisions.
Summary
When some kind of innovation appears, you need to be very careful in answering two questions:
- Does this tool solve a real problem?
- Do we understand the tradeoffs well?
If you cannot confidently answer these two questions, take a few steps back and think.
So was MongoDB ever the right choice? Of course it was; as with most engineering technologies, it depends on many factors. Among those who answered these two questions, many have benefited from MongoDB and continue to do so. Those who did not have, I hope, learned a valuable and not too painful lesson about moving through the hype cycle.
Disclaimer
I want to make clear that I neither love nor hate MongoDB. We simply have not had the kinds of problems for which MongoDB is best suited. I know that 10gen / MongoDB Inc. was very aggressive at first, shipping unsafe default settings and promoting MongoDB everywhere (especially at hackathons) as a universal solution for working with any data. That was probably a bad decision. But it supports the approach described here: these problems could have been detected very quickly with even a superficial assessment of the technology.