Pros and cons: When MongoDB is worth using and when it is not

Nawaz Dhandala, a developer and contributor to the CloudBoost.io project, wrote an article on why in some cases you should not use MongoDB. At Latera we develop Hydra, a billing system for telecom operators, and have been working with this database for many years, so we decided to share our opinion on the issue.

Dhandala states up front that he has worked with many DBMSs (both SQL and NoSQL) and considers MongoDB an excellent tool; however, there are scenarios in which using it is impractical.
Document-oriented DBMSs such as MongoDB do an excellent job of storing JSON data grouped into “collections.” You can store any JSON documents in them and conveniently sort them into collections. MongoDB stores documents in a binary form of JSON called BSON, and, like any other JSON document, they have no fixed structure. Therefore, unlike in traditional DBMSs, any kind of data can be stored in a collection, and this flexibility is combined with the horizontal scalability of the database. Many developers like this, but "not everything is so simple."
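As a minimal sketch of what “any kind of data in one collection” means, the Python below models a collection as a plain list of dicts (an illustrative stand-in, not the MongoDB driver API). The hypothetical `find` helper matches documents whose top-level fields equal the given values, roughly like a simple equality query in MongoDB; documents that lack the queried field simply do not match.

```python
# Plain-Python stand-in for a document collection: a list of dicts.
# This is NOT the MongoDB driver API; it only shows that documents in
# one collection do not have to share the same set of fields.
from typing import Any

products: list[dict[str, Any]] = [
    {"_id": 1, "name": "Laptop", "price": 999, "specs": {"ram_gb": 16}},
    {"_id": 2, "name": "T-shirt", "price": 15, "sizes": ["S", "M", "L"]},
    {"_id": 3, "name": "E-book", "price": 7, "format": "EPUB"},
]


def find(collection: list[dict[str, Any]], **criteria: Any) -> list[dict[str, Any]]:
    """Return documents whose top-level fields equal every given criterion.

    A document missing a queried field does not match (hypothetical helper).
    """
    missing = object()  # sentinel that never equals a real value
    return [
        doc
        for doc in collection
        if all(doc.get(field, missing) == value for field, value in criteria.items())
    ]


print([d["name"] for d in find(products, format="EPUB")])  # ['E-book']
print([d["name"] for d in find(products, price=15)])  # ['T-shirt']
```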

If MongoDB is so cool, why doesn't everyone use it all the time?

The choice of a DBMS depends, among other things, on what kind of application you plan to build. In other words, Dhandala is convinced, the database is chosen not by the developers but by the product itself. He gives an example to support this thesis.

When an application's concept revolves around working with documents, MongoDB is a good choice. One example is a blogging engine, where each author can have several blogs and each blog will contain many comments. The database behind such an application should be easy to extend, and here MongoDB fits perfectly.
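A hypothetical document for the blogging-engine example could look like the sketch below; the field names and the intermediate `posts` level are assumptions for illustration, not a prescribed schema. Because an author's blogs and comments live inside one document, reading them needs no lookups in other collections, and a new field can be added to a single comment without touching the others.

```python
# Hypothetical blogging-engine document: one author, several blogs,
# each blog holding its posts and their comments inline.
author = {
    "_id": "author-42",
    "name": "Alice",
    "blogs": [
        {
            "title": "Databases",
            "posts": [
                {
                    "title": "Why documents?",
                    "comments": [
                        {"by": "bob", "text": "Nice intro"},
                        # Extra field on just one comment: no schema change needed
                        {"by": "carol", "text": "What about joins?", "edited": True},
                    ],
                },
            ],
        },
        {"title": "Travel", "posts": [{"title": "Lisbon", "comments": []}]},
    ],
}


def count_comments(author_doc: dict) -> int:
    """Count all comments across every blog and post of one author.

    Everything is nested in a single document, so no other collection is read.
    """
    return sum(
        len(post["comments"])
        for blog in author_doc["blogs"]
        for post in blog["posts"]
    )


print(count_comments(author))  # 2
```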

However, MongoDB has no relations between documents and “collections” (this is partly compensated for by DBRefs, or database references, but they do not fully solve the problem). As a result, you end up with a set of data that is not related to the rest of the information in the database, and there is no built-in way to combine data from different documents. In SQL systems, this would be an elementary task.

This raises another question: if MongoDB has no relations and no way to join two tables, why use it at all? Because this DBMS is highly scalable and, compared with traditional SQL systems, reads and writes much faster. MongoDB is perfect for applications whose data has no dependencies and that require a scalable database.

Many developers use MongoDB to store related data anyway, implementing joins manually in code. This is enough for single-level (“one-tier”) joins or when there are only a few relations, but the method is far from universal.
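A minimal sketch of such a manual single-level join, assuming two in-memory lists stand in for the results of two separate queries (`authors`, `posts`, and `author_id` are hypothetical names): index one side by `_id`, then attach the matching document to each document on the other side.

```python
# Single-level ("one-tier") join done in application code.
# The two lists stand in for the results of two separate database queries.
from typing import Any

authors = [
    {"_id": 1, "name": "Alice"},
    {"_id": 2, "name": "Bob"},
]
posts = [
    {"_id": 10, "title": "Intro to BSON", "author_id": 1},
    {"_id": 11, "title": "Sharding notes", "author_id": 2},
    {"_id": 12, "title": "Indexes", "author_id": 1},
]


def join_posts_with_authors(
    posts: list[dict[str, Any]], authors: list[dict[str, Any]]
) -> list[dict[str, Any]]:
    """Attach the matching author document to each post.

    A dict keyed by author _id avoids a nested loop; posts whose
    author_id has no match get None.
    """
    by_id = {a["_id"]: a for a in authors}
    return [{**p, "author": by_id.get(p["author_id"])} for p in posts]


for row in join_posts_with_authors(posts, authors):
    print(row["title"], "->", row["author"]["name"])
```

Each further level of relations (say, comments of posts of authors) means another query and another lookup table in application code, which is why this approach stays manageable only for shallow data.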

So which DBMS to choose?

There are a huge number of DBMSs, and each of them meets a particular set of requirements that developers place on their application:

Document-oriented DBMS (for example, MongoDB): As mentioned above, a document-oriented DBMS stores JSON documents in “collections” and lets you query them on the fields you need. You can use it to build applications whose data does not have too many relations. Good examples of such applications are a blogging engine or a product catalog.
Graph DBMS (for example, Neo4j): A graph DBMS stores the relationships between entities: nodes are the entities, and edges are the connections between them. For example, if developers build a social network and one user subscribes to another, the users are nodes and their “subscription” is an edge. Such databases handle connections extremely well, even when those connections are more than a hundred levels deep. This tool is so effective that it can even detect e-commerce fraud.
Cache (for example, Redis): Such DBMSs are used when extremely fast access to data is required. If you build an online store that loads its categories on every page, then instead of querying the database on every read, which is extremely expensive, you can keep the data in a cache, which performs reads and writes very quickly. Dhandala advises using a cache as a wrapper around frequently requested data, so that the main database does not have to be queried as often.
Search engines (for example, Elasticsearch): If you need full-text search over your data (for example, product search in an e-commerce application), a search engine such as Elasticsearch is a good idea. It can search huge volumes of data and offers extensive functionality; for example, it can search within named categories.
Row-based DBMS (for example, Cassandra): Cassandra is used to store sequential data, logs, or huge volumes of automatically generated information, for example from sensors. If developers plan to write large amounts of data, expect far fewer reads, and the data has no relations or joins, then Cassandra is a good choice, Dhandala believes.
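The cache-wrapper idea from the Redis item can be sketched as the cache-aside pattern: check the cache first and query the database only on a miss. In this sketch a dict with expiry times stands in for Redis, and `load_categories` is a hypothetical expensive database read; neither is the Redis client API.

```python
# Cache-aside sketch: serve from the cache when possible, otherwise load
# from the (hypothetical) database and remember the result for ttl seconds.
import time
from typing import Any, Callable


class CacheAside:
    """Minimal cache-aside wrapper with a per-entry time-to-live."""

    def __init__(self, loader: Callable[[str], Any], ttl_seconds: float = 60.0) -> None:
        self._loader = loader  # function that reads from the database
        self._ttl = ttl_seconds  # how long an entry stays fresh
        self._store: dict[str, tuple[float, Any]] = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Any:
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # cache hit: the database is not touched
        value = self._loader(key)  # miss or expired entry: read from the database
        self._store[key] = (now + self._ttl, value)
        return value


db_calls = 0


def load_categories(key: str) -> list[str]:
    """Hypothetical expensive database read of the store's categories."""
    global db_calls
    db_calls += 1
    return ["Laptops", "Phones", "Books"]


cache = CacheAside(load_categories, ttl_seconds=30)
for _ in range(1000):  # e.g. 1000 page views, each needing the categories
    cache.get("categories")
print(db_calls)  # 1
```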
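To illustrate the graph model from the Neo4j item, here is a plain-Python breadth-first search (not Neo4j or its Cypher query language) over hypothetical “subscription” edges, counting how many hops separate two users.

```python
# Users are nodes; "subscribes to" links are directed edges.
# Breadth-first search returns the smallest number of hops between two users.
from collections import deque
from typing import Optional


def hops_between(follows: dict[str, list[str]], start: str, goal: str) -> Optional[int]:
    """Return the number of subscription hops from start to goal, or None."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        user, dist = queue.popleft()
        for nxt in follows.get(user, []):
            if nxt == goal:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # goal is unreachable from start


follows = {
    "ann": ["ben"],
    "ben": ["cat", "dan"],
    "dan": ["eve"],
}
print(hops_between(follows, "ann", "eve"))  # 3
print(hops_between(follows, "eve", "ann"))  # None
```

In a relational schema each extra hop would typically mean one more join on a subscriptions table; graph DBMSs are designed to follow such edges directly.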

Using a combination of databases

There are also situations in which you may need to use several different DBMS at once.

For example, if the application has a search function, it can be implemented with Elasticsearch, while MongoDB is better suited for storing data that has no relations. For an Internet of Things project, where a huge number of devices and sensors generate vast amounts of data, it would be reasonable to use Cassandra.

The principle of using multiple DBMS to work in one application is called “Polyglot Persistence”. In this article you can read about the pros and cons of this approach.
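A toy sketch of polyglot persistence, assuming in-memory stand-ins for both stores (they are not the MongoDB or Elasticsearch APIs): product documents are saved to a primary document store, and their titles are also written to a naive inverted index used for search. A real search engine adds tokenization rules, relevance ranking, and much more.

```python
# One service, two stores with different jobs: a document store for the
# products themselves and a separate full-text index for searching them.
import re
from collections import defaultdict


class ProductService:
    """Writes each product to a primary store and to a search index."""

    def __init__(self) -> None:
        self.documents: dict[int, dict] = {}  # stand-in for the document store
        self.index: defaultdict = defaultdict(set)  # stand-in for the search engine

    def save(self, product: dict) -> None:
        self.documents[product["_id"]] = product
        # Naive inverted index: lowercase word -> ids of products whose title has it
        for word in re.findall(r"\w+", product["title"].lower()):
            self.index[word].add(product["_id"])

    def search(self, query: str) -> list[dict]:
        """Return products whose titles contain every word of the query."""
        words = re.findall(r"\w+", query.lower())
        if not words:
            return []
        ids = set.intersection(*(self.index.get(w, set()) for w in words))
        return [self.documents[i] for i in sorted(ids)]


svc = ProductService()
svc.save({"_id": 1, "title": "Red running shoes", "price": 60})
svc.save({"_id": 2, "title": "Blue running jacket", "price": 80})
print([p["_id"] for p in svc.search("running")])  # [1, 2]
print([p["_id"] for p in svc.search("red shoes")])  # [1]
```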

Our experience

Our Hydra billing system uses a relational DBMS for recording primary data and storing financial information, and it is perfect for this purpose. But some Hydra modules, such as the RADIUS server, run under high load and can receive thousands of requests per second with strict limits on request processing time. In addition, the database of our standalone RADIUS server stores data as sets of AVPs (attribute-value pairs). In such a scenario a relational DBMS no longer looks like the best solution, and MongoDB comes to the rescue with its store of arbitrarily structured documents, fast response, and horizontal scalability.
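As a hypothetical illustration of why arbitrary structure suits AVP data (this is not how Hydra actually stores it), the sketch below folds a list of RADIUS-style attribute-value pairs into one document and keeps repeated attributes as lists. The attribute names are standard RADIUS attributes; the values are invented. Different subscribers can carry different attribute sets, and a document absorbs that without a schema change.

```python
# Hypothetical example: fold a list of attribute-value pairs (AVPs) into
# one document. Single-valued attributes become plain fields; attributes
# that occur more than once are kept as a list of values.
from typing import Any


def avps_to_document(avps: list[tuple[str, Any]]) -> dict[str, Any]:
    grouped: dict[str, list[Any]] = {}
    for attr, value in avps:
        grouped.setdefault(attr, []).append(value)  # preserves first-seen order
    return {attr: vals[0] if len(vals) == 1 else vals for attr, vals in grouped.items()}


request = [
    ("User-Name", "subscriber-1001"),
    ("Framed-IP-Address", "10.0.0.15"),
    ("Session-Timeout", 3600),
    ("Reply-Message", "Welcome"),
    ("Reply-Message", "Balance low"),
]
print(avps_to_document(request))
```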

Across more than 100 Hydra installations over the past five years, we have not run into any serious problems with Mongo. There are, however, a couple of nuances. First, after a sudden server shutdown the database does recover from its journal, but slowly. Fortunately, this is rarely needed. Second, even when the database is small, rarely used data gets flushed to disk, and when a request for it arrives, retrieving it takes a long time. As a result, the limits on request execution time are violated.

All of this applies to the MMAPv1 engine, which Mongo uses by default. We have not yet experimented with the others (WiredTiger and In-Memory), since the problems are not that serious.

Source: https://habr.com/ru/post/280196/
