SQL is flexible or why I’m afraid of NoSQL

From the translator: Recently I presented one project in Habré in which I used MySQL. Many users wondered why I did not use NoSQL for my tasks, and strongly recommended switching to non-relational databases. Today I stumbled upon this article, which perfectly explains why I am “afraid” of NoSQL.

I must confess that I have been thinking for a long time about whether to write an essay on databases, because touching this topic is the same as stirring a wasp nest. Firstly, much has been written before me, and secondly, the topic is too complex to draw any conclusions from personal experience.

The last two weeks, however, made me realize that I will never start a project based on MongoDB or any other non-relational database (NRBD) as the primary data warehouse. Pay attention - I said “I'll start.” I'm not saying that I will never use MongoDB as such.

Before I begin to explain, I will retreat to make sure that we are on the same wavelength. First, my main experience with non-relational data warehouses comes down to three rather different sides of this technology: MongoDB, HBase and Redis. It is very likely that what I attribute to the inalienable benefits of SQL can be excellently done by other NRBDs. The reason for writing this post is because I meet a lot of people who claim that NoSQL is much simpler to create prototypes than SQL. And with this, I disagree somewhat.

The circuits are great

It seems that many people are terribly afraid of schema concepts. Scheme == explicit typing == killer labor productivity. The problem with this statement is that it is based on the idea that the scheme can be completely eliminated. Unfortunately, this idea is a pipe dream. At some point, you should have an understanding of what you are dealing with: you need to know what type of current value is. If we do not want to describe types in programming, we have several options. For example, we can analyze the code in order to understand what type a variable will be at a certain point in time, as is done in the Rust programming language. Or we can resort to what Python or JavaScript does: switch to dynamic typing and solve the problem at runtime.
')
This problem is much more significant in databases, because we are not talking about individual variables, but about entire collections. You view your NRBD as an entity into which you throw JSON documents. But even without announcing the scheme you keep it in mind if you work with collections of objects. Assuming that you find the total number of comments in your blog, by counting the number of comments for each post, you use the scheme. You expect that the number of comments for each post is an integer, the sum of which must be added so that the operation makes sense.

The scheme in any case lives in your head. The problem is that it does not apply properly and there is a chance of making mistakes. If you store the string where you saved an integer number in another collection, there will be no problems as long as you reflect on the same document. Otherwise, your collections suddenly become inconsistent.

It turns out that even if you clearly do not use the scheme, your NRBD still use some of its residues for indexing. Although people can tell you, NRBD is not magic. They work under the same restrictions as any other data warehouse, and need to be explicitly indexed to be fast.

I definitely love the schemes. But I did not know how much I love them, until they suddenly disappeared. Everything got dirty, and the first thing I added to our code is the typing system to get the schema.

Do you know what you want?

The lack of a scheme should not be a reason not to use MongoDB or any other NRBD. It's easy to add a schema as an add-on, and things like MongoEngine for Python already do it for you. Therefore, I’m not so much concerned with the problem of lack of schemes, as with the all-encompassing idea of denormalized data.

Denormalized data is cool, without a doubt. They can be quickly and easily understood, and denormalization in several non-relational repositories is much easier than in SQL, because you can easily put the data where necessary. This will work as long as you know what you are doing. Unfortunately, sometimes the unexpected happens and then you realize that you have data that you do not have access to. This is what happened to me several times this week.

For reasons too complex to explain here, I had to extract information that was not presented in the format in which I needed. Everything was so surprisingly well denormalized that attempts to obtain data turned out to be a bloody mess. From a SQL-oriented database, this could be done with the help of several JOINs and GROUP BY. And all because it was written without guessing that I would need data in a completely different format.

Killed “Too complicated”

Over the past month I have learned for myself that the great value is in quickly adapting to changes. We were happy to present our game on iTunes for Christmas in several countries. Suddenly, we got a lot more players than one would expect. We needed answers to questions that we didn’t expect to ask, and our data model makes it difficult to get answers.

I like to believe that our team is quite capable, and we choose technology with great care, but there are always things that you miss.

MongoDB and other NRBDs are amazing if you already know everything. There are no surprises. All you need is to scale a successful product. If your product is new, the big value is to be flexible and be able to get information from the data quickly, even if you didn’t expect this to happen.

SQL databases make this a relatively simple task and do not make you suffer for the lack of proper planning. If queries become slow, you can simply add indexes in the places where it is needed. In some NRBD this will not work. In Redis, you create the data structures manually, and if they are initially incorrect, you have a problem. In MongoDB, the presence or absence of indexing changes the sampling results sometimes, and can make adding indexes a nontrivial task.

Even worse, sometimes you just can't add indexing. If you have nested objects, it is difficult to simply request some of them, since they are contained in other objects. Since the database does not provide queries for such needs, it is not difficult to find yourself in a situation where you are forced to extract all the data for offline processing through MapReduce.

This is not even a problem, it is just part of the database architecture. However, we ran into this as a problem. Unfortunately, sometimes the world does not behave as it should. You encounter unforeseen situations, and there is nothing more tedious than sitting on your data and not being able to get information when you know that it is there.

So, at least for myself, I realized that I could not predict everything in advance, so as long as I have the opportunity, I will use more flexible data stores, and at the moment this means that I will use SQL databases.

And I still love you MongoDB.

Source: https://habr.com/ru/post/164361/

All Articles

SQL is flexible or why I’m afraid of NoSQL

The circuits are great

Do you know what you want?

Killed “Too complicated”

More articles: