This article is for you if you:
- choose a database for a new project and study information about different options;
- consider that the current database does not suit you for some parameters and you want to change it, but you do not have a good specialist;
- just want to read in one article about several databases and when you can use them.
My article is not for you if you:
- well know how to prepare your favorite database, optimize indexes, set up and all that;
- You have a good specialist on staff who can reasonably choose the option that your project needs. the specialist must be really good, not an “expert on the couch”;
- you are servicing a project with the so-called “big data”, that is, you have huge databases, dozens or hundreds of servers in a cluster and all that — well, you should have one or several specialists on staff.
What will be discussed in the article?
I will analyze in my article some typical and not so many options for choosing databases, or, to be more precise, approaches to choosing. When should stop on the fact that they use the majority, and when you can and think about the new and unknown. I will describe MySQL, PostgreSQL, MongoDB, Redis, CouchDB / PouchDB and mention Aerospike with Tarantool, a couple of others - but in some moments a particular choice is not so important. It is necessary to understand that it is better to initially design the data structure correctly than to choose a DBMS, and then try to invent what is actually stored in it.
So, let's begin.
Small digressions:
- My calculations are not the ultimate truth. I made some conclusions on the basis of real use, some on the basis of information on the Internet, discussions with other people. Some conclusions are made on the basis of real problems, and some in purely theoretical ideas. All this information is in one form or another on the Internet, but rather scattered, but I will try to compactly describe in one place.
- I consider only those products that are actively developing at the time of this writing and at the same time they have been stably existing for a long time. Again, in my opinion, one of the conditions for choosing something for your project is that the product is fairly stable, working, has a large and well-developed community, its use does not involve multi-day dances with a tambourine, etc. And what is important - it should be possible to officially take advantage of commercial support and thereby obtain guarantees from the developer. It should be borne in mind that there are many other options, but they are either raw and additional and irrational efforts will have to be made to support them in the project. Either much less successful or popular.
- I am pleased to read your opinion in the comments. And about the described databases and not only. And not only I, I think it will be interesting to many to read a reasoned opinion about any database.
- The most important and once again - there will not be a speech about large projects. In such projects, there is usually already an architect or knowledgeable person and enough funds to ensure the optimal choice. Although who knows, maybe my calculations and they will be useful.
Now let's ask ourselves the questions:
- How conservative are you, do you want and do you like to delve into something new?
- Do you want not to think or, on the contrary, want to go deep into the database device?
- How good are the programmers in your team, will they be able to competently create the database structure, or are they already excellent specialists in any one DBMS?
Have you answered? Right? Then I will list the DBMS and describe in which case it is recommended to pay attention to them.
MySQL / MariaDBPeople's DBMS or “must have” is on almost any hosting. Easy to install, works fine without any special settings. With the right approach can be flexibly adjusted to your needs. But there are pitfalls, in some cases it will be thus a narrow neck and your project will slow down, no matter how you tune the DBMS and the data structure.
MySQL for you if:
')
- you are a conservative and do not want to delve into the settings of the DBMS, just put it and it works (well, or on the hosting use what you give);
- you are a conservative, then you think structurally, in a table. MySQL will cope;
- in any programming language, framework, CMS, CMF and so on and so forth - there is integration with MySQL.
- you are new and you need a DBMS for managing structural data, preferably small (up to 1 - 2 gigabytes).
Minuses? They are there and you should choose another DBMS if you see something important.
- mediocre performance. Really low, as not tuning. even the cluster will not help much (it is necessary to set it up, yeah, those are still dancing with a tambourine). We are talking about numbers about 20 MBytes / sec. From personal experience, on SSD disks with such a stream, MySQL rested on its limit, could not cope and slowed down, and the service was tuned for several years, settings that were optimal for the load were used. Out of the box configuration, I think it will have even less of a bar;
- changing the data structure can be quite time-consuming process, especially with a large number of relationships between the data in different tables and even with the simple addition of fields;
- sensitivity to server instability. This is especially true when using XtraDB from Percona. If you fail to complete MySQL correctly, you can break down the tables and databases so much that you can only recover from a full backup, of course, if you regularly do them. And believe me, it always happens at the most unexpected moment. There are tools that in simple situations can help restore performance, but they are not a panacea. In the latest and current versions they actively struggle with this, much better stability and reliability is declared.
PostgreSQLA kind of mastodon, very old and competent DBMS. It is almost like MySQL, only better. But we must be able to cook and adjust. In the opinion of many, a very stable DBMS, it is almost impossible to drop it, tear down tables like in MySQL. And this may be the decisive factor for you when choosing.
PostgreSQL for you if:
- you are a conservative (I understand, I repeat, but it is) and you need a reliable storage;
- you or your specialist can configure and use PostgreSQL;
- you need well-structured data, but with some flexibility in the data schema (JSON / BJSON);
- with the help of third-party libraries, it is easy and convenient to expand into clusters and make a sharding of tables. And it all really works.
And I don’t have much to describe the minuses ... For the sake of justice, I didn’t have much practice, mostly judging from the stories of my friends
- the need for experience with this DBMS to prepare it well. Otherwise, it is better to take MySQL or read on;
- The default authorization system can cause difficulties in using or configuring; not everyone likes it, some even very experienced developers still do not fully understand how it works there.
MongoDBOh, how many copies still breaks - SQL or NoSQL ... But still, in some cases, MongoDB does the job much better than MySQL or PostgreSQL. For example, the real case, which I witnessed - the collection and processing of statistics on hosting (load on the CPU, i / o, memory, etc.) - MySQL could not cope with the word at all. MongoDB coped without any problems. The database reaches 200-300 GB, the data stream reaches 100 MB or more. It is indicative, in my opinion.
MongoDB for you if:
- you do not have a clear, previously described data structure, or you assume that the composition of the data can then change dramatically (of course, this can be done in SQL, but you need to think in other terms, you can change it, but the question is how hard it will be);
- you have planned a rather serious amount of data (tens or even hundreds of GB), you can determine this even at the TOR stage;
- you just want NoSQL, it is fashionable;
- easy to install and try, works fine without any special settings. And if you go deeper, study, then you can configure a lot.
Cons, too.
- No simple transactions. At least in the classical form as they are in MySQL / PostgreSQL. When adding a set of data that depend on each other, there may be certain difficulties that will have to be solved independently at the code level. Well, that is, you can and will not face of course, but ...;
- data connectivity is practically absent. Immediately it comes to mind, they constantly mention JOIN from SQL - this is essentially not the case. Although, to be more precise, you just need to think in other categories;
- you need to rebuild thinking just under NoSQL. otherwise, it will be difficult to accompany - that is, a good programmer (more precisely, a software architect) is required in the future.
RedisMost often, this DBMS is used as a caching layer for working with data from another, slower DBMS. The best substitute for memcached, if it tells you something. Rarely, but still can be used as an independent database for data. At the same time, Redis can do different types of data, including lists, queues, Pub / Sub can, and it is also very easy to work with TTL (key lifetime). It works in memory, it is very fast, it is able to save data on a disk, and with the support of overwriting (loads the disk much less) and load at startup. Almost a fairy tale.
Redis for you if:
- the amount of data is small and very simple scheme that fits into the pattern “key = value”;
- simple implementation of Master - Slave replication. It's really very simple setup, just add instructions to the wizard to the server config, run Redis Server and the data is already replicated. Although, probably, it should be clarified that it is unlikely that you can configure flexible replication (partial);
- need Pub / Sub (queues). In fairness it should be said that there are separate Pub / Sub systems that implement in addition to this pattern and others. Redis realizes it quite elegantly and simply, it is quite possible that you will not need others;
- you need a cache for a slower DBMS, or just want not to think about the speed of the DBMS with an eye on its volume. An example would be a Drupal site, with the main database in MySQL and a cache in Redis. Conducted tests for the sake of interest in the usual ab, the speed of content delivery can increase significantly. On normal Apache + Redis + mod_php, you can achieve comparable performance with Nginx + php-fpm, and if you add Redis to Nginx ...
Cons also have, as without them.
- the amount of data should not exceed the amount of free RAM on your server (in fact, it can, but then they will all go to the swap, slow down the work, in general, it is better to avoid);
- for the sake of performance, there is a rather weak data integrity. That is, it may well happen that the data is added, and after the restart there is none. Turning on AOL (append of file) slightly smoothes the situation, but then loading from disk will be quite lengthy;
- transactions and related data is not that able. To be more precise, there is Pipeline and Multi / Exec, but this is still not quite a transaction in the classical sense;
- still not able to properly cluster and sharding. There is still no normal implementation.
Of course, you can tense up and do something similar to a cluster using a special script, but in my opinion it looks rather crooked and suboptimal.
Alternatives
From personal experience and surfing the Internet, studying various articles, reviews, I came across several different options that can be applied if something did not suit you in the previous versions. So, meet some more interesting projects.
IBM Lotus Domino / NotesIn my opinion, the clearest example of a successful commercial project of the NoSQL DBMS concept. Although I suspect that success is more likely not in his NoSQL, but in the presence of a full-fledged application server with a built-in code editor and interface to the database. The solution is very mature, it exists on the market for quite some time and is supported by the IBM giant - well, that is, very cool. He personally participated in the development of two different electronic document management systems based on it, and also accompanied some critical applications in the bank. By the way, despite the paid distribution policy, few people know, but it can be downloaded for free for study.
CouchdbDo you want to store documents of different structure (well, like MongoDB, type), but you don’t have an adapter for the application or don’t want extra dependencies? CouchDB works via http, has a built-in web server, exchanges in JSON. Very comfortably. By the way, is able to transaction and you can subscribe to changes in data. But it is not exactly.
PouchDBThis is a very interesting project, as if an analogue of CouchDB, but a more mundane, embedded version. Embedded in the application, works as a local database, but can custom replication with CouchDB or PouchDB Server (based on ExpressJS). PouchDB is actually more like an interlayer, where outside you work with a single API, CouchDB-like, but inside there can be completely different DBMS - both SQLite and LevelDB and browser database and MongoDB and even MySQL. It is very useful if you make a distributed application, where data exchange is important, but the connection with the server may be unstable or episodic.
AerospikeIn my opinion a great replacement for Redis in terms of database key / value. Able to transactions, can data integrity, can a large amount of data (in excess of the amount of available RAM). True, there is no Pub / Sub and some special data structures, but it works quickly and well. Perhaps the only drawback is the weak popularity compared to Redis. It is not clear by the way why ...
Apache cassandraDesigned and working as a distributed NoSQL DBMS for big data. Data is stored in the form of a family of columns, which at first can fundamentally change the approach to developing an application. But after breaking thinking it may well turn out that this is exactly what you need. A simple addition of nodes on the fly, high fault tolerance at the exit of one of the nodes. In theory, it can be used on small projects, but it will probably look as if a nail are being driven in with a microscope.
TarantoolA wonderful project from Mail.Ru. Something in between Redis / Aerospike and MongoDB ... Although in my opinion, the developers themselves still find it difficult to draw analogies :). He must be able to, and to be more precise, want to cook. And it's not about tuning, but about the constant adjustment for the development of new versions of Tarantool. You need to want to delve into the internal structure of this project, constantly study changes to its API and documentation, all the time to customize the application code to change. And if you also want to participate in the development process, chat with developers in the chat - then all the more for you.
And now, as if all.
Yes, I mentioned not all DBMS (if you look at
this site , there are 253 names listed there). I hope you did not think that I will list them all?
Perhaps in the description of a particular DBMS, I made an inaccuracy and did not mention something important. This is very possible and if so - write in the comments, I read with interest.
I hope my conclusions will be useful and interesting.
UPD1 : Regarding MySQL failover. You are all right. I do not declare that InnoDB is unstable, I just mentioned that the situation is possible. And if it happens, then it will not be possible to restore working capacity in any way, only to completely restore everything entirely from backup. It is about this speech, and not about the fact that you decided that I consider it unstable. According to a real example, on the servers we have this happened 2 or 3 times in the last year. On different servers. On the server, several hundred databases with different loads, volumes, etc. This does not mean that your MySQL will fall regularly on a virtual machine with one or two sites, this is not at all the point. Something like this.