PostgreSQL vs MySQL

In anticipation of my report at the PGCONF.RUSSIA 2015 conference, I will share some observations on the important differences between MySQL and PostgreSQL. This material will be useful to all those who are not satisfied with the features and features of MySQL, as well as those who take the first steps in Postgres. Of course, you should not consider this post as an exhaustive list of differences, but for making a decision in favor of a particular DBMS, it will be quite enough.

Replication

The topic of my report is “Asynchronous replication without censorship, or why PostgreSQL will conquer the world”, and replication is one of the most painful topics for busy projects using MySQL. There are many problems - correctness of work, stability of work, productivity - and at first glance they look unrelated. If we look in the historical context, we get an interesting conclusion: MySQL replication has so many problems because it was not thought out, and the point of no return was support for the storage engine (plug-in engines) without answering the questions “what about the magazine?” And “ how different storage engines participate in replication. ” In 2004, the user tried to “find” the storage engine in the PostgreSQL source code in the PostgreSQL mailing list and was very surprised that there were none. During the discussion, someone suggested adding this feature to PostgreSQL, and one of the developers responded, "Guys, if we do that, we will have problems with replication and with transactions between engines."

WAL and PITR. Locking and replication / load management too. An interface should be
abstracted.

link to this email in postgresql mailing list
')
More than 10 years have passed, and what do we see? MySQL has annoying problems with transactions between tables of different storage engines and MySQL has problems with replication. Over these ten years, PostgreSQL has added pluggable data types and indexes, and there is also replication — that is, MySQL's advantage has been leveled out, while the architectural problems of MySQL have remained and are making it difficult to live. MySQL 5.7 tried to solve the problem of replication performance by parallelizing it. Since the project at work is very sensitive to replication performance because of its scale, I tried to test whether it became better. I found that parallel replication in 5.7 runs slower than single-threaded in 5.5, and only in some cases, about the same. If you are currently using MySQL 5.5 and want to upgrade to a more recent version, then keep in mind that for high-loaded projects, migration is not possible, since replication will simply stop running.

After the highload report, Oracle took note of the test I developed and said that they would try to fix the problem; recently they even wrote to me that they were able to see concurrency on their tests, and sent the settings. If I am not mistaken, at 16 threads there was a slight acceleration compared to the single-threaded version. Unfortunately, I have not yet repeated my tests on the settings provided - in particular, because with such results our problems still remain relevant.

The exact reasons for this performance regression are unknown. There were several assumptions - for example, Christian Nelsen, one of the developers of MariaDB, wrote in his blog that there might be problems with the performance scheme, with thread synchronization. Because of this, there is a regression of 40%, which is visible on regular tests. Oracle developers refute this, and they even convinced me that it is not there, apparently, I see some other problem (and how many of them are there?).

In MySQL replication, problems with the storage engine are exacerbated by the chosen level of replication - they are logical, while in PostgreSQL - physical. In principle, logical replication has its advantages, it allows you to make more interesting things, I will also mention this in the report. But PostgreSQL, even as part of its physical replication, has already negated all these advantages. In other words, almost everything in MySQL can already be done in PostgreSQL (or it will be possible in the near future).

One cannot hope for the implementation of low-level physical replication in MySQL. The problem is that there instead of one journal (as in PostgreSQL) there are two or four, depending on how you count it. PostgreSQL just commits requests, they are logged, and this log is used in replication. PostgreSQL replication is super stable because it uses the same log as in disaster recovery operations. This mechanism has long been written, well tested and optimized.

In MySQL, the situation is different. We have a separate InnoDB log and replication log, and we need to commit both there and there. And this is a two-phase commit between logs, which by definition is slow. That is, we cannot simply say that we are repeating a transaction from the InnoDB log — we have to figure out what kind of request it is, start it again. Even if this is a logical replication, at the line level, these lines should be searched in the index. And not only that you have to do a lot of work to fulfill the request - it will again be written to its InnoDB journal on a replica, which is clearly not good for performance.

In PostgreSQL, in this sense, the architecture is an order of magnitude more thoughtful and better implemented. Recently, it announced the possibility called Logical Decoding - which allows you to do all sorts of interesting things that are very hard to do in a physical journal. In PostgreSQL, this is an add-on on top, logical decoding allows you to work with the physical log as if it were logical. It is this functionality that will soon remove all the advantages of MySQL replication, except, perhaps, the size of the journal — statement-based replication MySQL will win — but statement-based replication of MySQL has completely wild problems in the most unexpected places, and you should not consider it a good solution ( I will say all this in the report too).

In addition, for PostgreSQL there is trigger replication - this is Tungsten, which allows you to do the same. The trigger replication works in the following way: triggers are set, they populate tables or write files, the result is sent to the replica and applied there. It is through Tungsten, as far as I know, that they make migration from MySQL to PostgreSQL and vice versa. In MySQL, logical replication works right at the engine level, and the other cannot be done now.

Documentation

PostgreSQL has much better documentation. In MySQL, it even seems to be formal, but the meaning of individual options is hard to understand. It seems to be written what they are doing, but in order to understand how to properly configure them, you need to use informal documentation, look for articles on these topics. Often you need to understand the architecture of MySQL, without this understanding of the settings look like some kind of magic.

For example, this is how Percona “fired”: they kept the MySQL Performance Blog, and there were a lot of articles in this blog that dealt with specific aspects of MySQL operation. This brought enormous popularity, led clients to consulting, allowed to attract resources to start developing their own fork of Percona-Server. The existence and demand for the MySQL Performance Blog proves that official documentation is simply not enough.

PostgreSQL has virtually all the answers in the documentation. On the other hand, I have heard a lot of criticism when comparing PostgreSQL documentation with an “adult” Oracle. But this is, in fact, a very important indicator. Nobody tries to compare MySQL with an adult Oracle at all - it would be ridiculous and ridiculous - and PostgreSQL is already starting to compare quite seriously, the PostgreSQL-community hears this criticism and is working on improving the product. This suggests that it in its capabilities and performance begins to compete with such a powerful system as Oracle, on which mobile operators and banks work, while MySQL remains to sit in the niche of websites. And the gigantic projects, which have grown to a large amount of data and users, take grief with MySQL with a big spoon, constantly resting on its limitations and architectural problems that cannot be fixed by spending a reasonable amount of time and effort.

An example of such large projects on PostgreSQL is 1C: PostgreSQL comes as an option instead of Microsoft SQL, and Microsoft SQL is really a fantastic DBMS, one of the most powerful. PostgreSQL can replace MS SQL, and trying to replace it with MySQL ... let's lower the veil of pity over this scene, as Mark Twain wrote.

Standards

PostgreSQL complies with SQL-92, SQL-98, SQL-2003 standards (all its reasonable parts are implemented) and is already working on SQL-2011. It is very cool. For comparison, MySQL does not even support SQL-92. Someone will say that in MySQL this goal was simply not set by the developers. But you need to understand that the difference between the versions of the standard is not small changes - this is new functionality. That is, at the moment when MySQL said: “We will not follow the standard”, they didn’t just make any minor differences, due to which MySQL is hard to maintain, they still blocked the way for the realization of many necessary and important features. There is still no normal optimizer. What is called optimization there is called a “parser” plus normalization in PostgreSQL. In MySQL, this is just a query execution plan, without separation. And MySQL to support standards will come very soon, because they are under pressure from backward compatibility. Yes, they want to, but in five years, maybe something will appear. PostgreSQL has it all now.

Productivity and administrative complexity

In terms of ease of administration, the comparison is not in favor of PostgreSQL. MySQL is much easier to administer. And not because in this sense he is better thought out, but simply knows how much less to do. Accordingly, it is easier to set it up.

MySQL has a problem with complex queries. For example, MySQL does not know how to lower the grouping into separate parts of union all. The difference between the two queries — in our example, grouping by separate tables and union all from above worked 15 times faster than union all and then grouping, although the optimizer should bring both queries into the same, efficient query execution plan. We will have to do the generation of such requests by hand - that is, spend the developers time on what the base should do.

The “simplicity” of MySQL arises, as can be seen above, from extremely poor features — MySQL is simply worse and requires more time and effort during development. In contrast, PostrgreSQL has histograms and a normal optimizer, and it will execute such queries efficiently. But if there are histograms, then there are their settings - at least the bucket size. You need to know about the settings and in some cases change them - therefore, you need to understand what the setting is, what it is responsible for, to be able to recognize such situations, to see how to choose the optimal parameters.

Occasionally, the skill of PostrgreSQL can interfere, not help. In 95% of cases, everything works well — better than MySQL — and one stupid query is much slower. Or everything works well, and then suddenly (from the user's point of view) as the project grew, some requests began to work poorly (more data became available, another query execution plan was selected). Most likely, to fix it, just run analyze or twist the settings a little. But you need to know what to do and how to do it. At a minimum, you need to read the PostgreSQL documentation on this topic, but for some reason they don’t like to read the documentation. Maybe because there is little help from MySQL? :)

I emphasize that PostgreSQL is not worse in this sense, it just allows you to postpone problems, and MySQL immediately throws them out and you have to spend time and money on solving them. In this sense, MySQL always works stably badly, and even at the development stage, people take these features into account: they do everything in the simplest possible way. This refers only to performance, more precisely, to ways to achieve it and to its predictability. In terms of correctness and convenience, PostgreSQL is a cut above MySQL.

So what to choose?

To make a choice between MySQL and PostgreSQL for a specific project, first of all you need to answer other questions.

First, what kind of experience does the team have? If the whole team has 10 years of experience with MySQL and you need to start as quickly as possible, then it’s not a fact that it’s worth changing a familiar tool to an unfamiliar one. But if the timing is not critical, then it is worth trying PostgreSQL.

Secondly, we must not forget about the problems of exploitation. If your project is not heavily loaded, then in terms of performance there is no difference between these two DBMSs. But PostgreSQL has another important advantage: it is more stringent, makes more checks for you, gives you less opportunity to make a mistake, and this is a huge advantage in the future. For example, in MySQL you have to write your own tools to verify the usual referential integrity of the database. And even with this there can be problems. In this sense, PostgreSQL tool is more powerful, more flexible, it is more pleasant to develop on it. But it largely depends on the developer’s experience.

To summarize: if you have a simple online store, you do not have money for the administrator, there are no serious ambitions to develop into a big project and you have experience with MySQL, then take MySQL. If you assume that the project will be popular, if it is large, it will be difficult to rewrite it, if it has complex logic and connections between the tables - take PostgreSQL. Even out of the box, it will work for you, help in development, save time, and it will be easier for you to grow.

Source: https://habr.com/ru/post/248845/

All Articles