PostgreSQL Evangelist Memo: criticize MySQL correctly

Hi, Habr! This publication is an attempt to dispel some popular myths and legends about MySQL. I was not mistaken with the hub, since the reason for writing was the publication of varanio PostgreSQL features that are not in MySQL, and vice versa . The publication itself in part of the criticism of MySQL, although imperfect, but quite correct, but the comments to it suggest a sad reflection.

Generally speaking, I was going to write a post about MySQL features that are not implemented or worse in PostgreSQL. But in order not to interfere with many topics in one publication, and considering the rather hard work compared to what I know very well (MySQL) so that I know very poorly (PostgreSQL), I decided to postpone such publication until later To begin to respond immediately to many of the comments from the publication varanio .

Why do I need it? Well, firstly, on the Internet again, someone is wrong . That in itself is a small problem, but unfortunately I have to meet many of these cool stories not only on the Internet, but also at various Russian conferences, including elite conferences, from elite speakers. It is especially strange to hear not quite correct statements about MySQL from representatives of the Postgres Professional company, to whom I have great sympathy and wish her every possible success and prosperity. Therefore, in the framework of cultural exchange and to increase the technical level of the discussion about the advantages and disadvantages of PostgreSQL and MySQL, I would like to start not only to sort out typical errors, but also to offer the correct wording, as well as to outline the really serious (in my opinion) problems in MySQL this moment.
')
So, the school of young holivorschika. Lesson one: "Criticize MySQL correctly." Tink

How not to criticize MySQL?

Let's start with the simplest myths: near-technical.

“MySQL is legacy”

I’ll just give you a short and far from complete list of well-known companies where MySQL plays key roles in infrastructure and business: Github, Wikipedia, Google, Facebook, Twitter, LinkedIn, Alibaba, Taobao, Booking.com, AirBnB, Dropbox, Pinterest, GroupOn Yelp.

I know that there are large and well-known PostgreSQL projects. But, first of all, very few people can compare in scale with the specified projects. And secondly, this list is simply useful to remember, so as not to lose touch with reality, when someone gives something profound in the style of "MySQL is not needed."

“MySQL has no community!”

The criteria for assessing the scope and activity of the community can be argued for a long time, but the criteria in http://db-engines.com/en/ranking seem to me quite reasonable .

On the other hand, in Russia, the popularity of PostgreSQL and the activity of the community are indeed somewhat higher than the “average global” trends. I do not quite understand what caused this, but I do not see anything wrong with that.

“MySQL has a community, but it is fragmented!”

In general, I could never understand what it was about. It sounds as if Oracle MySQL users need to completely re-learn in order to work with MariaDB. Or the DBA working with MariaDB knows absolutely nothing about Percona Server. In reality, 99% of the skills, books, articles, utilities, tips received somewhere, etc., can be safely used on any version of MySQL.

"MySQL has many forks and they have confusion"

I heard this strange statement for the first time several years ago at a Russian conference. Since then, I quite often hear this from people who are obviously not well aware of what they are talking about. Including in the post mentioned varanio . I could never get clear answers to two simple questions:

What is the "confusion"?
How many forks of MySQL can you count?

I could not get any clear answers this time, but I didn’t really hope. In the development of free and open source software, "forks" are very often encountered. It is usually considered that it is good, for it is free and open. In addition, MySQL has even fewer "forks" and "brunches" than in many other well-known projects. Smaller than PostgreSQL itself , for that matter.

By the way, this argument is strikingly similar to FUD from Microsoft. This is how Microsoft advertised its advantage over Linux at the beginning of this century:

Microsoft is now releasing its Linux distributions.

Microsoft is now releasing its Linux distributions.

“MySQL is owned by Oracle (and PostgreSQL to anyone)”

This, of course, is true, but what important conclusions can be drawn from this, I do not really understand, and the flight of the thoughts of critics usually stops there.

Purely theoretically, Oracle has the right to minimize the project / close the source / make it paid. How realistic this is - everyone can decide for himself. For me it is absolutely unrealistic - you could be wary of this at the time of purchase of Sun Microsystems by Orakl, but 5 years have passed (Karl!) And during this time MySQL has been developing faster than ever before the purchase. In addition, Oracle is not fools and they realize that the closure of MySQL will hit Oracle the most.

“Subjectively, PostgreSQL has fewer bugs”

Maybe it is, but how would you compare objectively in the absence of a bug tracker in PostgreSQL?

On this with non-technical myths we end and move on to more meaningful.

“PostgreSQL is many times faster than MySQL”

I have been doing MySQL code optimization for many years and, as a result, I often carry out load tests, though mostly between different versions and MySQL versions. Here is what I can say about such statements: I am absolutely sure that there are loads / queries / configurations where PostgreSQL will work many times faster than MySQL, just as I am sure of the opposite - there are loads / queries / configurations in which MySQL will be many times faster than PostgreSQL. In the case of MySQL, there is one more important criterion - the used storage engine, which can also change the situation not only several times, but also orders of magnitude.

Every time someone complains about production (no matter MySQL or PostgreSQL), they are asked to show the schema / queries / explain / configuration. And this is not by chance - all talk about performance can only be conducted in such a context. General statements are not quoted.

“Replication in PostgreSQL is done by the mind. And in MySQL there is no "

In MySQL, logical replication is implemented (regardless of the statement-based or row-based formats). In PostgreSQL, replication is physical (not “binary”, but physical).

Each approach has its pros and cons. You don’t even need to be a specialist in a particular DBMS to understand this. MySQL users have gained a lot of experience in circumventing or smoothing the negative effects of logical replication, but most likely sometime physical will appear, because they both need to be both. I suspect a similar process is going on in PostgreSQL, but in the opposite direction.

This topic is very extensive and raises so many questions that I am thinking about a separate post.

“MySQL doesn’t loosely work with data”

Options: "MySQL quietly divides by zero!"

Technically correct, but boring version of this statement is: "In MySQL versions <5.7, you need to remember to enable the correct SQL modes "

In 5.7, the “strictness” settings are pretty reasonable by default. The reason why it was not done for so long is trivial: as for any popular project, the main competitor of MySQL is its own old versions. We have to pull backward compatibility with old crookedly written applications. At 5.7 they decided to do away with it, which is good.

“MySQL has non-transactional MyISAM engines”

Yes, but it’s not very clear what the criticism is. Talking about MyISAM in 2015 is possible only in terms of supporting legacy applications. Honestly, the last time a live user MyISAM I met five years ago.

Although there are non-transactional engines, from which no transaction is required. For example CSV . To make it easier for the PostgreSQL user to understand, the CSV engine is about file_fdw . It is more likely to exchange data with other applications, and not to actually store data.

"In MySQL, transactions are somehow screwed to the side "

By this everyone understands something of their own. Someone thinks so, because you can't wrap the DDL in a transaction. Which of course is true, but the problem grows not from “transactions from the side,” but from the DDL curves (more on this later in the “How to criticize MySQL” section).

A boring, technically correct version: "In MySQL, non-transactional DDL."

Someone says that an error inside a transaction does not automatically roll it back. Yes, errors are different (for example, lock wait timeout), and for them the application has the ability to repeat the last statement, rather than roll back the entire transaction. As far as I know, Oracle and SQL Server behave the same way by default - this is not a violation of any standards. In PostgreSQL, in order to be able to repeat the last statement in case of an error, you would have to wrap each statement inside a transaction in SAVEPOINT. Which, for example, led to the appearance of ON_ERROR_ROLLBACK for psql. Those. As usual, the approaches are different, with their own pros and cons.

"MySQL has very expensive DDL"

This refers to the re-creation of the table under certain operations, for example, when removing constraint. Already in 5.6 you can do ALTER without re-creating the table for almost all operations . In particular, constraint can be removed without recreating. In 5.7, the list of in-place operations is even wider.

I don’t know how to do this in PostgreSQL (a quick search shows that there are some problems). But for MySQL, there is a pt-online-schema-change utility that allows you to bypass many of the limitations of ALTER TABLE in MySQL and which are often not known or forgotten.

"Some mysql MVCC in MySQL"

The MVCC mechanism in InnoDB is well implemented. From the user's point of view, it is almost identical to the implementation of the MVCC in Oracle, and is very similar to the implementation in PostgreSQL. But the differences in the behavior of many are surprising, which gives rise to similar myths. The fact is that the standard defines very vaguely many subtle points in different levels of isolation. As a result, each DBMS interprets these “gaps” in its own way. A good MVCC comparison in Oracle, PostgreSQL and MySQL / InnoDB can be read here . But even there, not all the nuances are reflected.

Over the past 10 years, an enormous amount of effort has been spent on the optimization and scalability of MVCC in InnoDB. I also added to this.

Updated 04/02/2017: a more formal attempt to understand the differences in the implementation of the MVCC in different DBMS, including PostgreSQL and MySQL.

"Because of the connected engines, MySQL writes data to disk 2, 3, 4, and in wartime and 5 times"

Fantastically naive statement. The logic is simple:

write to data files
write to transaction log
write to binary log
there is also some kind of doublewrite buffer, which, judging by the name, writes two more times!

Here we need to talk about the "write amplification" feature. I could not find a good term in Russian, but in fact it is the ratio of the total amount of data recorded on the disc to the total amount of data transmitted by the client. Let's see what it is made of.

First, writing to data files occurs only when the updated pages are flushed to disk. How long a page can be updated in memory before it is flushed to disk (and accordingly, how much write amplification changes) depends on a large number of parameters: the size of the common buffer (buffer pool in InnoDB), the size of the transaction log, the algorithm controlling the page dump , server settings and of course the type of load. This all applies to both MySQL and PostgreSQL.

Secondly, the transaction log contains not only updates of the records in the table, but also all the physical changes in the data files. Using the example of InnoDB, this includes managing index trees (splitting pages when filling, merging when deleting, rebuilding the tree, etc.), deleting old versions of records (purge operation), change buffer operations and other internal accounting. This also applies to both MySQL and PostgreSQL.

Thirdly, the binary log contains only logical changes in the data: it does not “know” anything about the format of the data files and all their internal accounting. Write amplification for a binary log depends on many parameters (statement / row-based, binlog_row_image, and other settings). In addition, the binary log can be disabled - it is often needed, but not always.

Well, finally doublewrite buffer is also activated only when the page is flushed to disk, and not at every INSERT / DELETE / UPDATE. An absolutely similar mechanism, only in a profile, is called “full page write” in PostgreSQL. And there and there it can be turned off under certain conditions.

It is also necessary to take into account the redundancy of data formats (ie, the overhead of service information), the page size (in InnoDB, you can specify it when creating the database), compression, and much more.

I hope from all this it is clear that it is impossible to calculate the characteristic of write amplification - there are too many parameters. It can be measured for a specific load, a specific configuration, and (in the case of MySQL) a specific engine. For example, in the case of TokuDB or MyRocks engines optimized for recording, this characteristic will be much lower than in InnoDB, because they were created for this purpose.

All that can be said is that the binary log (when enabled) results in additional write overhead. How large the costs, and whether the total write amplification will be more than the same performance in PostgreSQL, cannot be said even approximately without measuring specific loads and configurations.

Such is the boring truth.

“There are too many logs in MySQL”

I will not paint all types of magazines, their purpose is more or less clear to everyone. Questions mostly causes binary log. Binary log is a “fee” for the opportunity to have plug-in engines. “Connected” is not in the sense that they can connect dynamically as plugins (there was no such opportunity before), but in the sense that there may be several of them at all, and the server works with different engines through the API.

Since the physical representation of data on disk may vary depending on the engines used, and a single serialized representation of all changes is necessary for replication, such a representation is written in a logical form abstracted from physical data. As I already wrote, the logical log does not necessarily duplicate records in the physical logs when updating data.

With a logical log, as with logical replication, you can do all sorts of interesting things. In particular, it is actively used in technology Galera. I am going to make a brief overview of this whole farm in the next post.

"In MySQL, a bad console client"

As I understand it, this means the absence of context-dependent auto-completion. I agree, it can be convenient. But in order to feel it, you need to spend a lot of time in the console client. As a developer, I spend more time in the editor. It seems to me that most of the work of DBA is also performed by scripts, and not manually in the console client.

But for those who really need it, the mycli project with clever auto-completion and even syntax highlighting has recently appeared. And for lovers of beautiful GUI, there is MySQL Workbench, where, in my opinion, there is everything . Never used one or the other.

Updated 10/05/2017: The beta version of MySQL Shell , a new command line client from Oracle, has added support for autocompletion for SQL.

In any case, it would be more correct to say “there is no contextual autocompletion in the console client of MySQL”, because all the criticism seems to be reduced to this.

How was it possible to criticize MySQL, but already irrelevant?

Separately, I would like to highlight statements that are generally true, but are no longer relevant in the light of the soon release of MySQL 5.7.

“MySQL has no JSON support”

Limited support was previously available in JSON UDF from svetasmirnova . In 5.7 there is native support. As far as it is comparable with PostgreSQL, I can not say, but in any case, the statement in this form is already outdated.

"There are no functional indexes in MySQL"

In 5.7, functional indexes are implemented as indexed virtual / generated columns. In MariaDB, they have been around for a long time, but indexing is only possible for materialized virtual columns.

How should I criticize MySQL?

We turn now to the most interesting part. There are real and serious problems, but they are very rarely spoken about in holivor. In fact, judging by the blogs and presentations at conferences, MySQL users are more interested in not the lack of, say, window functions, but issues of horizontal scaling, sharding, clustering, high availability, cloud platforms and related issues of automation, monitoring, information protection and other things. As a result, it is in these areas that the efforts of the developers are concentrated.

But, since we are talking here about criticism from the PostgreSQL community, but as a rule it doesn’t touch on all of these issues, here I will list the fair, in my opinion, criticisms heard or read earlier, but formulated technically correctly.

No transaction data dictionary

In MySQL, a “data dictionary” is a collection of non-transactional files (.frm, .par, etc.) This is a heavy legacy from the earliest times and it creates a huge number of problems.

This is not only a lack of transactional DDL, for which a data dictionary is a prerequisite. These include problems with expensive queries to INFORMATION_SCHEMA, problems with the extensibility of the metadata format, problems with physical backups that are essentially forced to block the server in order to ensure the consistency of transactional data with a non-transactional dictionary, and others.

Work on this all started , but the results in 5.7, we definitely will not see.

For Oracle, nothing but InnoDB exists

Unfortunately, Oracle is not particularly worried about the fate of third-party engines, for obvious reasons. I do not think that the concept of "connected" engines will be minimized in the near future, but they are clearly not going to take care of the support of other engines.

As a result, many functionalities are implemented only in InnoDB. Examples: foreign keys, full text search, spatial indexes, virtual columns. It is unlikely that we will see all these functions soon in TokuDB or MyRocks. And if they are implemented, they will not be compatible in functionality with the fact that in InnoDB.

Imperfect query optimizer

The optimizer has always been one of the weak points of MySQL. Although I suspect that if you start detailed quizzes, most critics will begin to talk about the problems that were fixed 5-10 years ago. There are useful improvements in each new version, but still progress is slower than many would like. There are historical reasons for this, although the point here is not so much in the concept of plug-in engines, as in the very imperfect optimizer architecture. It requires serious revision and refactoring, and work in Oracle in this direction is underway, but such things cannot be done quickly.

Poor support for advanced SQL

This problem is partly related to the previous one. But yes, there are no window functions, CTEs and many more. How important this is for MySQL users is an open question. As I already wrote, on blogs and conference reports you can’t say that some SQL functionality is really not enough. Judging by the changes in the MySQL releases, for Oracle this is also not a priority area.

Updated 10/05/2017: Support for CTE and window functions appeared in MySQL 8.0. Interestingly, according to the results of testing the well-known PostgreSQL evangelist Marcus Winand, MySQL currently has the most complete CTE support among all popular DBMS, including PostgreSQL.

Weak GIS support

Also true criticism, but in my opinion at Oracle, they take it seriously. In 5.7, GIS was rewritten from scratch, spatial indexes were added to InnoDB, Boost.Geometry was used instead of self-written code, and work began on standards compliance. I suspect that in the next versions there will be a lot of interesting things.

Updated 10/05/2017: GIS support has been greatly expanded in MySQL 8.0. The key points are non-Cartesian SRS support and an extended set of functions for processing spatial data. Details can be found in this presentation . It would also be interesting to compare this with PostGIS. Judging by my communication with people versed in this topic, at some points PostGIS is already inferior in terms of the capabilities of the GIS native support in MySQL 8.0.

Conclusion

Surely I missed something in each section, but I am sure that they will add in the comments. It is impossible to cover everything with one publication, but if this publication makes someone thoughtfully and skeptically about criticism of MySQL at Habré and conferences, then I pressed the buttons for a reason. It would be interesting to write separately about logical / physical replication and MySQL functions that are not implemented in PostgreSQL or implemented worse, and I’ll probably take a chance if, of course, I don’t mind completely.

Source: https://habr.com/ru/post/268949/

All Articles