"The most serious opportunity I, of course, consider a multimaster," - Ivan Frolkov on the development of Postgres Pro EE

Dear colleagues, we are glad to offer you the second issue of our new column “interviews with database developers”. We talked with Ivan Frolkov , the developer of Postgres Professional. Ivan has been involved in application development for databases for over 20 years. Today, Ivan will secure a veil of secrecy and tell about new interesting possibilities of the “domestic Posgres”, Postgres Pro: EE.

PG Day: Tell me a little, please, how long have you been working on databases and, generally, in a profession, in what roles, and so on.

IV : I started to deal with databases, it was 93 years old and we did such a terrible thing as the register of shareholders. We had then, if anyone remembers, voucher privatization. We wrote it on a clipper. Then, in general, there were no special options: Fox Pro, Clipper and, in my opinion, everything.
')
For some legal reasons, our company was from several branches, which duplicated each other. I had to exchange information with them. They also had this register of shareholders written on the clipper. There were a lot of private individuals, voucher privatization, everything. What particularly shocked me, the first few screens consisted simply of binary garbage, as if the data files were beating there, there was all sorts of rubbish. After rubbish there were just empty fields, and then names, surnames and patronymic began. I then asked myself: how much can we trust what seems normal to us, having just such garbage at the very beginning? This was the first I had a sound judgment - was the year 94th. Then I went to work at a company that served the Ministry of Finance of the Russian Federation. I was responsible for maintaining the register of auditors of the Russian Federation, then there were registries, certificates, licenses to perform audit operations. We had SQL Server 6.0, 6.5, 7.0 and Delphi.

Then I poflanilstvuyu somewhat. Then I ended up in Mail.Ru. I did postcards, newsletters, answers. After Mail.Ru engaged in startups. Everything was connected to databases in one form or another. In the end, I was also in a kind of startup, but professional. I’m doing PostgresSQL databases, mostly query optimization and writing various documents, which, frankly speaking, made me very tired.

PG Day: you say that you are engaged in query optimization. What do you mean by that? What is Postgres Pro responsible for there? As far as I know, certain patches, versions ideologically invented by you .

IV : Ideologically, I walked for a long time and screamed that Postgres had very bad partitioning. Finally, sectioning was done, and I, naturally, drove it from earlier versions. Now it shows itself very well. Only two very critical things remain: there are no global indices and there are no separate partitioned indexes. And it does not work well with the number of sections over 20-30 thousand. The bases are now large, and I would like to have hundreds of thousands, up to a million sections for very large volumes. The guys are trying, I go and Noah, they write. I hope to somehow influence.

The second thing I influenced was the alienable tables. Perhaps, I invented them, and realized Nastya Lubennikova, thank her very much for that. The idea is that on one server, you can prepare a table with data, “provacuum” it, build indexes, statistics, disconnect and connect to another server already ready.

I am directly involved in integration with applications. The only thing I can say for .net I do not write, fortunately or unfortunately. I come across Java , PHP . And I still fall down "Here we have a request, it would be necessary to somehow overclock it." I overclock it. I was also given improvised guys. Guys smart, I like. Here we are overclocking.

PG Day: At the upcoming PG Day you are going to read a very detailed master class about the Enterprise version of Postgres Pro that you are developing. How did it happen that it is you who are reading this master class, which is dedicated to the functionality of this version? And my second question: is it really so full of new features that it is worthy of a master-class for hours?

IV : I have always considered myself an application programmer. Not the one who writes the DBMS, but the one who uses it. I think that the master class will be focused primarily on people who write the code directly. Not the code that is in the DBMS, but when Marya Ivanna comes in and says “I clicked something and everything disappeared”.

How interesting is the version to the developer? Yes, there are very interesting opportunities. I already mentioned alienable tables. This task arose from the requirements of a single client, who periodically dumped data at such a speed that he simply did not have time to insert them into the database. Hence the requirement of a large number of sections in the table, for very large amounts of data.

Also, from my point of view, we made a very small and very pleasant pg_variables module. These are session variables. They live during the session and locally for the session. In general, on the one hand, there is nothing supernatural. On the other hand, for some reason this has not yet been done. This, in particular, is also critical for users who are migrating from Oracle, because there are batch variables there. In Postgres, this is not so good.

The third thing: we are actively developing multimaster . This is really a true multimaster. Generally speaking, I considered quite a lot of any money, not specifically the banking “operden” the notorious, but billing and other things. The owner or management is eager to know exactly what is happening with money and how. Therefore, I deal with issues of transactional integrity and consistent data representation with great attention. In our master it is really provided. He is really a master-master. Everything works fine. There is no quackery there. To resolve conflicts, you have to roll back a transaction that could not be executed.

Now he pulled up nicely, right before his eyes is improving. This was shown by the story with one of our clients, he has a rather catchy logic in hibernate. He cannot say about some transactions whether they are reading or writing. His task worked fine, and it worked on the multimaster 2.5 times faster than on the bare vanilla Postgres.

PG Day: Yes, it sounds very impressive. You mentioned the problem of resolving conflicts between nodes, that the transaction is rolled back, and so on. It always seemed to me that the operation of the multimaster implies that there will be problems and they need to be somehow solved: to intervene manually in order to restore normal functioning .

IV : In this case, when performing multimaster operations, synchronous replication actually occurs and then an honest, perfectly normal two-phase commit . Yes, it is heavier than the usual commit, no one argues. But, on the other hand, in the case of a large number of readings, this gives a gain, because there we get at least three “nodes” in the multimaster and, accordingly, the reading performance increases three times . Yes, I should note that it is very relevant for such cases, the JDBC driver allows you to list different servers in the connection string. That is, in case of failure, for example, it can automatically reconnect.

He may be behind the net performance of vanilla, still need to synchronize with other servers. Naturally, this imposes some restrictions. But, on the other hand, we get it really working, disaster-proof solution.

Another possibility is the compression of tables , which are focused mainly on reading. True, there is another kind of vacuum, called the garbage collector. For example, most of the huge tables that I came across are, in fact, append-only tables. State structures, for example. They have an event, they register them. There are many events, but the event has occurred, and it is unchanged. And in this case, data compression basically gives a noticeable gain on I / O. Here even we are not talking about the size of the disk, because the disks are now large, but then we are winning very much on reading, on input-output.

PG Day: Do you have any feature of the enterprise version, which is your personal favorite?

IV : Oddly enough, I like the most from the point of view that I have to sit down and write code for something, the simple pg_variables. It is no claims, but convenient. The most serious opportunity I, of course, consider a multimaster.

PG Day: As far as I remember, in terms of the master class that you stated, there was an item about nested transactions and autonomous transactions. Briefly tell what you have planned on this issue?

IV : The first possibility using autonomous transactions, which is usually required, is logging . The user performs such and such an operation, all of a sudden he rollback occurs, all data is lost. Maybe he wants to do something bad and it would be nice to take this into account. With autonomous transactions, we can register this operation within individual transactions, completely independent of the main operation. There is still such a subtle point when nothing can be done without this autonomous transaction or without an external application. For example, I ran into this when it was necessary to send bitcoins.

What is the situation in bitcoins? There is a global ledger book, a blockchain is called, transactions are written into it: from such an account such an amount of money. Absolutely standard. Everything is a little trickier, but for simplicity, let's assume that this is so. The problem is that the transaction becomes true when it is confirmed by other participants in the exchange. That is, when we send a transaction, we do not know whether it has passed or not. We sent money, and it is possible that they will need to be re-sent. On this may run once a month, suppose. We perform the operation of sending money, and then the transaction took and rolled back. Yes, we are honest people, we will not do double-spending, but it’s somehow strange to be laid in the code for an honest word, so we cannot roll it back, nor check whether it has passed or not.

The two-phase commit does not work out for obvious reasons, and if we could check, then we could implement the idempotency property that we sent already, and we will no longer. That is, as a result, we don’t know the result of the execution. We can send only once and hope that everything will be fine. In the framework of the DBMS, we can not do it. When a transaction is rolled back, from the point of view of the application programmer, it seems like there was nothing, so, on the one hand, we have to fix the fact of payment attempt, and on the other, everything is lost during rollback.

PG Day: Without third-party help no longer understand.

IV : If we do this from an application program, we can open a second connection and attempt to send to register with a separate transaction. And first we register what we are trying to send, and then send. Inside a DBMS without autonomous transactions, this is unfortunately impossible to do. With autonomous transactions, this is completely realizable. What else? This is the second task that arises - to avoid re-execution of canceled operations.

I will also talk about the interesting possibility of scheduler - the execution of tasks on a schedule. It is available from pl / pgsql, so you can call queries, plus it is still interesting in tasks. You can send a task for execution. What is interesting in our sheduler? Our tasks may depend on one another. That is, such a task will not be completed before others are completed.

PG Day: that is, already some kind of business logic can be programmed in this way.

IV : Yes, if you remember, I had ordered messages in mbus .
That's just the implementation of the same logic. That is, we can carry out a certain operation in several steps. In case it fails, we must, again, roll it back in a few steps. For him, now they make the user interface, let's see what happens. So far, it is available only in the API.

PG Day: sounds very impressive, a lot of interesting things. We have some participants expressed skepticism that it would not be just a marketing report, where you will tell that all these features are very cool. Do you plan any practice, examples, so that participants can make sure that this is really cool? Can we expect a demonstration with illustrative examples?

IV : I am planning. I do not plan to distribute the Postgres Pro EE version, but I will show everything on my computer. On the topic of what cool features, I will naturally tell. Why then tell at all if it were bad?

The master class will be interesting especially for people and organizations that are really thinking about using our product. I hope that this seminar will be useful for exactly in order to dispel any fears, to show new opportunities, and maybe even to demonstrate problems that are not yet recognized.

If I wanted to do marketing, I would first of all say that we have a FSTEC license, and for one customer we made a project with a secrecy of “state secret”. Yes, not everyone needs it - even, perhaps, very few people, but who needs it - those really need it. In addition, we can process personal data at the highest level (“Group 1 - special categories of personal data, which include information about the nationality and race of the subject, religious, philosophical or political beliefs, information about the health and intimate life of the subject”). For medicine, this limitation may well be critical. If I were engaged in marketing, I would tell this. Because for a number of applications, these are simply killer requirements. But I will not.

PG Day: What would you advise to beginners DBA, what technology to look at, whose articles and blogs to read?

IF :
1. Carefully read the book Architecture of a Database System.
2. No less attentively to study the documentation for the used DBMS.
3. To study the basics of accounting. First, it is just purely practical, and secondly, it will be clear where the requirements for the DBMS came from ...
4. Keep track of all major DBMS, the benefit of their little.
5. I highly recommend that you familiarize yourself with them - first of all with Oracle and DB2, especially since this will be enough for a long time - things are very serious.

I personally read the Postgres mailing lists, but I don’t think this is a must-read for everyone.

PGDay: And in conclusion, in terms of use for work and in personal life, what operating systems, applications and technology do you use?

IV : Nothing unexpected - at home I have an old laptop with Windows 10, otherwise my wife and daughter will not understand, and the little traveling one with ubuntu is a plus, home for there because of ubuntu there, in principle, will not go, everything is mine. At work - the usual Debian. From applications - the most important application for a programmer is Word (or Libre Office), as is well known; and so - vi, psql, pgAdmin3 and Netbeans - I got used to it.

Development of a new fork is a large-scale task. Unfortunately, we could not fit all the details about domestic development in the framework of a 30-minute conversation. Friends, be sure to ask your questions, Ivan and his colleagues will be very happy. Well, we are in a hurry to invite everyone in to PG Day'17 Russia . Ivan will hold a workshop Postgres Pro Enterprise for developers , in which he will tell in detail about the fork options that have already been mentioned in the interview, and many others. You will also find many interesting reports from Ivan’s colleagues: Oleg Bartunov, Alexander Korotkov, Ivan Panchenko, and other Postgres Professional specialists.

See you at PG Day'17 Russia !

Source: https://habr.com/ru/post/328442/

All Articles

"The most serious opportunity I, of course, consider a multimaster," - Ivan Frolkov on the development of Postgres Pro EE

More articles: