📜 ⬆️ ⬇️

Bruce Momjan, Marco Kreen, Fedor Sigaev and Maxim Boguk answered questions about PostgreSQL

Photos from tika-online.de
On June 30, we held an online session, Questions to the world's leading PostgreSQL experts. Within three hours, anyone could ask a question to one of the invited guests and get a comprehensive answer. As a result, more than 40 questions were received, none of which went unheeded.

Participants: Bruce Momjan (Bruce Momjian) (PGDG, EnterpriseDB), Maxim Boguk (Rambler, Masterhost), Fedor Sigaev (PGDG), Marko Kreen (Marko Kreen) (Skype).

Here are the selected questions and answers:
')
Question number 3

kai kai :
One of the annoying pieces in the work is the lack of the INSERT OR UPDATE construction. In MySQL, you can do this: “INSERT INTO users (username, email) VALUES ('Jo', 'jo@email.com') ON DUPLICATE KEY UPDATE email = 'jo@email.com'”. Will anything similar appear in PostgreSQL?

Bruce Momjan, PostgreSQL community leader, EnterpriseDB expert:
The standard (ANSI) method for such things is MERGE or UPSERT. Both methods are in our TODO list and, as far as I know, someone is working on this functionality for PostgreSQL 8.4, although I personally have not seen this patch yet. We, no doubt, need this syntax.

Question number 5

Alexander:
Questions to Skype developers:
1) I want to estimate the scale of Skype, at least approximately. How many users are currently registered? How many users are online at the same time? How many transactions per second does your PostgreSQL perform? How many PostgreSQL servers do you have?
2) How to measure the number of transactions per second in PostgreSQL?

Marco Kreen, Skype engineer:
1) The total number of users we now have around 300 million. I don’t know for sure about active users, but my Skype client shows that there are 10 million users online now, and this is far from a peak load. I do not have exact figures for the number of transactions per second (TPS), but it seems that our workload is somewhere around 10-20 thousand TPS. In total, we have over 100 PostgreSQL servers, although not all of them work simultaneously.
2) Use the log analyzer or pgBouncer installed in front of the database, it gives such statistics.

Question number 7

Sergey Konoplev:
We have the task of generating events based on operations in the database: add / modify / delete. This, of course, can be done in triggers, but at the level of WAL files IMHO it would be more efficient, not to mention that the processing could be done deferred and rendered. The question is - is there a well-documented API for working with WAL files and, if so, how much will it change from version to version? If so, where can I find information about it?

Fedor Sigaev, a leading PostgreSQL developer:
There is no external API, since external access to WAL files was never intended. Moreover, their format is compatible only for minor versions, the guarantee of compatibility of formats between major versions was never even expected. In Postgres, there is not even a central decrypter for WAL records, each record is in some sense a container that has a decryption attribute (table, btree index, GiST index, etc.). It is not very convenient to build your processing on WAL logs. the following reason: files can be reused, and Postgres does not inform the outside world about it.

Bruce Momjan, PostgreSQL community leader, EnterpriseDB expert:
That is, are you looking for a way to read WAL files directly? There is no API for this, as their internal format is rather complicated. I now realized that the possibility of reading the logs could be useful, perhaps we should consider including this wish in the TODO list. We actually have pg_filedump ( http://sources.redhat.com/rhdb/utilities.html ), but I’m not sure if it dumps WAL files.

Marco Kreen, Skype engineer:
Maybe it is more efficient, but less reliable (for example, a WAL file can be deleted without your knowledge). Please try Skytools PgQ and study the scripts that use this mechanism for queuing. This is a very efficient queue within the database, easy to use and reliable. It also has a built-in framework in order to guarantee transactions between multiple databases when processing a queue. There are also triggers that automatically determine the structure of the tables and format the data in an easily parsed format (urlencode).

Question number 8

sakalr:
Question about the books. Today I received a book from the British Amazon, I had to wait a few weeks. In Russia, from current versions I found only a book for 3,900 rubles. In general, let the books be even in English, but I want to come to Biblio-Globus or the Young Guard and see this book on the shelf, but not for 3,900 rubles (on Amazon this book was 2 times cheaper) . There is a need for development on PostgreSQL, but for me, as a beginner in PgSQL, this is difficult. Constantly you have to communicate on freenode in broken English, master the basics, having spent a lot of time when it should be just a mechanical assimilation. That is why the question arises about the book. Books on postgresql (English / Russian) are very necessary.

Fedor Sigaev, a leading PostgreSQL developer:
If you are already familiar with any SQL dialect, then the work in Postgres should not cause you great inconvenience. Postgres documentation is very good and usually suffices (with careful reading, of course). If you are not familiar with SQL, then it is better to take any textbook on SQL.

Bruce Momjan, PostgreSQL community leader, EnterpriseDB expert:
I fully agree that more English and Russian books are needed. Ideally, we need to have PostgreSQL documentation translated into Russian.

Question number 18

kai kai :
Here's another question about pgAdmin - are you planning to regularly build packages for common versions of Linux? For example, in Ubuntu 8.04 pgAdmin 1.8.2 is now (buggy), and pgadmin 1.8.4 is already on the pgadmin.org site.

Bruce Momjan, PostgreSQL community leader, EnterpriseDB expert:
I was told that these problems were fixed in Ubuntu (Ubuntu), which, by the way, I use myself. I think the only problem here is forcing the developers of Ubuntu to update the server for updates. The bundle of software packages with Postgres for Ubuntu also works fine and maybe you should try it. There is a PostgreSQL server, pgAdmin, and many other software packages prepared by Dave Page, the author of pgAdmin, and not the developers of Ubuntu.

Question number 26

How is PostgreSQL migrated to Skype between major versions?

Marco Kreen, Skype engineer:
Using Londiste (a replication component in Skytools).

Question number 36

Dmitriy:
Is it planned to introduce a built-in multi-replication master? If so, when can we expect the beta version?

Bruce Momjan, PostgreSQL community leader, EnterpriseDB expert:
We could try to implement multi-master replication, but locks often make it inefficient. Most developers consider master / slave replication or partitioning of data across servers as the best alternative. PgPool, by the way, can work in multi-master mode. PgCluster implements a multi-master, but its performance leaves much to be desired. Slony-II tried to implement a multi-master, but again, the performance was low.

Fedor Sigaev, a leading PostgreSQL developer:
There are plans for built-in replication of Master-Multiple Slaves, we hope that this will be in 8.4. About Multi-Master - look at Postgres-R, its source codes have been open for a month now.

Marco Kreen, Skype engineer:
If you want to distribute the load on the recording on several machines, see PL / Proxy - it is not a multi-master in the conventional sense of the word, but it has very good performance, which is so lacking in conventional multi-master systems.

The full history of the online session can be found on our website .

Source: https://habr.com/ru/post/30381/


All Articles