Postgres Pro quiz task analysis on PGDay'17

A good tradition at postgresovyh conferences was to arrange a quiz with a raffle tickets to the next conference. Our company Postgres Professional at the recent PgDay'17 played out tickets for PgConf.Russia 2018 , which will be held in February 2018 in Moscow. This article presents the promised analysis of the quiz questions.

The conference participants were asked the following questions:

1. At execution on the basis of read-only requests, a write to the disk was found. Who (what) is to blame?

Answer options: WAL, Hint bits, Vacuum, Russian Hackers, Raccoons.

Raccoons and a vacuum cleaner could also work with the system and write to the disk. The essence of the question, of course, is precisely in the record that occurred as a result of the reading, and not accidentally coincided with it in time.

The correct answer is Hint bits, in Russian documentation these are “auxiliary bits”. Unfortunately, the documentation says little about them, but it is replenished in the Wiki . These bits are in the header of the tuple, and are intended to speed up the calculation of its visibility. They contain information about:

whether a tuple is created by a transaction that is already committed
was the transaction that was interrupted created
whether the tuple is deleted by a transaction that is already committed
whether the tuple is deleted by the transaction that was interrupted.

The values of these bits can also be entered during a read operation, if the value of any of them was determined when determining the visibility.

2. How many records will be added to pg_class with the command:

CREATE TABLE t (id serial primary key, code text unique);

We will begin the search for the correct answer from the fact that not only tables, but also indexes, sequences, views and some other objects are stored in the pg_class system table.

As we create the table, one entry in pg_class will definitely be added. So, one entry in pg_class is.

The next fact we need is that PostgreSQL uses unique indexes to implement the PRIMARY KEY and UNIQUE constraints.

In the table being created there is both a primary key (id) and a unique one (code). So for each of these restrictions will be created by index. Subtotal - 3 entries.

Now look at the serial data type, which is used for the id column. In fact, there is no such type of data, the id column will be created with the integer type, and the serial indication means that you need to create a sequence and specify it as the default value for id. Thus, a sequence will be created, and the number of entries in pg_class increases to 4.

The code column is declared with the text type, and this type can contain very large values that are much larger than the page size (usually 8KB). How to store them? PostgreSQL uses a special technology for storing large values - TOAST. Its essence is that if the table row does not fit on the page, then another special toast table is created in which the values of the “long” columns will be written. For the user, all this internal kitchen is not visible, we work with the main table and we can not even guess how everything is arranged inside. And PostgreSQL in order to quickly “splice” the rows from the two tables also creates an index on the toast table. As a result, the presence of the code column with the text type leads to the fact that two more entries are created in pg_class: for the toast table and for the index on it.

The total and the correct answer: 6 records (the table itself, two unique indexes, a sequence, a toast table and an index on it).

3. There are two tables:

 CREATE TABLE t1(x int, y int); CREATE TABLE t2(x int not null, y int not null);

In both added 1 million records. Which table takes up more disk space and why?
The response options were as follows: First, second, take equally, the first will take less or as much as the second, depends on the version of PostgreSQL.

If non-NULL values are added to the first table, then it will take up as much space as the second one. If NULLs are added to it, it will take up less space.

This is due to the storage features of NULLs (or rather, non-storage: they are not stored, instead of them special bits are inserted in the header of the record, indicating that the value of the corresponding field is NULL). More information about this can be found in the documentation and the report of Nikolai Shaplov. What is inside it .

By the way, if in the table t1 only one of the two fields is NULL, t1 will take the same place as t2. Although NULL does not occupy space, alignment acts, and therefore, in general, this does not affect the volume occupied by records. Alignment is still to be found in Task 4.

A meticulous reader will say: “Well, well, the NULLs themselves are not stored, but somewhere the same bit string t_bits should be stored, where a bit is allocated to each field that can take the value NULL! It is not needed for table t2, but needed for t1. Therefore, t1 can take more space than t2. ”

But the meticulous reader forgot about alignment. The title of the record without t_bits takes exactly 23 bytes. And under t_bits, one byte of the record header will be allocated in t1; in the case of t2, it will be eaten by alignment.

If you have the pageinspect extension installed , you can look at the title of the record, and, coping with the documentation, see the difference:

 SELECT * FROM heap_page_items(get_raw_page('t1', 0)) limit 1; SELECT * FROM heap_page_items(get_raw_page('t2', 0)) limit 1;

4. There is a table:

 CREATE TABLE test(i1 int, b1 bigint, i2 int);

Is it possible to rewrite the definition so that its records take up less disk space and, if so, how? Suggest your option.

Everything is simple, the point is alignment. If you have a 64-bit architecture, then fields in records that are 8 bytes long or more will be aligned by 8 bytes. So the processor can read them faster from memory. Therefore, 4-byte int need to add next, then they will occupy together 8 bytes.

There is no difference on the 32-bit architecture. You can learn about the internal structure of the records from the documentation and the report of Nikolai Shaplov already mentioned.

5. Which type takes up more disk space: timetz or timestamptz?

Unexpected result : timetz takes up more space (12 bytes) than timestamptz (8 bytes),
Why so? This is a historical legacy. And no one is going to get rid of him? See Tom Lane's answer . By the way, if someone really needed timetz (time with time zone) in practice, write to us about it.

6. How can I check the consistency of the database in order to make sure that some of the data in the database is not lost?

The answer is simple: there is no such tool in PostgreSQL yet. Cheksumma is in many places, but not everything is protected. Therefore, we had to consider the answers of the “ask and compare” type to be correct. Postgres Pro is working to improve self integrity monitoring.

7. Which of the conditions below is true and why?

 (10,20)>(20,10) array[20,20]>array[20,10]

Strings are compared from left to right , so the first expression is false.
Comparing arrays is also done , so the second is true.

8. Which subsystems action turns off the setting track_counts = off?

Answer options: Statistics Collector, Checkpointer, WAL archiving, Autovacuum, Bgwriter, None of the listed

The documentation states that track_counts includes the collection of statistics on access to tables and indexes, which is needed, including for auto-vacuum. With the help of this statistic, autovaccum decides which tables to take it. Details can be read in the comments to the source of avtovakuma .

Of course, WAL, Bgwriter and checkpointer archiving are not associated with this parameter.

9. What will be the result of the request

 select NULL IS NULL IS NULL ?

This is probably the easiest question. False answer, because NULL is NULL, and all this is true.

10. What types of indexes command does not work

 CLUSTER [VERBOSE] table_name [ USING index_name ]

and why?
')

The CLUSTER command arranges the table according to a certain index. Some indexes may specify order, and some may not.

If you have Postgres version <9.6, you can find the answer using the command

 select amname from pg_am where amclusterable ;

In Postgres 9.6+, the pg_am system table was severely trimmed, and for a particular index, you can find out its suitability for clustering using the pg_index_has_property function:

 select pg_index_has_property('index name or oid', 'clusterable');

You can read more about this function and its sisters in the article “Indexes in PostgreSQL, Part 2” by Egor Rogov.

If you are lucky and Postgres <9.6, this command will give two answers - the expected btree and the not so expected GiST. It would seem that what order sets the GIST-index? Let's uncover a terrible secret - CLUSTER simply rebuilds the table, bypassing it in order to bypass the index. For GiST, the order is not as defined as for B-Tree, and depends on the order in which the records were placed in the table. However, this order exists, and there are reports that clustering over GIST in some cases helps.

When using spatial indices, for example, it means that geometrically close objects will most likely be located closer to each other in the table.

11. Let synchronous replication be configured on the master and synchronous_commit = on.

At what value of the same option on the replica, the delay in executing the COMMIT on the master will be less, and why?

synchronous_commit = on on the wizard means that the wizard will consider the COMMIT complete only after receiving a message from the replica about successfully writing the corresponding part of WAL'a to the disk (well, or getting stuck in the OS buffer, if you have fsync = off, but it is not worth doing if the data is of at least some value). The trick is that the moment when WAL is flushed to disk is determined by the local value of the synchronous_commit, that is, its value on the replica.

If the replica is synchronous_commit = off, the recording will not happen immediately, but when the WAL writer process deems it necessary to perform it; to be more precise, he does this once in wal_writer_delay milliseconds, but with a heavy load on the system, only completely formed pages are dropped, so that the maximum time interval between the formation of a WAL recording and its writing to a disk with an asynchronous commit results in 3 * wal_writer_delay. All this time, the master will wait patiently; he cannot declare the transaction complete until the end of the record.

If synchronous_commit has a higher value (at least local) on the replica, it will immediately try to write WAL to the disk, and therefore the delay of the entire COMMIT will potentially be less. However, the synchronous commit record can be postponed using commit_delay, but that’s another story. From this all follows the obvious conclusion: the synchronous_commit turned off, which, it would seem, should reduce the COMMIT delay, increases it in the described scheme with replication.

You can read more about this in the archive of mailing lists and in the documentation:
Reliability and forward log , synchronous_commit .

12. What the query will give:

 select #array[1,2,3] - #array[2,3]

To answer this question, you need to know how PostgreSQL works with arrays. The unary operator “#” defined in the intarray extension calculates the length of the array. This suggests an answer: in the first array there are three elements, in the second - two. It would seem - the answer would be the number 1! But no, if you fulfill this request, the answer is TWO. From where

It is also important to take into account the priority of operators (it is tightly bound to the syntax and for user-defined operators this often leads to unobvious semantics). Unary # priority is lower than subtraction operator. Therefore, the correct query reads:

 select #( array[1,2,3] - #(array[2,3]))

This means that the number 2 is “subtracted” from the array [1,2,3], the array [1,3] remains, and at the end the length of this array is calculated. TWO. Almost everyone.

Toward the end of the quiz you need to activate a sense of humor.

13. How many records will be added to pg_class with the command:

 CREATE TABLE t (id serial primary key, code text unique);

This question “randomly” repeats question number 2. See also the next question. It should be noted that the answer to the 13th question, however, should not repeat the answer to the 2nd question :) Only one of the participants noted this. After all, table t is already created in question number 2. Repeat command will not create a single record in the database. (see also Tasks located along the chain , Quantum No. 10, 1987)

14. What was used in the preparation of this test: UNION or UNION ALL?

The history of this issue is as follows. Incidentally, in the process of preparing the quiz, the same question was printed in it twice. Seeing this, Ivan Frolkov joked “it was necessary to use UNION, not UNION ALL”. The joke was liked by “comrade colonel” ( www.anekdot.ru/id/-10077921 ), and the quiz was replenished with the 14th question.

Thanks

In the preparation of the quiz participated:
Alexey Shishkin, Alexey Ignatov, Arseny Sher, Anastasia Lubennikova, Alexander Alekseev, Ivan Panchenko, Ivan Frolkov.

For the 12th question, we are grateful to Nikolai Shulyakovsky from mail.ru.

The winners of the quiz received promotional codes that they can enter instead of paying for participation in PgConf.Russia 2018 .

Source: https://habr.com/ru/post/334386/

All Articles