Postgres Pro Quiz Task at Highload ++ 2017

At Higload ++ 2017 in Skolkovo, our company Postgres Professional again held a quiz with the traditional distribution of nishtyakov, which were tickets for the February PgConf.Russia 2018 .

This article deals with quiz questions.

1. What type of indices allows you to effectively perform a SELECT * FROM users WHERE query lower (name) = 'vasya'?

The lower function in PostgreSQL causes the string to be lowercase.
')
a) Partial
b) Functional
c) Covering
d) Secondary

Many have guessed that this query helps to execute the functional index on the expression used in the query lower(name) :

 CREATE INDEX ON users ( lower(name) );

However, only one of the winners of the quiz noticed that the partial index

 CREATE INDEX ON users ( somefield) WHERE lower(name) = 'vasya' ;

where somefield is any field of the users table, it will also be effective.

You can also use the citext data type to organize context-independent strings.

2. What type of indexes from the following list does NOT exist?

a) Partial
b) Secondary
c) closing
d) Functional

The correct answer: closing. Covering indexes are already there, and the closing is still waiting for its creator. They can become you!
The covering index for a particular query is an index that contains enough data to execute the query.
To make it easier to create useful covering indexes, the CREATE INDEX...INCLUDE construct was developed for PostgreSQL. You can read about it in the presentation of Anastasia Lubennikova and Ivan Frolkov , the authors of this functionality in PostgreSQL. So far, she hasn’t gotten into the community version, and is only available in Postgres Pro.

The essence of the INCLUDE construction is the addition to the index of additional (non-key) information that extends the applicability of index only scan. We illustrate his work with a simple example.

Index only scan is a mode of query execution, when there is enough information available in the index to determine its result, and you do not need to look into the table itself.

Let u have a table with fields and an index for them:

 CREATE TABLE users ( name text, score int ); REATE INDEX users_name_score ON users ( name, score );

If the table is a query

  SELECT name, score FROM users WHERE name = 'vasya';

then the execution of this query is possible in the index only scan mode.

Let's complicate the task: let us also require control of the uniqueness of the name field.
For this index users_name_score not suitable, because if you add the UNIQUE parameter to its definition, it will control the uniqueness of the pair of fields ( name, score ).

You can create a separate unique index for the name field, but this is resource-intensive, and our covering indexes come to the rescue:

 CREATE UNIQUE INDEX ON users ( name ) INCLUDE ( score );

Now the values of the score field will be added to the index, but not to its key part, and this will allow to solve 2 problems with one index - both control of uniqueness and provision of index only scan.

3. Which of the following is NOT an index?

a) Hash
b) GIN
c) WHISKEY
d) RUM

Index WHISKEY yet. Think what can this mean? WHIthout Specific KEY? By the way, how exactly RUM stands for is not yet decided either.

PostgreSQL has a Hash index for a long time, but in practice it can only be used starting from version 10, in which support for WAL logs finally appeared for it. Compared to B-Tree, the effectiveness of HASH indices was seriously doubted, which explains their long stay in semi-done state.

GIN (Generalized Inverse iNdex) is a generalized inverse index developed by Oleg Bartunov and Fyodor Sigaev. In details about his device is written in the article of Egor Rogov . It is good for full-text search, array indexing, JSON and JSONB.

RUM is a new type of index developed by Postgres Professional that allows you to speed up full-text search. RUM is implemented in an extension that is part of Postgres Pro Enterprise, but its sources are open source and are available on github . RUM is somewhat heavier than GIN when modifying data, but it allows you to speed up the search due to the storage of additional information in the index.

4. What does the letter I do in ACID?

a) transaction log
b) MVCC
c) Foreign keys
d) Shared page cache

The correct answer is MVCC (MultiVersion Concurrency Control, Multi-Version Concurrency Control). ACID is a DBMS device principle, an abbreviation meaning Atomicity, Consistency, Isolation, Durability. In this question, we are interested in I - Isolation of transactions, and in order to achieve it, multi-versioning is used. At each time point in the database, there are as many versions of each record as needed so that each transaction can see the version that is correct for it. Thanks to MVCC, for example, a transaction with UPDATE that changed a record does not interfere with reading the same record by everyone else. Old, useless versions are removed by a special vacuum process ( Vacuum ). Multi-versioning to ensure transaction isolation with competitive data access is used in most relational DBMSs.

5. Can logical replication be synchronous?

PostgreSQL 10 officially appeared logical replication . For it to work in synchronous mode, the subscription name specified in the CREATE SUBSCRIPTION command must be mentioned in the synchronous_standby_names parameter on the wizard.

6. Can user X find out which temporary tables user Y created? If so, how, if not, why?

Maybe because temporary tables are arranged in the same way as usual, only special schemes are allocated for them - one for each user session.

  SELECT nspname, ( SELECT json_agg(json_build_object('name',relname, 'owner', rolname)) FROM pg_class JOIN pg_roles ON pg_roles.oid = relowner WHERE relnamespace = pg_namespace.oid ) FROM pg_namespace WHERE nspname LIKE 'pg_temp%';

With this query, you can see all temporary tables in the system, and by whom they are created.

7. In what language should stored procedures be written so that they are executed the fastest?

Trick question. Most likely, of course, in C. But sometimes it turns out to be SQL, because A SQL function can be included in the query that calls it and optimized with it. You can read more about this property of functions in the SQL language (not to be confused with PL / PGSQL) in the article by Ivan Frolkov .

Among the usual procedural languages, PL / PgSQL is not the fastest. In some cases, PL / Perl or PL / Python, and even more so PL / v8 is much faster. The difference is particularly strong if the procedures are involved not only in working with the database, but also in calculations or data processing.

For questions about comparing procedural languages, you can refer to their documentation ( PL / Perl , PL / Python , PL / v8 ) and the report of Ivan Panchenko .

8. How many times will function F be called in a query
`SELECT F(a%a) FROM generate_series(-1,1) a` ?

The correct answer is once. The expression a%a means taking the remainder of dividing the variable a into itself. It is impossible to divide by zero, therefore at the second iteration the request will fall down. Thus, only the first value of the variable a - minus one will be able to get into the function F Read more about the generate_series function in the documentation .

9. SELECT query t.abc FROM t; succeeds, although there is no abc column in table t (and t is indeed a table). How is this possible, and what will be in the abc column of the sample?

This is possible if the function abc(t%rowtype) . In this case, t.abc is equivalent to abc(t) . This is a documented possibility, a rudiment of objectivity in a postgres.

10. A PostgreSQL instance worked with the fsync = off parameter (we do not recommend this), servicing the sessions that are actively writing to the database. In the midst of the work of the OOM killer killed one of the backends, and the postmaster killed the entire copy accordingly. After that, the PostgreSQL instance was restarted. How will the recovery process end and why?

Most of the participants knew that fsync=off is very, very bad: in case of failure, you can lose data. But at what failure? fsync is flushing the OS buffer to disk. When can the OS buffer not be flushed to disk? Of course, when the functioning of the OS kernel is disrupted. For example, with a hardware failure. In the condition of the task, the OS is functioning normally - so there is every reason to expect that the data on the disk will be reset, and the next time the postgress is launched, the base will be restored from the WALs.

Thanks

Ivan Frolkov, Alexander Alekseev, Arseny Sher and Ivan Panchenko took part in the compilation of questions and answers. And, of course, it is impossible not to sing difaramba against HighLoad ++ and its organizers. Well done, as always were on top. This holiday.

Quiz result

50 Highload participants answered the quiz questions. As a result, 7 free invitations to the PgConf.Russia 2018 conference, which is planned to be held in Moscow on February 5-7, 2018, were raffled off.

Source: https://habr.com/ru/post/342804/

All Articles