At Higload ++ 2017 in Skolkovo, our company Postgres Professional again held a quiz with the traditional distribution of nishtyakov, which were tickets for the February
PgConf.Russia 2018 .
This article deals with quiz questions.

1. What type of indices allows you to effectively perform a SELECT * FROM users WHERE query lower (name) = 'vasya'?
The lower function in PostgreSQL causes the string to be lowercase.
')
a) Partial
b) Functional
c) Covering
d) Secondary
Many have guessed that this query helps to execute the functional index on the expression used in the query
lower(name)
:
CREATE INDEX ON users ( lower(name) );
However, only one of the winners of the quiz noticed that the
partial index CREATE INDEX ON users ( somefield) WHERE lower(name) = 'vasya' ;
where
somefield
is any field of the
users
table, it will also be effective.
You can also use the
citext data type to organize context-independent strings.
2. What type of indexes from the following list does NOT exist?
a) Partial
b) Secondary
c) closing
d) Functional
The correct answer: closing. Covering indexes are already there, and the closing is still waiting for its creator. They can become you!
The covering index for a particular query is an index that contains enough data to execute the query.
To make it easier to create useful covering indexes, the
CREATE INDEX...INCLUDE
construct was developed for PostgreSQL. You can read about it in the
presentation of Anastasia Lubennikova and Ivan Frolkov , the authors of this functionality in PostgreSQL. So far, she hasn’t gotten into the community version, and is only available in Postgres Pro.
The essence of the
INCLUDE
construction is the addition to the index of additional (non-key) information that extends the applicability of index only scan. We illustrate his work with a simple example.
Index only scan is a mode of query execution, when there is enough information available in the index to determine its result, and you do not need to look into the table itself.
Let u have a table with fields and an index for them:
CREATE TABLE users ( name text, score int ); REATE INDEX users_name_score ON users ( name, score );
If the table is a query
SELECT name, score FROM users WHERE name = 'vasya';
then the execution of this query is possible in the index only scan mode.
Let's complicate the task: let us also require control of the uniqueness of the
name
field.
For this index
users_name_score
not suitable, because if you add the
UNIQUE
parameter to its definition, it will control the uniqueness of the pair of fields (
name, score
).
You can create a separate unique index for the name field, but this is resource-intensive, and our covering indexes come to the rescue:
CREATE UNIQUE INDEX ON users ( name ) INCLUDE ( score );
Now the values of the score field will be added to the index, but not to its key part, and this will allow to solve 2 problems with one index - both control of uniqueness and provision of index only scan.
3. Which of the following is NOT an index?
a) Hash
b) GIN
c) WHISKEY
d) RUM
Index WHISKEY yet. Think what can this mean? WHIthout Specific KEY? By the way, how exactly RUM stands for is not yet decided either.
PostgreSQL has a
Hash index for a long time, but in practice it can only be used starting from version 10, in which support for WAL logs finally appeared for it. Compared to B-Tree, the effectiveness of HASH indices was seriously doubted, which explains their long stay in semi-done state.
GIN (Generalized Inverse iNdex) is a generalized inverse index developed by Oleg Bartunov and Fyodor Sigaev. In details about his device is written in the
article of Egor Rogov . It is good for full-text search, array indexing, JSON and JSONB.
RUM is a new type of index developed by Postgres Professional that allows you to speed up full-text search.
RUM is implemented in an extension that is part of Postgres Pro Enterprise, but its sources are open source and are available on
github . RUM is somewhat heavier than GIN when modifying data, but it allows you to speed up the search due to the storage of additional information in the index.
4. What does the letter I do in ACID?
a) transaction log
b) MVCC
c) Foreign keys
d) Shared page cache
The correct answer is MVCC (MultiVersion Concurrency Control, Multi-Version Concurrency Control). ACID is a DBMS device principle, an abbreviation meaning Atomicity, Consistency, Isolation, Durability. In this question, we are interested in I - Isolation of transactions, and in order to achieve it, multi-versioning is used. At each time point in the database, there are as many versions of each record as needed so that each transaction can see the version that is correct for it. Thanks to MVCC, for example, a transaction with UPDATE that changed a record does not interfere with reading the same record by everyone else. Old, useless versions are removed by a special vacuum process (
Vacuum ). Multi-versioning to ensure transaction isolation with
competitive data
access is used in most relational DBMSs.
5. Can logical replication be synchronous?
PostgreSQL 10 officially appeared
logical replication . For it to work in synchronous mode, the subscription name specified in the
CREATE SUBSCRIPTION
command must be mentioned in the
synchronous_standby_names
parameter on the wizard.
6. Can user X find out which temporary tables user Y created? If so, how, if not, why?
Maybe because temporary tables are arranged in the same way as usual, only special schemes are allocated for them - one for each user session.
SELECT nspname, ( SELECT json_agg(json_build_object('name',relname, 'owner', rolname)) FROM pg_class JOIN pg_roles ON pg_roles.oid = relowner WHERE relnamespace = pg_namespace.oid ) FROM pg_namespace WHERE nspname LIKE 'pg_temp%';
With this query, you can see all temporary tables in the system, and by whom they are created.
7. In what language should stored procedures be written so that they are executed the fastest?
Trick question. Most likely, of course, in C. But sometimes it turns out to be SQL, because A SQL function can be included in the query that calls it and optimized with it. You can read more about this property of functions in the SQL language (not to be confused with PL / PGSQL) in the
article by Ivan Frolkov .
Among the usual procedural languages, PL / PgSQL is not the fastest. In some cases, PL / Perl or PL / Python, and even more so PL / v8 is much faster. The difference is particularly strong if the procedures are involved not only in working with the database, but also in calculations or data processing.
For questions about comparing procedural languages, you can refer to their documentation (
PL / Perl ,
PL / Python ,
PL / v8 ) and the report of
Ivan Panchenko .
8. How many times will function F be called in a query
SELECT F(a%a) FROM generate_series(-1,1) a
?
The correct answer is once. The expression
a%a
means taking the remainder of dividing the variable
a
into itself. It is impossible to divide by zero, therefore at the second iteration the request will fall down. Thus, only the first value of the variable
a
- minus one will be able to get into the function
F
Read more about the
generate_series
function in the
documentation .
9. SELECT query t.abc FROM t; succeeds, although there is no abc column in table t (and t is indeed a table). How is this possible, and what will be in the abc column of the sample?
This is possible if the function
abc(t%rowtype)
. In this case,
t.abc
is equivalent to
abc(t)
. This is a
documented possibility, a rudiment of objectivity in a postgres.
10. A PostgreSQL instance worked with the fsync = off parameter (we do not recommend this), servicing the sessions that are actively writing to the database. In the midst of the work of the OOM killer killed one of the backends, and the postmaster killed the entire copy accordingly. After that, the PostgreSQL instance was restarted. How will the recovery process end and why?
Most of the participants knew that
fsync=off
is very, very bad: in case of failure, you can lose data. But at what failure?
fsync
is flushing the OS buffer to disk. When can the OS buffer not be flushed to disk? Of course, when the functioning of the OS kernel is disrupted. For example, with a hardware failure. In the condition of the task, the OS is functioning normally - so there is every reason to expect that the data on the disk will be reset, and the next time the postgress is launched, the base will be restored from the WALs.
Thanks
Ivan Frolkov, Alexander Alekseev, Arseny Sher and Ivan Panchenko took part in the compilation of questions and answers. And, of course, it is impossible not to sing difaramba against HighLoad ++ and its organizers. Well done, as always were on top. This holiday.

Quiz result
50 Highload participants answered the quiz questions. As a result, 7 free invitations to the
PgConf.Russia 2018 conference, which is planned to be held in Moscow on February 5-7, 2018, were raffled off.