
PostgreSQL: Production Techniques

You can read a lot of books on databases and write plenty of applications, for clients or for yourself, and still step on a rake when working with really large databases and tables, especially when you want to minimize downtime on a large project, or better yet avoid it completely. Even the simplest operations, such as changing the structure of a table, can become more difficult. The most interesting cases, problems, rakes, and their solutions from personal experience on the Pushwoosh project are described under the cut. The article has no beautiful pictures, but there is plenty of dry text.



Adding a new column to an existing table


It seems like an ordinary operation: you need to ship a new feature and extend an existing table. But what do you do when the table on production holds 50 GB of data and receives several thousand requests per second from live clients, and the change must be invisible to customers?
Task : add a new column to an existing table in production.
Problem : Adding a new column locks the table.
Solution : adding a new column does not block the table if DEFAULT is omitted or DEFAULT NULL is used.

For those who are interested, you can read the full documentation.
Main excerpt from the documentation
When a column is added with ADD COLUMN, all existing rows in the table are initialized with the column's default value (NULL if no DEFAULT clause is specified). If there is no DEFAULT clause, this is merely a metadata change and does not require any immediate update of the table's data; the added NULL values are supplied on readout, instead.

Adding a column with a DEFAULT clause or changing the type of an existing column will require the entire table and its indexes to be rewritten. Table and/or index rebuilds may take a significant amount of time for a large table, and will temporarily require as much as double the disk space.
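
For example, a minimal sketch of the non-blocking variant (application_devices is a table that appears later in this article; the last_seen column is purely hypothetical):

 -- Without a DEFAULT (or with DEFAULT NULL) only the catalog is changed, existing rows are not rewritten
 ALTER TABLE application_devices ADD COLUMN last_seen timestamp;
 -- A non-null DEFAULT, by contrast, rewrites the whole table while holding the lock
 -- ALTER TABLE application_devices ADD COLUMN last_seen timestamp DEFAULT now();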


But what if you want to initialize a new column with a default value? In this case, you can write a script that walks through the table and changes the value of the field in small batches of records.
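
A sketch of such a backfill, assuming the hypothetical last_seen column from the sketch above, that hwid uniquely identifies a row, and an arbitrary batch size:

 -- Run repeatedly until it reports UPDATE 0; each pass touches only a small batch of rows
 UPDATE application_devices
    SET last_seen = now()
  WHERE hwid IN (SELECT hwid
                   FROM application_devices
                  WHERE last_seen IS NULL
                  LIMIT 10000);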

Adding a new index to an existing table


Now the analytics department has come to us and wants to receive the data they are interested in every hour. For this we need a new SQL query, but it is complicated and we do not have a suitable index for it.

Task : add an index to an existing production table
Problem : Adding an index locks writes (insert / update / delete) to the table. Reads from the table are still possible.
Solution : use CREATE INDEX CONCURRENTLY, which does not lock the table, but it works about twice as long and requires more system resources.

For those who are interested, you can read the full documentation.

Main excerpt from the documentation
Creating an index can interfere with regular operation of a database. Normally PostgreSQL locks the table to be indexed against writes and performs the entire index build with a single scan of the table. Other transactions can still read the table, but if they try to insert, update, or delete rows in the table they will block until the index build is finished. This could have a severe effect if the system is a live production database; an index build can lock out writers for periods that are unacceptably long for a production system.

PostgreSQL supports building indexes without locking out writes. This method is invoked by specifying the CONCURRENTLY option of CREATE INDEX. When this option is used, PostgreSQL must perform two scans of the table, and in addition it must wait for all existing transactions that could potentially modify or use the index to terminate. Thus this method requires more total work than a standard index build and takes significantly longer to complete. However, since it allows normal operations to continue while the index is built, this method is useful for adding new indexes in a production environment. Of course, the extra CPU and I/O load imposed by the index creation might slow other operations.
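
A minimal sketch for the analytics case above (the index name and the choice of column are just an illustration):

 -- Does not block writes, but runs longer and cannot be executed inside a transaction block
 CREATE INDEX CONCURRENTLY idx_application_devices_hwid
     ON application_devices (hwid);

Keep in mind that if a concurrent build fails or is cancelled, it leaves behind an invalid index that has to be dropped and recreated.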


Monitoring current database requests


It is important to know which queries are being executed at the moment, which process is executing them, and how long each query has already been running. PostgreSQL has an excellent pg_stat_activity system view for this. Its full description can be found in the documentation; I will cite only the most interesting fields.
pid (integer): Process ID of this backend
query_start (timestamp with time zone): Time when the currently active query was started, or, if state is not active, when the last query was started
query (text): Text of this backend's most recent query. If state is active, this field shows the currently executing query; in all other states, it shows the last query that was executed

As a result, since query_start gives us the start time of the query, it is easy to calculate how long each query has been running and sort the queries by runtime.

 pushwoosh=# select now() - query_start, pid, waiting, query from pg_stat_activity where state != 'idle' order by 1 desc;
     ?column?     |  pid  | waiting | query
 -----------------+-------+---------+---------------------------------------------------------------------------------------------------
  23:42:13.468115 |  6877 | f       | DELETE FROM application_goals WHERE applicationid = '9254'
  00:30:51.943691 | 24106 | f       | SELECT applicationid, pushtoken FROM application_devices WHERE hwid in ('1abd5e5fd79318cd','1abd5f3eda7acbca','1abd601f2bafabf8','1abd62a0a092ac17','1abd6303b83accf9','1abd64726a98fb63','1abd676c087c3617','1abd67ebecb6f3ce','1abd68a3b78fb730','1abd697af6bc8552','1abd70ebb654aeb2','1abd7114a8576a67','1abd729a385caff8','1abd731ff62c4521','1abd738bd2d457eb','1abd7760f7210155','1abd79dbc085c2c0','1abd7ab46dc24304','1abd7d48bd5e04ab','1abd7e7aee3c0e58','1abd7e8129a53ab3','1abd827c8c21630','1abd82cd204c69a9','1abd843ee3dedb1','1abd88d346c74d67','1abd88e8bd01c168','1abd8ceac00808cc','1abd8d3b2cb72de3','1abd8e139f267260','1abd8e74a288204c','1abd8f00bb4a0433','1abd8fd7e8f4f125','1abd91c193455ada','1abd92448396a9bf','1abd946ac4cf0e22','1abd9594ed1bd791','1abd96cc0df2202b','1abd975a98849a0b','1abd997c96d3c9b1','1abd9b3cfb66852c','1abd9bead472be5','1abd9f5bed3cbbd8','1abd9f73b8122bf1','1abda233b9a00633','1abda2ee3db5bccb','1abda486901c3a14','1abdac09e0e3267b','1abdae8235cf19dd','1abdaf9e3a143041','1abdb54fe96'
  00:04:49.592503 | 18899 | f       | autovacuum: ANALYZE public.device_tags_values
  00:00:00.040265 | 11748 | f       | INSERT INTO device_tags_values (hwid,valueid) VALUES ('27976b81cc72c7ac','8470317') RETURNING uid


For PostgreSQL 9.1, the query can be rewritten as
 select now() - query_start, procpid, waiting, current_query from pg_stat_activity where current_query != '<IDLE>' order by 1 desc; 


From this output we see that the process with PID 6877 has been executing its query for almost a day, which is probably not optimal and needs more detailed profiling. We also see that the second query has been running for half an hour and is most likely also not optimal, but we do not see the query completely: it is cut off, and we want to see the entire query.
Task : to see what requests are currently being executed in the database.
Problem : pg_stat_activity does not show long queries completely (the text is cut off).
Solution : gdb .

Perhaps this problem has a simpler solution, but we did not find one. Take the PID from the output above and attach to the process
 gdb [path_to_postgres] [pid] 

and after connecting to the process we perform
 printf "%s\n", debug_query_string 


In our case
$ gdb postgres 24106
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-64.el7
Copyright 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<www.gnu.org/software/gdb/bugs>...
Reading symbols from /usr/bin/postgres...(no debugging symbols found)...done.
Attaching to program: /usr/bin/postgres, process 24106
Reading symbols from /lib64/libxml2.so.2...(no debugging symbols found)...done.
...
(gdb) printf "%s\n", debug_query_string


Slow log


You always need to have monitoring for the database and see which queries are executed slowly. You could solve this in code: in the ORM, or somewhere deeper, measure the duration of each query and, if it exceeds a threshold value, write the query to the log. But it is always better not to reinvent the wheel.
Task : Monitor slow queries
Problem : I want to do this at the database level
Solution : log_min_duration_statement

The log_min_duration_statement setting is specified in milliseconds, and logs all database requests that were executed longer than the specified value.

Let's edit the PostgreSQL config, vim /var/lib/pgsql/9.4/data/postgresql.conf, and set a threshold of 3 seconds in it
 log_min_duration_statement = 3000    # -1 is disabled, 0 logs all statements
                                      # and their durations, > 0 logs only
                                      # statements running at least this number
                                      # of milliseconds


For the changes to take effect, it is not necessary to restart the database; just run the following command from psql, pgAdmin, or any other database interface
 SELECT pg_reload_conf(); 


or execute from the command line
 su - postgres
 /usr/bin/pg_ctl reload


It must be remembered that some parameters in the configuration file will take effect only after restarting the database.

After that you can look at the PostgreSQL log, which in our case lives at /var/lib/pgsql/9.4/data/pg_log/postgresql-2015-07-07.log, and there we see a query that runs for almost 6 seconds.
 2015-07-07 09:39:30 UTC 192.168.100.82(45276) LOG: duration: 5944.540 ms statement: SELECT * FROM application_devices WHERE applicationid='1234' AND hwid='95ea842e368f6a64' LIMIT 1 


As an option for the future, to monitor the log file you can set up a logstash + elasticsearch + kibana stack and immediately send notifications about slow queries via zabbix, if this is critical for the project.

Finding out what queries a production process is currently executing


If you have a lot of daemons that talk to the database frequently, and one fine day a daemon starts working slowly or it becomes unclear what it is doing, then strace comes to the rescue: it shows the queries sent to the database and their execution times without stopping the process, without adding logging to the program, and without waiting for the problem to reproduce. It does not matter whether you write in PHP, Python, Ruby, etc.; strace works for everything.
Task : find out what the process is doing (as an example, what requests it sends to the database)
Problem : the process cannot be interrupted or stopped.
Solution : strace

To do this, simply take the pid of the process, set the maximum printed string length with -s, and add the -T option to show how long each system call took. The strace output may look something like this.

strace -p 27345 -s 1024 -T 2> out
 gettimeofday({1437846841, 447186}, NULL) = 0 <0.000004>
 sendto(8, "Q\0\0\0005SELECT * FROM accounts WHERE uid='25143' LIMIT 1\0", 54, MSG_NOSIGNAL, NULL, 0) = 54 <0.000013>
 poll([{fd=8, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=8, revents=POLLIN}]) <0.000890>


NULL and unique indexes


This example is not related to the production environment. We start with the simple fact that NULL is not equal to NULL.
Documentation
Do not write expression = NULL because NULL is not "equal to" NULL. (The null value represents an unknown value, and it is not known whether two unknown values are equal.)
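
A quick way to see this in psql (a minimal illustration; the output is shown roughly as psql prints it):

 psql=# select (null = null) as equals, (null is null) as is_null;
  equals | is_null
 --------+---------
         | t
 (1 row)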

Suppose we want to create a composite unique index in which one of the fields can be either NULL or a number. The catch is that a unique index does not treat NULL values as equal, so it will not prevent duplicates in rows containing NULL, yet we still want such records to be covered by the index.
Documentation
When an index is declared unique, multiple table rows with equal indexed values are not allowed. Null values are not considered equal. A multicolumn unique index will only reject cases where all indexed columns are equal in multiple rows.


Consider an example

 psql=# create table test (
 psql(# a varchar NOT NULL,
 psql(# b varchar default null
 psql(# );
 CREATE TABLE
 psql=# create unique index on test (a, b);
 CREATE INDEX
 psql=# insert into test values (1, null);
 INSERT 0 1
 psql=# insert into test values (1, null);
 INSERT 0 1
 psql=# select * from test;
  a | b
 ---+---
  1 |
  1 |
 (2 rows)

Despite the fact that we created a unique index, the record (1, null) was inserted twice.

Task : make a composite unique index where one of the fields can be NULL
Problem : a unique index does not treat NULL values as equal
Solution : split the index into 2 different indexes.

In order to avoid this behavior, you can split the index into 2 indexes.
 sql=# create table test (
 sql(# a varchar NOT NULL,
 sql(# b varchar default null
 sql(# );
 CREATE TABLE
 sql=# create unique index on test (a, b) where b is not null;
 CREATE INDEX
 sql=# create unique index on test (a) where b is null;
 CREATE INDEX
 sql=# insert into test values (1, null);
 INSERT 0 1
 sql=# insert into test values (1, null);
 ERROR:  duplicate key value violates unique constraint "test_a_idx"
 DETAIL:  Key (a)=(1) already exists.


Duplicate handling


This example also has nothing to do with production.
Task : To be able to handle duplicates that were not allowed to create a unique index.
Problem : You need to catch exceptions, parse error codes
Solution : keep calm, catch the exception, and wait for PostgreSQL 9.5, which will have UPSERT
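
A minimal sketch of both approaches, using the test table from the previous example: catching the unique_violation error on current versions, and the INSERT ... ON CONFLICT syntax expected in 9.5.

 -- Before 9.5: swallow the duplicate inside a PL/pgSQL block
 DO $$
 BEGIN
     INSERT INTO test (a, b) VALUES ('1', NULL);
 EXCEPTION
     WHEN unique_violation THEN
         RAISE NOTICE 'duplicate ignored';
 END
 $$;

 -- PostgreSQL 9.5+: the same without an exception handler
 -- INSERT INTO test (a, b) VALUES ('1', NULL) ON CONFLICT DO NOTHING;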

Source: https://habr.com/ru/post/263541/

