Hello again! Next Tuesday a new cohort of the "Relational DBMS" course kicks off, so we continue to publish useful material on the topic. Here we go.

Last week I wrote about concurrent access in Postgres: which commands block each other, and how you can diagnose blocked commands. Of course, after the diagnosis you may need treatment. With Postgres it is easy to shoot yourself in the foot, but Postgres also gives you ways to avoid doing so. Here are some important dos and don'ts that we have found useful when helping users migrate from a single-node Postgres database to Citus, or when building new real-time analytics applications.
1. Never add a column with a default value.
PostgreSQL golden rule: when adding a column to a table in a production environment, never specify a default value.
Adding a column takes a very aggressive lock on the table, which blocks both reads and writes. If you add a column with a default value, PostgreSQL will rewrite the entire table to fill in the default for every row, which can take hours on large tables. In the meantime all queries are blocked, so your database will be unavailable.
Do not do this:
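The offending command probably looks something like this, assuming the items table and last_update column used in the batched update below:
ALTER TABLE items ADD COLUMN last_update timestamptz DEFAULT now();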
Do better like this:
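A sketch of the safer two-step approach, again assuming the items table and last_update column: add the column without a default, then backfill it separately:
ALTER TABLE items ADD COLUMN last_update timestamptz;
UPDATE items SET last_update = now();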
Or, better yet, avoid blocking updates and deletes for a long time by running the update in small batches, for example:
do {
    numRowsUpdated = executeUpdate(
        "UPDATE items SET last_update = ? " +
        "WHERE ctid IN (SELECT ctid FROM items WHERE last_update IS NULL LIMIT 5000)",
        now);
} while (numRowsUpdated > 0);
This way you can add and backfill the new column with minimal disruption to your users.
2. Beware of lock queues, use timeouts.
Every lock in PostgreSQL has a queue. If transaction B tries to take a lock that is already held by transaction A at a conflicting lock level, transaction B waits in the lock queue. Now something interesting happens: if another transaction C arrives, it has to check for conflicts not only with A, but also with transaction B and every other transaction in the lock queue.
This means that even if your DDL command can execute very quickly, it may sit in the queue for a long time waiting for queries to finish, and queries that start after it will be blocked behind it.
If the table may be under long-running SELECT queries, do not do this:
ALTER TABLE items ADD COLUMN last_update timestamptz;
Better do this:
SET lock_timeout TO '2s';
ALTER TABLE items ADD COLUMN last_update timestamptz;
By setting lock_timeout, the DDL command will fail if it ends up waiting for a lock, and thus blocking queries, for more than 2 seconds. The downside is that your ALTER TABLE may not succeed, but you can try again later. Before running the DDL command, you can query pg_stat_activity to see whether any long-running queries are in progress.
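For example, one rough way to look for long-running queries (a sketch; adjust the filter to your needs):
SELECT pid, state, now() - query_start AS runtime, query
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY runtime DESC;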
3. Use non-blocking index creation
Another PostgreSQL golden rule: always use non-blocking index creation.
Creating an index on a large dataset can take hours or even days, and a regular CREATE INDEX command blocks all writes for the duration of the command. While it does not block SELECTs, that is still pretty bad, and there is a better way: CREATE INDEX CONCURRENTLY.
Do not do this:
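A typical example of the blocking form; the items table and the items_value_idx name come from the text below, while the value column is assumed purely for illustration:
CREATE INDEX items_value_idx ON items (value);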
Instead, do this:
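The non-blocking variant (same assumed column) differs only in the CONCURRENTLY keyword:
CREATE INDEX CONCURRENTLY items_value_idx ON items (value);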
Non-blocking index creation does have a downside: if something goes wrong, it does not roll back, leaving an incomplete ("invalid") index behind. If that happens, do not worry, just run DROP INDEX CONCURRENTLY items_value_idx and try creating it again.
4. Use aggressive locks as late as possible.
When you need to run a command that takes an aggressive lock on a table, try to take it as late in the transaction as possible, so that other queries can keep running for as long as possible.
For example, suppose you want to completely replace the contents of a table. Do not do this:
BEGIN;
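-- a plausible continuation of the example (the file path is purely illustrative):
DELETE FROM items;                       -- takes an aggressive lock on items right away...
COPY items FROM '/path/to/newdata.csv';  -- ...and holds it for the entire slow reload
COMMIT;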
Instead, load the data into a new table, and then replace the old one:
BEGIN;
CREATE TABLE items_new (LIKE items INCLUDING ALL);
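-- a plausible completion of the example (file path purely illustrative):
COPY items_new FROM '/path/to/newdata.csv';  -- the slow work happens without touching items
DROP TABLE items;                            -- aggressive locks are taken only here,
ALTER TABLE items_new RENAME TO items;       -- at the very end of the transaction
COMMIT;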
There is one problem: we did not block writes from the start, so the old items table may have changed by the time we drop it. To prevent that, we can explicitly lock the table in a way that blocks writes but not reads:
BEGIN;
LOCK items IN EXCLUSIVE MODE;
...
Sometimes it is better to take locking into your own hands.
5. Adding a primary key with minimal locking
It is often a good idea to add a primary key to your tables, for example if you want to use logical replication or migrate the database using Citus Warp.
Postgres makes it very easy to create a primary key using ALTER TABLE, but while the index backing the primary key is being built, which can take a long time on a large table, all queries will be blocked.
ALTER TABLE items ADD PRIMARY KEY (id);
Fortunately, you can do all the hard work first using CREATE UNIQUE INDEX CONCURRENTLY, and then use the unique index as the primary key, which is a quick operation.
CREATE UNIQUE INDEX CONCURRENTLY items_pk ON items (id);
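The second, quick step of attaching the existing unique index as the primary key would look roughly like this (the constraint simply reuses the index name):
ALTER TABLE items ADD CONSTRAINT items_pk PRIMARY KEY USING INDEX items_pk;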
Splitting primary key creation into two steps has virtually no impact on your users.
6. Never use VACUUM FULL
The Postgres user experience can sometimes be a little surprising. Although VACUUM FULL sounds like the thing you would run to clean the "dust" out of your database, a more appropriate command would be:
PLEASE FREEZE MY DATABASE FOR HOURS;
VACUUM FULL rewrites the entire table on disk, which can take hours or days, and blocks all queries while it runs. While there are a few valid use cases for VACUUM FULL, such as a table that used to be big, is now small, but still takes up a lot of space, it is probably not your case.
Instead, you should aim to tune your autovacuum settings and use indexes to make your queries faster; occasionally you may still need to run VACUUM, but NOT VACUUM FULL.
7. Avoid deadlocks by ordering your commands.
If you have been using PostgreSQL for some time, most likely you have seen errors such as:
ERROR: deadlock detected
DETAIL: Process 13661 waits for ShareLock on transaction 45942; blocked by process 13483.
Process 13483 waits for ShareLock on transaction 45937; blocked by process 13661.
This happens when concurrent transactions take the same locks in a different order. For example, one transaction issues the following commands:
BEGIN;
UPDATE items SET counter = counter + 1 WHERE key = 'hello';
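-- per the discussion below, the transaction then presumably updates the other key:
UPDATE items SET counter = counter + 1 WHERE key = 'world';
COMMIT;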
At the same time, another transaction may issue the same commands, but in a different order.
BEGIN;
UPDATE items SET counter = counter + 1 WHERE key = 'world';
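-- and, presumably, this transaction then updates 'hello':
UPDATE items SET counter = counter + 1 WHERE key = 'hello';
COMMIT;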
If these two transactions run at the same time, they are likely to get stuck waiting for each other and never finish. Postgres detects this situation after about a second and cancels one of the transactions so that the other can complete. When this happens, you should look at your application to see whether you can make your transactions always take locks in the same order. If both transactions modify hello first and world second, the first transaction will block the second one on hello before either of them can take any other locks.
Share your tips!
We hope you found these recommendations helpful. If you have other tips, feel free to tweet @citusdata or share them in our active Slack community of Citus users.
As a reminder, in a few hours we will be holding an open lesson where we will go over the program of the upcoming course in detail.