It so happened that I started working with PostgreSQL three years ago, and over that time I have managed to methodically collect every rake you could possibly step on. And to tell the truth, if I had the chance to send this bitter experience back to my three-years-younger self, my life would have been much simpler and my nerve cells far more intact. That is why I decided to write a completely subjective article with the set of rules I follow when developing on PostgreSQL. Perhaps it will help someone bypass the rakes I collected (and step on others, ha-ha!).
The list of rules itself
Behind almost every point below is a sad story full of suffering and perseverance. And the word “Pain!” marks the items earned through stories that still make me shudder at night.
Version the database schema
The database schema is code you have written. It should live in the version control system and be versioned together with the rest of the project. For PostgreSQL, I like Pyrseas best for this purpose. It turns the schema, with all its PostgreSQL-specific objects, into a YAML file, and that file is what gets versioned. Unlike plain SQL, such a file is convenient to work with in branches and to merge. At the final step, the YAML file is compared with the database schema and a migration to SQL is generated automatically.
Pain! Never apply changes directly to the production database
Even if the change is simple, incredibly urgent and very tempting. First apply it to the developers' database, commit it to a branch, then apply the change to the trunk database (identical to the production one). And only then, when all is well in trunk, apply it to the production database. It is slow and paranoid, but it saves you from many problems.
Pain! Before you write delete or update, write the where
And before you run the code, exhale, count to three and make sure you are in a session on the right database. About truncate I will keep quiet entirely: do not even think of running it without three “Our Father”s, amen! UPD from koropovskiy: it is also useful to set autocommit off for the current session. From tgz: or write begin before every update and delete.
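The begin variant in practice looks something like this (the table and column names are invented for illustration):

    begin;
    update accounts set status = 'blocked' where user_id = 42;
    -- look at the number of affected rows; if it is not what you expected:
    rollback;
    -- and only if everything is fine:
    commit;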
Test Driven Development
First write the tests, then create the database objects. This applies to any objects: schemas, tables, functions, types, extensions, no exceptions! It seems hard at first, but later you will thank yourself many times over. Even when creating the schema for the first time it is easy to miss something. And when refactoring tables six months later, only the tests you wrote will save your functions from a sudden shot in the foot. PostgreSQL has the remarkable pgTAP extension for this. I recommend creating an extra schema “schema_name_tap” for each schema and writing the test functions in it. Then just run the tests through pg_prove.
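A minimal sketch of such a test function with pgTAP (the billing schema, table and column names here are invented):

    create extension if not exists pgtap;
    create schema if not exists billing_tap;

    create or replace function billing_tap.test_invoices() returns setof text as $$
    begin
        return next has_table('billing', 'invoices', 'billing.invoices exists');
        return next has_column('billing', 'invoices', 'amount', 'invoices has an amount column');
        return next col_not_null('billing', 'invoices', 'amount', 'amount is not null');
    end;
    $$ language plpgsql;

You can run all such functions at once with select * from runtests('billing_tap'::name, '^test'), or through pg_prove from the shell.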
Do not forget to configure PITR
I am afraid of playing Captain Obvious here, but every database must have backups configured. And preferably with the ability to restore the database to any point in time. This is needed not only to recover from failures; it also gives developers many interesting possibilities for working with particular time slices of the database. PostgreSQL has barman for this.
Data consistency
Inconsistent data in a database has never led to anything good. Even a small amount of it can easily turn the entire database into garbage. Therefore you should never neglect normalization and constraints such as foreign keys and checks. Use a denormalized form (for example, jsonb) only after making sure that the schema cannot be implemented in normalized form with an acceptable level of complexity and performance, because a denormalized representation can potentially lead to inconsistent data. To all the arguments of denormalization proponents, answer that normalization was not invented for nothing, and fall meaningfully silent.
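The basics cost almost nothing to write down (the tables are invented for illustration and reused in the sketches below):

    create table customer (
        id   bigserial primary key,
        name text not null
    );

    create table orders (
        id          bigserial primary key,
        customer_id bigint not null references customer (id),
        created_at  timestamptz not null default now(),
        amount      numeric not null check (amount > 0)
    );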
Create foreign keys deferrable initially deferred
This postpones the constraint checks until the end of the transaction, which lets you violate consistency with impunity while it executes (but by the end everything is either consistent or an error is raised). Moreover, by switching the flag to immediate inside the transaction, you can force a constraint check at exactly the right moment. UPD. The comments point out that deferrable is an ambiguous practice: it simplifies a number of import tasks but complicates debugging inside a transaction, and is considered a bad habit for novice developers. Although I stubbornly tend to think it is better to have deferrable keys than not to have them, do consider this alternative view.
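With the same invented tables, a deferrable version of that foreign key would look like this:

    alter table orders
        add constraint orders_customer_fk
        foreign key (customer_id) references customer (id)
        deferrable initially deferred;

    begin;
    -- rows may temporarily violate the constraint here
    set constraints orders_customer_fk immediate;  -- force the check right now
    -- ...
    commit;  -- any checks still deferred run at this point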
Do not use the public schema
It is a utility schema for functions from extensions. For your own needs, create separate schemas. Treat them as modules and create a new schema for each logically separate set of entities.
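For example, something like this (the module names are, of course, invented):

    create schema billing;   -- everything about invoices and payments
    create schema staffing;  -- everything about employees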
Separate API Schema
For the functions called from the application side, you can create a separate schema “api_v<version number>”. This lets you control clearly which functions serve as the interfaces to your database. For naming the functions in this schema you can use the template “entity_get/post/patch/delete_arguments”.
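A sketch of what that might look like (the names and the customer table are invented):

    create schema api_v1;

    create or replace function api_v1.customer_get(p_id bigint)
    returns table (id bigint, name text) as $$
        select id, name from customer where id = p_id;
    $$ language sql stable;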
Triggers for auditing
Triggers are best suited for auditing actions. I also recommend creating a universal trigger function for logging the actions on any table. To do this, pull the data about the target table's structure out of information_schema and work out whether the old or the new row should be recorded, depending on the action performed. With this solution the code becomes more concise and more maintainable. If you plan to use triggers to compute accumulation registers, be careful with the logic: one mistake and you get inconsistent data. They say it is a very dangerous kung fu.
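A simplified sketch of such a universal trigger function (I cheat here and serialize the row with to_jsonb instead of walking information_schema; the audit table layout is invented):

    create table audit_log (
        happened_at timestamptz not null default now(),
        table_name  text not null,
        action      text not null,
        row_data    jsonb
    );

    create or replace function log_any_change() returns trigger as $$
    begin
        insert into audit_log (table_name, action, row_data)
        values (tg_table_name,
                tg_op,
                -- for DELETE log the old row, otherwise the new one
                case when tg_op = 'DELETE' then to_jsonb(old) else to_jsonb(new) end);
        return null;  -- an AFTER trigger, the return value is ignored
    end;
    $$ language plpgsql;

    create trigger customer_audit
        after insert or update or delete on customer
        for each row execute procedure log_any_change();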
Pain! Importing data into a new schema
The worst, but regularly occurring, event in the life of a database developer. PostgreSQL's FDWs help a lot here, especially since they were substantially improved in 9.6 (if the FDW's developers have taken care of it, it can build the plan on the remote side). By the way, there is the convenient “import foreign schema” construct, which saves you from writing wrappers over a pile of tables. It is also good practice to keep a set of functions that store the SQL commands for dropping and restoring the foreign and primary keys that exist in the database. I recommend importing by first writing a set of views whose data is identical in structure to the target tables, and then inserting from them using copy (not insert!). It is better to keep the whole sequence of SQL commands in a separate versioned file and run it through psql with the -1 flag (in a single transaction). By the way, an import is the only case when you may turn off fsync in PostgreSQL, having first made a backup and crossed your fingers.
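A rough sketch of the sequence (the server address, credentials and names are all placeholders):

    create extension if not exists postgres_fdw;

    create server legacy_srv
        foreign data wrapper postgres_fdw
        options (host 'legacy.example.com', dbname 'legacy');

    create user mapping for current_user
        server legacy_srv
        options (user 'migrator', password 'secret');

    create schema legacy;
    import foreign schema public from server legacy_srv into legacy;

    -- a view shaping the remote data to the structure of the target table
    create view import_customer as
        select cust_id as id, cust_name as name from legacy.customers;

    -- bulk load via copy, here routed through a file
    copy (select * from import_customer) to '/tmp/customer.csv' (format csv);
    copy customer from '/tmp/customer.csv' (format csv);

With psql, that pair of copy commands would typically be \copy, to keep the file on the client side.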
Do not write in SQL:1999
No, really, a lot of water has flowed under the bridge since then: a whole generation has finished school, and cell phones have turned from bricks into supercomputers by the standards of 1999. In short, we should not write the way our fathers wrote. Use “with”: with it the code becomes cleaner and can be read from top to bottom, instead of looping through blocks of joins. By the way, if a join is made on identically named columns, it is more concise to use “using” rather than “on”. And of course, never use offset in production code. There is also the beautiful “join lateral”, which is often forgotten, and at that moment a kitten somewhere in the world grows sad. UPD. When using “with”, do not forget that its result is materialized as a CTE, which eats memory and does not use indexes when queried, so a “with” applied too often and in the wrong place can hurt query performance. So do not forget to analyze the query with the planner. “With” is especially good when you need a table that will be used in different ways in several parts of the query below it. And remember: “with” radically improves the readability of a query and works more efficiently in every new version of PostgreSQL. Other things being equal, prefer this construction.
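A small sketch of these constructions together, using the invented tables from above:

    -- “with” reads top to bottom; “using” joins on the identically named column
    with recent_orders as (
        select customer_id as id, sum(amount) as total
        from orders
        where created_at > now() - interval '7 days'
        group by customer_id
    )
    select c.name, r.total
    from customer c
    join recent_orders r using (id);

    -- “join lateral”: the three latest orders for every customer
    select c.name, o.amount
    from customer c
    join lateral (
        select amount
        from orders
        where customer_id = c.id
        order by created_at desc
        limit 3
    ) o on true;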
Temporary tables
If you can write a query without temporary tables, do not hesitate and write it that way! Usually a CTE created by the “with” construct is an acceptable alternative. The thing is, PostgreSQL creates a temporary file for each temporary table... and yes, one more sad kitten on the planet.
Pain! The worst antipattern in SQL
Never use constructs of the form
select myfunc() from some_table;
The execution time of such a query grows linearly with the number of rows. And such a query can always be rewritten into something without a function applied to every row, winning a couple of orders of magnitude in execution speed.
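A typical rewrite, with invented names; suppose get_customer_name(customer_id) does a lookup for every row:

    -- slow: the function body runs once per row
    select o.id, get_customer_name(o.customer_id)
    from orders o;

    -- fast: the same result as one set-based join
    select o.id, c.name
    from orders o
    join customer c on c.id = o.customer_id;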
The main secret of queries
If your query is slow on a test machine, it will not run any faster in production. The best analogy here is roads and cars. The test machine is a single-lane road. The production server is a ten-lane highway. At rush hour, ten lanes carry far more cars without traffic jams than one lane. But if your car is an old bucket of bolts, it will not drive like a Ferrari no matter how many free lanes you give it.
Use indexes, Luke!
Whether a query runs in a tenth of a second or in a minute depends on how correctly you create and use them. I recommend Markus Winand's site on b-tree indexes: it is the best publicly available explanation of balanced trees I have seen on the Internet. And his book is cool too, yes.
group by or window function?
No, of course window functions can do more. But sometimes an aggregation can be computed either way. In such cases I follow this rule: if the aggregation is computed over covering indexes, use only group by; if there are no covering indexes, you may try a window function.
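The two spellings of the same total, with the invented tables again:

    -- group by: one row per customer
    select customer_id, sum(amount) as total
    from orders
    group by customer_id;

    -- window function: the same total attached to every order row
    select id, customer_id,
           sum(amount) over (partition by customer_id) as total
    from orders;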
set_config
set_config can be used not only to change postgresql.conf settings within a transaction, but also to pass a user-defined variable into a transaction (if it is declared in postgresql.conf in advance). Such variables can influence the behavior of the functions called in the transaction in very interesting ways.
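A minimal sketch (the variable name myapp.current_user is invented; the third argument true makes the setting local to the transaction):

    begin;
    select set_config('myapp.current_user', 'alice', true);
    -- any function called below can read the value:
    select current_setting('myapp.current_user');  -- 'alice'
    commit;  -- the setting disappears together with the transaction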
FTS and trigrams
They are wonderful! They give us full-text and fuzzy search while keeping all the power of SQL. Just do not forget to use them.
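For instance, a sketch with trigrams and FTS (the tables are invented; the gin index is what makes the % operator fast):

    create extension if not exists pg_trgm;

    create index customer_name_trgm on customer using gin (name gin_trgm_ops);

    -- fuzzy search: finds 'John' even when the user typed 'Jonh'
    select name from customer where name % 'Jonh';

    -- full-text search over an invented articles table
    select title from articles
    where to_tsvector('english', body) @@ to_tsquery('english', 'postgres & index');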
Raise your own exceptions
In a large project you often have to raise many exceptions with your own codes and messages. To avoid getting lost in them, you can create a separate “code - exception text” type for exceptions, as well as functions for raising them (a wrapper over “raise”) and for adding and deleting them. And if you have covered all your database objects with tests, you will not be able to accidentally delete an exception code that is already used somewhere.
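One possible shape of such a wrapper (everything here, codes included, is invented for illustration):

    create table app_error (
        code    text primary key,  -- a five-character SQLSTATE-style code
        message text not null
    );

    insert into app_error values ('U0001', 'customer not found');

    create or replace function raise_app_error(p_code text) returns void as $$
    declare
        v_message text;
    begin
        select message into strict v_message from app_error where code = p_code;
        raise exception using errcode = p_code, message = v_message;
    end;
    $$ language plpgsql;

    -- usage inside any function:
    -- perform raise_app_error('U0001');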
There is no such thing as too much paranoia
It is good practice not to forget to configure ACLs on tables, and to run functions with “security definer”. When functions are read-only, feng shui demands setting the “stable” flag on them.
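Roughly like this (the app_reader role and the counting function are invented):

    revoke all on customer from public;
    grant select on customer to app_reader;

    create or replace function customer_count() returns bigint as $$
        select count(*) from customer;
    $$ language sql
    stable             -- read-only, so the planner can optimize the calls
    security definer;  -- executes with the owner's rights, not the caller's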
Pain! A cherry on the cake
UPD. Never, under any circumstances, pass application users through the server to the database by mapping each application user one-to-one to a database user. Even if it seems that this way you can configure security for users and their groups with the regular PostgreSQL tools, never do it, it is a trap! With such a scheme you cannot use connection pools, and every connected application user eats up a resource-hungry database connection. Databases hold hundreds of connections while servers hold thousands, and this is exactly why applications use load balancers and connection pools. With a one-to-one mapping of every user into the database, as the load grows you will have to break the scheme and rewrite everything.