
Common mistakes when working with PostgreSQL. Part 2

We continue to publish videos and transcripts of the best talks from the PGConf.Russia 2019 conference. The first part of the report by Ivan Frolkov dealt with inconsistent naming, with constraints, and with where it is better to concentrate logic - in the database or in the application. In this part you will find an analysis of error handling, concurrent access, non-cancellable operations, CTEs and JSON.



I will tell this story. A client of ours says: "The database is working slowly, and our application serves the public. We are afraid they will come after us with pitchforks." It turned out they had a lot of processes in the idle in transaction state: the application has started a transaction, does nothing, yet the transaction never completes. If you interact with some external services, this is, in principle, a normal situation. Another matter is when the idle in transaction state lasts a long time (more than a minute is already suspicious). That is bad, because PostgreSQL does not like long transactions: VACUUM cannot clean up row versions that a long-hanging transaction could still see, so such a transaction effectively blocks VACUUM. Tables begin to bloat, and indexes become less and less effective.
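
Such sessions are easy to spot. A minimal sketch using the standard pg_stat_activity view (the one-minute threshold is only an illustration):

-- Find sessions that have been sitting idle inside a transaction
-- for more than a minute.
SELECT pid, usename, state, now() - state_change AS idle_for, query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND now() - state_change > interval '1 minute'
ORDER BY idle_for DESC;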


In this case, people also wrote queries incorrectly and got Cartesian products - such queries ran for several days. Well, the user presses the button, waits for the result and, if there is no result, presses the button again.

But this did not explain why they had so many processes in idle in transaction. They appeared in the following situation: the application goes to the database, starts a transaction, calls some external service, gets an error there, and then everything simply falls apart - the stack trace is printed to the log, and everyone calms down at that. The connection remains abandoned; it hangs around and gets in the way.

What to do about it? First, always handle errors. If an error arrives, please do not ignore it. It's good if PostgreSQL drops the connection: it will roll back the transaction and we will survive. I will dwell on this below. And if there is code that there is simply no time to fix, we still have idle_in_transaction_session_timeout - it can be set, and it will simply terminate sessions with inactive transactions.
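
If the offending code cannot be fixed right away, the timeout at least limits the damage. A sketch with an illustrative one-minute limit (app_user is a made-up role name):

-- Server-wide: terminate any session that stays idle inside a transaction
-- for more than a minute.
ALTER SYSTEM SET idle_in_transaction_session_timeout = '1min';
SELECT pg_reload_conf();

-- Or only for a particular role:
ALTER ROLE app_user SET idle_in_transaction_session_timeout = '1min';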



A typical case of "handling" errors: EXCEPTION WHEN OTHERS THEN NULL. We once argued with a colleague about terminology. I said it translates as "let it all burn with a blue flame", and he said something considerably cruder. If something bad happened, even a log full of curses is still better than complete silence - as here.
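
A sketch of the anti-pattern and a safer alternative (the function, table and column names are mine for illustration):

-- Bad: any error, including a constraint violation, silently disappears.
CREATE OR REPLACE FUNCTION transfer_money(from_id bigint, to_id bigint, amount numeric)
RETURNS void AS $$
BEGIN
    UPDATE accounts SET balance = balance - amount WHERE id = from_id;
    UPDATE accounts SET balance = balance + amount WHERE id = to_id;
EXCEPTION WHEN OTHERS THEN
    NULL;  -- the error vanishes without a trace
END;
$$ LANGUAGE plpgsql;

-- Better: log what happened and re-raise so the caller finds out.
CREATE OR REPLACE FUNCTION transfer_money(from_id bigint, to_id bigint, amount numeric)
RETURNS void AS $$
BEGIN
    UPDATE accounts SET balance = balance - amount WHERE id = from_id;
    UPDATE accounts SET balance = balance + amount WHERE id = to_id;
EXCEPTION WHEN OTHERS THEN
    RAISE WARNING 'transfer_money(%, %, %) failed: % (SQLSTATE %)',
        from_id, to_id, amount, SQLERRM, SQLSTATE;
    RAISE;  -- do not swallow the error
END;
$$ LANGUAGE plpgsql;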



If you do not know what to do with an error, do not intercept it. A very common practice: they caught the error, logged it and ran on as if nothing had happened. If, again, you are handling money and you ignore an error, the results can be unpredictable. In the 90s you could have been driven out to the forest in a car trunk for that. Times have become softer now, but it is still not pleasant.



When we perform an operation, we usually return a result to the client: everything went well, or it failed, and we handle every error. I have seen people deliberately write plpgsql code that intercepted an error, wrote to the log that yes, there was an error and a rather gross one, and substituted their own message text - but did not return the SQLSTATE. This was done everywhere, so when they forgot to check something, the problems started.

For some reason everyone is afraid of exceptions - both in plpgsql and in other languages. Yet if you do not invent something of your own and use the standard features of the language, everything usually turns out fine. This problem comes up especially often when the connection drops: it fails, an idle in transaction process remains, connections pile up, and performance falls. By the way, such a transaction may also leave locks behind, though for some reason that is less common. So add error-handling code in the finally block, clean up the connection there, and give it back to the pool.



Moreover, if your constraints are named correctly, then when handling an error you can throw an exception not from the database but from the application. In Spring there is exception translation; in PHP, respectively, set_exception_handler. Pay attention to the tools your framework provides - they appeared there for a reason.
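
A sketch of what a well-named constraint buys you (the table and constraint names are mine for illustration):

CREATE TABLE payments (
    id      bigserial PRIMARY KEY,
    user_id bigint  NOT NULL,
    amount  numeric NOT NULL,
    CONSTRAINT payments_amount_must_be_positive CHECK (amount > 0)
);

INSERT INTO payments (user_id, amount) VALUES (1, -5);
-- ERROR:  new row for relation "payments" violates check constraint
--         "payments_amount_must_be_positive"

The application-side handler (Spring's exception translation, a PHP handler, and so on) can match the constraint name and show the user a meaningful message instead of a raw database error.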

So: do not intercept an error you do not know what to do with; name errors carefully and accurately; classify errors.



Personally, I classify them by these criteria: the operation can be retried (for example, we got a deadlock); the operation cannot be retried because it has already completed; the operation cannot be performed at all.

Paradoxically, from the application's point of view, a deadlock, a lost connection and running out of money to pay with are all the same situation: the error handler will retry the operation after some time.



On the other hand, what is written in the application is, generally speaking, not my business: I work on the database side. I only urge you to handle errors carefully, otherwise you get idle in transaction, locked rows, bloated databases, and so on.

Most developers believe they work with the database alone and that their application performs operations strictly sequentially. And it is a plus of all relational DBMSs that, oddly enough, everything usually works very well even at the default isolation level of READ COMMITTED rather than SERIALIZABLE. At the same time, there are situations when updates are lost: one user loads a form, another loads the same form, one edits and saves, the other saves the old version - the changes are erased. The first one comes to complain: "how is it that I wrote so much, and it is all gone?"



From my experience: once a week, on Fridays, two managers made payments. They were supposed to take turns, but nevertheless one day they went in at the same time and made two payments to the same person. If there is even the slightest possibility of a concurrent access error, it will happen sooner or later. The question is only when.
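
One common way to avoid lost updates is optimistic locking. A minimal sketch, assuming a documents table with an extra version column (both names are illustrative):

UPDATE documents
SET body = 'new text',
    version = version + 1
WHERE id = 42
  AND version = 7;   -- the version the form was loaded with
-- If zero rows were updated, someone else saved first: report a conflict
-- instead of silently overwriting their changes.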

In addition, I draw your attention to constraints. I have repeatedly seen people try to enforce uniqueness with triggers. You cannot reliably enforce uniqueness in a table with a plain trigger: either you will have to lock the whole table, or make some more complicated moves. Sooner or later you will stumble over this.
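
A sketch, with illustrative names: declare the invariant and PostgreSQL enforces it correctly under concurrent access, which a trigger cannot easily do.

ALTER TABLE payments
    ADD CONSTRAINT payments_one_payment_per_user_per_day
    UNIQUE (user_id, pay_date);

-- For more complex rules a partial unique index often works:
CREATE UNIQUE INDEX one_active_subscription_per_user
    ON subscriptions (user_id)
    WHERE status = 'active';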



A couple of times I ran across a completely horrible thing: an external web service being called from inside the database, with operations that change external entities. This is bad because the transaction can be rolled back in the database, but the operations on the remote service will not be rolled back.

An even more subtle point is deadlocks. Imagine: we process a transaction, call an external web service, change something there, then we hit a deadlock and roll back; we retry the operation, call the service again, under unlucky circumstances hit another deadlock, roll back again - and this can happen many times (I once ran into a couple of hundred repetitions). You handle these deadlocks more or less correctly, retry the operation, and then suddenly discover that for two months you have been paying someone double the amount.



I have come across payment services with a poor API: "pay such-and-such amount to such-and-such user", and the function returns the result - paid / not paid. First, there is a problem with retries, and second, it is unclear what to do if the connection is interrupted. For some reason very few people bother about this either.



In the example on the slide, such an operation is performed in two stages: first a kind of announcement - "we are about to do something now" - and then the operation itself.



If we suddenly stop - you never know, the power goes out - we can re-run the operation. If we died at the second stage, then at least we will not perform it a second time, and the case can be sorted out manually. In reality the overwhelming majority of such operations work out normally on the first try, but these measures are not theoretical speculation. Everything can work fine for months, and then suddenly the admin starts fiddling with the network, the service starts flapping, and the problems begin.
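
A sketch of this two-stage pattern under an assumed schema (payment_attempts and its columns are illustrative):

-- Stage 1: record the intent in its own transaction and commit
-- before touching the external service.
CREATE TABLE payment_attempts (
    id      bigserial PRIMARY KEY,
    user_id bigint    NOT NULL,
    amount  numeric   NOT NULL,
    status  text      NOT NULL DEFAULT 'started'  -- started / completed / failed
);

INSERT INTO payment_attempts (user_id, amount) VALUES (1, 100) RETURNING id;

-- ... the application calls the external payment service here ...

-- Stage 2: record the outcome.
UPDATE payment_attempts SET status = 'completed' WHERE id = 42;  -- id from stage 1
-- Rows stuck in 'started' after a crash are exactly the cases to reconcile manually.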


The slide shows 4 types of non-cancellable operations. The last one is non-idempotent operations - a very sad case. At the beginning I spoke about a colleague who did everything with triggers precisely to ensure the idempotency of his operations.


At this conference people will talk about Common Table Expressions and how good they are. Unfortunately, CTEs in PostgreSQL are not free: they require work_mem of their own. If the result set is small, then in general there is nothing to worry about; if it suddenly turns out large, the problems begin. People very often use a CTE as a sort of mini-view, so that they can somehow structure the application. CTEs are very popular.
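
A sketch to illustrate the point (orders and its columns are illustrative). In PostgreSQL up to version 11 a CTE is always materialized and acts as an optimization fence, so the outer filter is not pushed into it; since version 12 the planner may inline such CTEs, and the old behaviour can be requested explicitly:

WITH recent_orders AS (
    SELECT * FROM orders WHERE created_at > now() - interval '1 year'
)
SELECT * FROM recent_orders WHERE customer_id = 42;

-- PostgreSQL 12+: force materialization only when it is actually wanted.
WITH recent_orders AS MATERIALIZED (
    SELECT * FROM orders WHERE created_at > now() - interval '1 year'
)
SELECT * FROM recent_orders WHERE customer_id = 42;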





You can make a temporary view, but unfortunately each one takes a row in pg_class, and if they are used very actively, there can be problems with catalog bloat.
In such cases it would be desirable to have a parameterized view, or to build the query dynamically, but unfortunately inside PostgreSQL this is not handled very elegantly.



JSON is usually spoken of in glowing terms, but there is a tendency in applications to shove absolutely anything into JSON. In principle, everything works well. On the other hand, extracting data from JSON is fast, but not as fast as from ordinary columns. It is even worse if you have a large JSON value and it has been moved to TOAST: to get at the JSON, it first has to be fetched from TOAST.

If all the columns are packed into JSON, then even with a functional index built on them, the value still has to be fetched from there. It gets even worse at large volumes, when the database is big and you get a bitmap index scan. Then we have references not to individual rows but to whole pages, and to figure out what to take from a page PostgreSQL performs a Recheck: it fetches the row, pulling the JSON from TOAST, and checks whether the value matches, keeping or skipping the row accordingly. With small columns this works fine; with JSON it is a big problem. Don't get too carried away with JSON.
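
A sketch of the trade-off with illustrative names: an expression index on a jsonb field works, but the whole jsonb value (possibly TOASTed) still has to be read whenever the row is rechecked or other keys are needed, whereas a hot field promoted to an ordinary column is cheap to read and index.

CREATE TABLE events (
    id   bigserial PRIMARY KEY,
    data jsonb NOT NULL
);
CREATE INDEX events_user_id_idx ON events ((data ->> 'user_id'));

-- Promote a frequently used key to a real column:
ALTER TABLE events ADD COLUMN user_id bigint;
CREATE INDEX events_user_id_col_idx ON events (user_id);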



- How can one check, when several users work with the same row, that it has not been changed? What options are there?

- First, you can compare the values of all the columns as they were when the row was loaded into the form and make sure they have not changed. The second, more convenient option: compute a hash over all the columns - especially since the columns there can be large and heavy, while the hash is small.
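
A sketch of the hash variant, assuming an illustrative documents table; the whole row is cast to text and hashed:

SELECT id, md5(d::text) AS row_hash FROM documents d WHERE id = 42;

-- On save, update only if the row still hashes to the value the form remembered:
UPDATE documents
SET body = 'new text'
WHERE id = 42
  AND md5(documents::text) = 'hash remembered by the form';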

- You say that constraints should be given good names so that the user can understand what is happening. But there is a limit of 60 characters on a constraint name. That is often not enough. How do you deal with it?

- I think you fight it with self-restraint. In PostgreSQL this is a special name type 64 bytes long. In principle you can recompile with a greater length, but that is not a very good idea.

- In the report you intrigued us by saying that something needs to be done with archives. What is the neatest mechanism for moving obsolete data into an archive?

- As I said at the very beginning, with due diligence everything works. Use whichever way is more convenient for you.


Timing: Part 2 of the report starts at 25:16

- There is a certain procedure that is called in parallel by several users. How can one limit the parallel execution of this procedure, that is, put all users in a queue so that until one has finished with the procedure, the next cannot start using it?

- Specifically the procedure? Or is a transaction enough?

- Specifically the procedure, which is called within some transaction.

- You can take a lock on an object. There would be difficulties if you had a condition of, say, no more than 3 at a time, but that is also achievable. I usually use transaction-level locks, but session-level ones can be used as well.
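
One way to implement this queueing, as a sketch, is with advisory locks (the key 42 is arbitrary): every caller takes the same lock at the start of the procedure, so executions line up one after another.

-- Transaction-level: released automatically at COMMIT or ROLLBACK.
SELECT pg_advisory_xact_lock(42);
-- ... the body of the procedure runs here ...

-- Session-level alternative, released explicitly:
SELECT pg_advisory_lock(42);
-- ... work ...
SELECT pg_advisory_unlock(42);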

- I would still like to return to archive data. You talked about archive storage options so that the data is also available from the application. It occurred to me simply to make a separate archive database. What other options are there?

- Yes, you can make an archive database. You can write a function and wrap it in a view. In the function you can do anything that comes to mind: go to the archive database, read some files from disk, call an external web service, combine all of this, or even generate some data yourself - the choice is limited only by your imagination.
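
A sketch of the "function wrapped in a view" idea with illustrative names; inside the function one could just as well query a foreign table via postgres_fdw, files via file_fdw, and so on:

CREATE FUNCTION archived_payments()
RETURNS TABLE (id bigint, user_id bigint, amount numeric) AS $$
    SELECT id, user_id, amount FROM payments_archive;
$$ LANGUAGE sql STABLE;

CREATE VIEW payments_all AS
SELECT id, user_id, amount FROM payments
UNION ALL
SELECT id, user_id, amount FROM archived_payments();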

- On the question about archive data: you can use partitioning - the new features of version 11 - make the whole table partitioned, and then simply detach a partition and keep it as an archive (see the sketch after this exchange). It can still be queried.

- Of course, why not. I'll give the floor to the next speaker.
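
A sketch of the detach-as-archive approach mentioned in the last exchange (names are illustrative, PostgreSQL 11+):

CREATE TABLE payments (
    id      bigint,
    paid_at date    NOT NULL,
    amount  numeric NOT NULL
) PARTITION BY RANGE (paid_at);

CREATE TABLE payments_2018 PARTITION OF payments
    FOR VALUES FROM ('2018-01-01') TO ('2019-01-01');

-- Later: move the old year out of the active table but keep it queryable.
ALTER TABLE payments DETACH PARTITION payments_2018;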

Source: https://habr.com/ru/post/443792/

