After reading the article Databases in online games, and especially the comments on it, I was once again saddened by how many developers switch databases in their projects, hoping the switch will solve their problems, before they have exhausted the capabilities of the database they are replacing. I work on a project whose database is characterized by:
- On the order of 5,000-10,000 transactions per second
- Roughly 100 GB in size (and growing briskly)
- A roughly equal mix of read and write operations
- Mostly small transactions
For historical reasons, as they say, the whole thing is a single database with no sharding, which could otherwise solve a number of problems. That makes it all the more pressing to look for solutions that lie within PostgreSQL itself, solutions that can be implemented with minimal effort.
I will describe some of the problems we have run into and the solutions we applied. If someone finds this useful, great. If someone corrects me, I will only be glad to discover a weakness in these approaches and make the database more efficient. Everything described below can be applied without touching the application logic, solely by modifying the database schema.
We are talking about PostgreSQL 9.2 (we have not tried 9.3 yet). And no, there will be no DBA tips on optimal configuration settings; I would be glad to learn how to tune those properly myself.
Captain Obvious recommendations like "create the right indexes" will also be absent. I assume such nanotechnology lies on the surface.
Large table and index sizes
It is no secret that the more intensively data is written to the database, the more query speed degrades on average and the more sluggish the database becomes. This is a direct consequence of its transactional nature. Since rows (tuples) are versioned, when a value in a row changes, the new value cannot simply be written over the old one. Instead, PostgreSQL duplicates the row and writes the new, modified version separately. Roughly speaking, if you take a 1 GB table and UPDATE one field in every row in a single operation, the table will "suddenly" grow to 2 GB. The same happens with indexes. And growth in database size immediately affects query execution speed.
Of course, AUTOVACUUM will run regularly (if configured), or you can run VACUUM by hand as needed.
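To see the effect for yourself, here is a minimal sketch; the table and column names are made up for the illustration:

```sql
-- Illustrative only: watch a table roughly double after a full-table UPDATE.
CREATE TABLE bloat_demo (id int, payload text);
INSERT INTO bloat_demo
SELECT g, repeat('x', 100) FROM generate_series(1, 100000) AS g;

SELECT pg_size_pretty(pg_total_relation_size('bloat_demo'));  -- baseline size

UPDATE bloat_demo SET payload = payload || 'y';  -- every row gets a new version

SELECT pg_size_pretty(pg_total_relation_size('bloat_demo'));  -- roughly twice as large

VACUUM bloat_demo;  -- marks dead tuples as reusable, but does not shrink the file
```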
What can be done to mitigate the problem?
- Use fillfactor. This parameter determines how densely table pages (data blocks) are packed on disk, i.e. how much free space is left in each block for writing new versions of modified tuples. Our experience (many updates, mostly one row at a time) shows that a value of 90% is enough to get a noticeable positive effect (the default is 100%, i.e. the table is packed as tightly as possible). This means the table will obviously be at least about 10% larger, but when rows are updated their new versions can be written not just anywhere, but into the same block, which is faster and keeps the data compact. The documentation notes that for tables whose records never (or almost never) change there is no point in lowering fillfactor, which is true. By the way, for indexes the default fillfactor is already 90. A combined sketch of this and the next two techniques follows the list.
- Use partial indexes. It often turns out that you are only interested in records with a specific value of a boolean field or a specific enum value. For example, you keep a queue of background jobs in the database (which, by the way, is not the best idea, but for other reasons) and most often select jobs in the "new" status. So why index the whole table if in reality you query a negligible share of its rows (obviously there will be far fewer new jobs than completed ones)? Putting a condition on the index will shrink it severalfold, if not by orders of magnitude.
- Use expression (functional) indexes. Expression indexes are a powerful tool that can even spare you from creating extra columns in a table. For example, a table has a timestamp column with a date and time, and you frequently need to query by the hour. You could search directly on the timestamp column (and index it). You could create a separate column where a trigger, or the application, stores the hour value. Or you could create an expression index that contains only the hour values and is built on the existing timestamp column. As a result, no extra columns are created and the index stays efficient.
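A minimal sketch of all three techniques together; the table, column, and index names are invented for illustration, and a plain text status column stands in for the enum mentioned above:

```sql
-- Fillfactor: leave ~10% of each page free for updated row versions.
CREATE TABLE background_jobs (
    id         bigserial PRIMARY KEY,
    status     text NOT NULL DEFAULT 'new',
    created_at timestamp NOT NULL DEFAULT now(),
    payload    text
) WITH (fillfactor = 90);

-- Partial index: index only the 'new' jobs we actually query for.
CREATE INDEX background_jobs_new_idx
    ON background_jobs (created_at)
    WHERE status = 'new';

-- Expression index: query by hour without storing the hour in a separate column.
CREATE INDEX background_jobs_hour_idx
    ON background_jobs (date_part('hour', created_at));

-- A query like this can use the expression index
-- (the expression must match the one in the index definition):
SELECT count(*) FROM background_jobs
WHERE date_part('hour', created_at) = 21;
```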
Slow queries
- The two previous points, partial and expression indexes, remain just as relevant here.
- Clustering (CLUSTER). A useful thing if you know the typical access patterns for your objects. When clustering a table by a given index, PostgreSQL physically reorders the table's data on disk according to that index. For example, suppose a table lists apartments linked to houses. You can create an index on the "house id" field and cluster on it; as a result, apartment records will be physically grouped by house, and when asked to "return all the apartments of house No. 777" the database no longer has to gather apartments from all over the table: they are stored compactly. True, clustering has to be repeated regularly; it is not performed as part of AUTOVACUUM. A sketch follows below.
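A minimal sketch of the apartment example; the names are made up, and keep in mind that CLUSTER takes an exclusive lock on the table while it runs:

```sql
CREATE TABLE apartments (
    id       bigserial PRIMARY KEY,
    house_id bigint NOT NULL,
    number   int    NOT NULL
);

CREATE INDEX apartments_house_id_idx ON apartments (house_id);

-- Physically reorder the table by house_id. This is a one-off operation and
-- has to be repeated periodically, since new and updated rows are not kept
-- in cluster order.
CLUSTER apartments USING apartments_house_id_idx;

-- After clustering, rows for one house sit in adjacent pages:
SELECT * FROM apartments WHERE house_id = 777;
```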
Transaction counter wraparound
The MVCC machinery could not work without a transaction counter. Why it is still a 32-bit counter is a mystery to me, but we have what we have: roughly every 2 billion and something transactions, the counter has to wrap around. To avoid disaster, by that point the transaction IDs of all rows in the database must be reset; more precisely, those rows are assigned a special reserved FrozenXID. Notably, ordinary AUTOVACUUM and manual VACUUM do not necessarily get around to assigning FrozenXID, so to keep the database healthy, once the transaction counter reaches a configured threshold an autovacuum is launched with the lovely comment "to prevent wraparound". On large tables (tens of GB) this process can take weary hours, during which the table is effectively unavailable for both reading and writing.
I have not found a way to do anything about this without touching the application's logic and architecture. You either have to reduce the number of transactions by making them larger (which can create other problems and is generally ineffective), or split the database into pieces, which is also hardly a quick operational fix.
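At the very least it is worth watching how close you are to the forced freeze. A minimal monitoring sketch; the thresholds you alert on depend on your autovacuum_freeze_max_age setting:

```sql
-- How old is the oldest unfrozen transaction ID in each database?
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;

-- The tables closest to a wraparound-prevention autovacuum:
SELECT relname, age(relfrozenxid) AS xid_age
FROM pg_class
WHERE relkind = 'r'
ORDER BY xid_age DESC
LIMIT 10;
```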
Long transactions
This is fairly obvious, but I will mention it just in case.
The problem is not long transactions as such, but situations where, because of a bug, the application never closes a transaction and goes off to do its other work (for example, background job handlers that can hang for days). In the list of active PostgreSQL processes such a connection will sit with the status "idle in transaction". A regularly running VACUUM frees the space occupied by old row versions (left behind after updates or deletes) by looking at the number of the transaction in which the delete or update was performed. Roughly speaking, if some rows were deleted by transactions 1 and 2, and the smallest number among the currently open transactions is 3, those two rows can be removed "for real", because they are not visible to any currently open transaction. With a stuck transaction, VACUUM cannot remove a single row version created after that transaction started, and as a result the database can bloat horribly within a few hours.
There is only one piece of advice: monitor the current processes and catch the offenders sitting "idle in transaction", for example with a query like the one below.
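A minimal sketch against PostgreSQL 9.2's pg_stat_activity (the column names changed in 9.2; the one-hour threshold is an arbitrary example):

```sql
-- Connections that have been sitting in an open transaction for over an hour.
SELECT pid,
       usename,
       now() - xact_start AS xact_duration,
       query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND now() - xact_start > interval '1 hour'
ORDER BY xact_duration DESC;

-- If the offending application cannot be fixed right away, a stuck backend
-- can be terminated by its pid, e.g.:
-- SELECT pg_terminate_backend(12345);
```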
Long, blocking backups
Replicas, for all their attractiveness, do not save you from situations where the database has been mangled by, say, a botched patch. Someone deploys a patch that re-sorts some objects, all that goodness gets faithfully replicated, and that's that. So backups are needed regardless. Backing up our large database takes about an hour, and for practically all that time the project is effectively idle. Frankly, I have nothing to suggest here; on the contrary, I would be glad to hear how others cope with this. The only idea is to split the database into shards and back them up in chunks, while figuring out how to keep the data consistent.
In version 9.3, noticeable work was done on views (materialized views appeared, for example), which can also help optimize work with the database, but that is a separate topic.