This April at RailsConf in Phoenix, we shared a great deal of advice on using Postgres with Rails, and thought it would be useful to write some of it up for a wider audience. Here you will find tips related to debugging and improving the database performance of your Rails application.
Long-running queries can have all kinds of negative effects on your database. Whether they run for hours or just a few seconds, they can hold locks, bloat the WAL, or simply consume a huge amount of system resources. Postgres makes it easy to gain more stability by setting a timeout on queries. Conveniently, you can set a default value, for example 5 seconds, as shown below, and then any query that runs longer than 5 seconds will be killed:
production:
  url: <%= DATABASE_URL %>
  variables:
    statement_timeout: 5000
If you want a query to run longer within a particular session, you can set a timeout that applies only to the current connection:
class MyAnalyticsJob < ActiveJob::Base
  queue_as :analytics

  def perform
    ActiveRecord::Base.connection.execute "SET statement_timeout = 600000" # 10 minutes
    # ...
  ensure
    ActiveRecord::Base.connection.execute "SET statement_timeout = 5000" # 5 seconds
  end
end
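A related sketch, relying on standard Postgres behavior rather than anything from the original article: inside a transaction you can use SET LOCAL, which reverts automatically on commit or rollback, so no ensure block is needed:

ActiveRecord::Base.transaction do
  # SET LOCAL applies only to the current transaction and reverts at COMMIT/ROLLBACK.
  ActiveRecord::Base.connection.execute "SET LOCAL statement_timeout = 600000" # 10 minutes
  # ... long-running queries ...
end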
Rails abstracts away a lot of the interaction with your database. This can be both good and bad. Postgres itself lets you track long-running queries, but as your Rails application grows, that may not be enough. To find out where a query originated, there is an extremely convenient gem, marginalia, which annotates each query with exactly where in your application it came from. Now, when you spot a wrong query, one that is too slow, or one that could simply be removed, you know exactly where to fix it in the code:
Account Load (0.3ms) SELECT `accounts`.* FROM `accounts` WHERE `accounts`.`queenbee_id` = 1234567890 LIMIT 1 /*application:BCX,controller:project_imports,action:show*/
Note the comment at the end of the query (translator's note).
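In case it is useful, here is a minimal setup sketch based on marginalia's documented API (the exact component list shown is an assumption; adjust to taste):

# Gemfile
gem 'marginalia'

# config/initializers/marginalia.rb
# Annotate every SQL query with the application name, controller, and action.
Marginalia::Comment.components = [:application, :controller, :action]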
Often you need a general picture of what is happening in your database. pg_stat_statements is a Postgres extension, often pre-installed in cloud environments such as Citus Cloud, that lets you see which queries have been executed since the last statistics reset and how they performed.
For example, to see the 10 most time-consuming queries and their average time, run the following:
SELECT query, total_time / calls AS avg_time
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 10;
If you enable "track_io_timing" in the Postgres statistics collector , you can understand what the bottleneck is - the processor or I / O. You can learn more about pg_stat_statements here (or in another article okmeter.io on Habre - comment. Per.) .
By default, Rails uses the schema.rb file to store a copy of the database schema, typically used to initialize the database before running tests. Unfortunately, many advanced Postgres features, such as functional and partial indexes, as well as composite primary keys, cannot be represented in that DSL (an illustrative example follows below).
Instead, it makes sense to switch to the db/structure.sql file that Rails can generate and use, which is done as follows:
# Use SQL instead of Active Record's schema dumper when creating the database.
# This is necessary if your schema can't be completely dumped by the schema dumper,
# like if you have constraints or database-specific column types
config.active_record.schema_format = :sql
Under the hood this uses the Postgres pg_dump format, which can sometimes be overly verbose, but it guarantees you get a fully restorable database structure. Note that with the :sql format you load the schema with rake db:structure:load instead of rake db:schema:load. If you run into the problem of excessively long diffs, take a look at activerecord-clean-db-structure.
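To illustrate the earlier point about the schema dumper, here is a hypothetical migration (the users table and email column are made up) creating an expression index that structure.sql preserves but the classic schema.rb DSL could not express:

class AddLowerEmailIndexToUsers < ActiveRecord::Migration[5.1]
  def up
    # An expression (functional) index for case-insensitive email lookups.
    execute "CREATE INDEX index_users_on_lower_email ON users (lower(email))"
  end

  def down
    execute "DROP INDEX index_users_on_lower_email"
  end
end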
Rails likes to wrap everything in a transaction, especially when using before_save hooks and multi-level relationships between models. There is one important caveat to keep in mind about transactions, which can cause problems at scale. For example, in a transaction like this:
BEGIN;
UPDATE organizations SET updated_at = '2017-04-27 11:31:03 -0700' WHERE id = 123;
SELECT * FROM products WHERE store_id = 456;
--- more statements here
COMMIT;
The first UPDATE statement will hold a lock on the organizations row with id 123 from the moment it runs until the transaction COMMITs.
Now imagine another request for the same organization, coming in, say, from another user, that runs a similar transaction. Typically, that other request will have to wait for the first transaction to commit before it can proceed, which increases response time. To fix this, you can reorder the transaction so that the UPDATE is executed near the end, and also consider grouping timestamp-field updates outside the transaction, after the main work has been done.
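Here is a minimal sketch of that reordering in ActiveRecord terms (the organization variable and the import_products helper are hypothetical):

# Before: touching the parent first holds its row lock for the whole transaction.
ActiveRecord::Base.transaction do
  organization.touch            # takes the row lock immediately...
  import_products(organization) # ...and holds it through all of this work
end

# After: do the long-running work first and update the timestamp last,
# or move the touch outside the transaction entirely.
ActiveRecord::Base.transaction do
  import_products(organization)
  organization.touch
end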
To detect similar problems in advance, you can set log_lock_waits = on in PostgreSQL.
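As a sketch, you can flip this setting without editing postgresql.conf by hand (requires superuser; with the default deadlock_timeout of 1 second, any lock wait longer than that gets logged):

# Run once as a superuser; equivalent to setting log_lock_waits = on in postgresql.conf.
ActiveRecord::Base.connection.execute "ALTER SYSTEM SET log_lock_waits = on"
ActiveRecord::Base.connection.execute "SELECT pg_reload_conf()"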
Rails supports database connection pooling out of the box. When a new request arrives, it takes a connection from the pool and hands it to the application. As your Rails application scales out, this can lead to hundreds of open connections to the database, even though only some of them are actually doing work. The key here is to use a connection pooler, such as pgBouncer, to reduce the number of active database connections. A connection pooler hands out server connections while transactions are active, so idle client connections that aren't doing any work don't tie up the database. (One caveat worth knowing: in pgBouncer's transaction pooling mode you also need to disable prepared statements in Rails, with prepared_statements: false in database.yml.)
Translator's note: you can also use our monitoring, which tracks the number of connections to Postgres and to pgBouncer, among many other things, and helps prevent and resolve problem situations.
Source: https://habr.com/ru/post/330628/