Unexpected application behavior around the database leads to a war between DBAs and developers: the DBAs shout "your application is bringing the database down", the developers reply "but everything worked before!" Worst of all, the DBAs and the developers cannot help each other: one side knows nothing about the nuances of the application and the driver, the other knows nothing about the quirks of the infrastructure. It would be nice to avoid that situation.
You have to understand: it is often not enough to skim through go-database-sql.org. It is better to arm yourself with other people's experience. Better still if that experience was paid for in blood and lost money.
My name is Artemy Ryabinkov, and this article is a free interpretation of my talk at the Saints HighLoad 2019 conference.
You can find the minimum necessary information on working with any SQL-like database in Go at go-database-sql.org. If you have not read it, do so.
In my opinion, the strength of Go is its simplicity. It shows, for example, in the convention of writing queries in raw SQL (ORMs are not held in high regard). This is both an advantage and a source of extra difficulties.
So, once you take the standard database/sql package, you will want to extend its interfaces. As soon as that happens, take a look at github.com/jmoiron/sqlx. Let me show a few examples of how this extension can simplify your life.
Using StructScan eliminates the need to manually move data from columns into struct fields.
type Place struct {
    Country       string
    City          sql.NullString
    TelephoneCode int `db:"telcode"`
}

var p Place
err = rows.StructScan(&p)
Using NamedQuery allows you to use struct fields as placeholders in a query.
p := Place{Country: "South Africa"}
sql := `.. WHERE country=:country`
rows, err := db.NamedQuery(sql, p)
Using Get and Select gets rid of the need to manually write loops that fetch rows from the database.
var p Place
var pp []Place

// Get fetches a single row into p.
err = db.Get(&p, ".. LIMIT 1")

// Select fetches all matching rows into pp.
err = db.Select(&pp, ".. WHERE telcode > ?", 50)
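To make the snippets above concrete, here is a minimal self-contained sketch of sqlx in a program. The DSN, the table and the column names are placeholders of my own, not part of the original setup:

package main

import (
    "database/sql"
    "log"

    "github.com/jmoiron/sqlx"
    _ "github.com/lib/pq" // any database/sql driver works underneath
)

// Place mirrors the struct from the examples above.
type Place struct {
    Country       string
    City          sql.NullString
    TelephoneCode int `db:"telcode"`
}

func main() {
    // Connect opens the pool and pings the database in one call.
    db, err := sqlx.Connect("postgres", "postgres://user:pass@localhost/db?sslmode=disable")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    // Select runs the query and scans every row into pp.
    var pp []Place
    err = db.Select(&pp, `SELECT country, city, telcode FROM place WHERE telcode > $1`, 50)
    if err != nil {
        log.Fatal(err)
    }
    log.Printf("found %d places", len(pp))
}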
database/sql is a set of interfaces for working with a database, and sqlx is their extension. For these interfaces to work, they need an implementation. Drivers are responsible for the implementation.
The most popular drivers:

- github.com/lib/pq: a pure Go Postgres driver for database/sql. For a long time this driver was the default standard, but today it has lost its relevance and is no longer developed by its author.
- github.com/jackc/pgx: a PostgreSQL driver and toolkit for Go. Today this is the better choice.

github.com/jackc/pgx is the driver you want to use. Why?

- It implements the database/sql interfaces.
- It supports the types that PostgreSQL implements outside the SQL standard.
- pgx returns human-readable errors, while lib/pq just throws panics. If you do not catch a panic, the program will crash. (Do not use panics in Go; they are not the same as exceptions.)
- With pgx, we get the ability to configure each connection independently.
- It supports the PostgreSQL logical replication protocol.

Typically, we write this kind of loop to fetch data from the database:
rows, err := s.db.QueryContext(ctx, sql)
for rows.Next() {
    err = rows.Scan(...)
}
Inside the driver, data is received into a 4 KB buffer. rows.Next() triggers a network round trip and fills the buffer. If the buffer is too small, we go back to the network for the remaining data. More network round trips means lower processing speed. On the other hand, since the buffer is capped at 4 KB, it does not eat up the entire process memory.
Naturally, one wants to crank the buffer size up to the maximum to reduce the number of network requests and lower the latency of our service. We added that option and measured the expected speedup with synthetic benchmarks:
$ go test -v -run=XXX -bench=. -benchmem
goos: linux
goarch: amd64
pkg: github.com/furdarius/pgxexperiments/bufsize
BenchmarkBufferSize/4KB    5   315763978 ns/op   53112832 B/op   12967 allocs/op
BenchmarkBufferSize/8KB    5   300140961 ns/op   53082521 B/op    6479 allocs/op
BenchmarkBufferSize/16KB   5   298477972 ns/op   52910489 B/op    3229 allocs/op
BenchmarkBufferSize/1MB    5   299602670 ns/op   52848230 B/op      50 allocs/op
PASS
ok   github.com/furdarius/pgxexperiments/bufsize   10.964s
As you can see, there is no big difference in processing speed. Why is that?
It turns out that we are limited by the size of the buffer for sending data inside Postgres itself. That buffer has a fixed size of 8 KB. Using strace, you can see that the OS returns 8192 bytes in each read system call, and tcpdump confirms this with the packet sizes.
Tom Lane (one of the core developers of Postgres) comments on this as follows:
Traditionally, at least, that was the size of pipe buffers in Unix machines, so in principle this is the most optimal chunk size for sending data across a Unix socket.
Andres Freund (a Postgres developer from EnterpriseDB) believes that an 8 KB buffer is not the best implementation option today, and that the behavior should be tested with other sizes and other socket configurations.
We should also remember that PgBouncer has a buffer too, and its size can be configured with the pkt_buf parameter.
Another feature of the pgx (v3) driver: every time it establishes a connection, it sends queries to the database to obtain information about Object IDs (OIDs).
These identifiers were added to Postgres to uniquely identify internal objects: rows, tables, functions, etc.
The driver uses the OIDs to figure out which database column maps into which language primitive. To do this, pgx keeps a table like this (the key is the type name, the value is the Object ID):
map[string]Value{
    "_aclitem": 2,
    "_bool":    3,
    "_int4":    4,
    "_int8":    55,
    ...
}
With this implementation, the driver makes roughly three queries for every connection it establishes, just to build its Object ID table. In normal operation of the database and the application, the connection pool in Go avoids spawning new connections. But at the slightest degradation of the database, the pool on the application side gets exhausted and the number of new connections per unit of time grows dramatically. The OID queries are quite heavy, so the driver can push the database into a critical state.
Here is the moment when such queries flooded one of our databases: 15 transactions per minute in normal mode, a spike up to 6500 transactions during degradation.
What to do?
First and foremost, cap the size of your pool.
For database/sql, this can be done with the DB.SetMaxOpenConns function. If you abandon the database/sql interfaces and use pgx.ConnPool (the connection pool implemented by the driver itself), you can specify MaxConnections (the default is 5). A sketch of both options follows.
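A minimal sketch of both ways to cap the pool, assuming pgx v3; the host, the database name and the cap of 20 are placeholder values:

package main

import (
    "database/sql"
    "log"

    "github.com/jackc/pgx"
    _ "github.com/jackc/pgx/stdlib" // registers the "pgx" driver for database/sql
)

func main() {
    // Option 1: database/sql with a capped pool.
    db, err := sql.Open("pgx", "postgres://user:pass@localhost/db")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()
    db.SetMaxOpenConns(20) // never open more than 20 connections

    // Option 2: the driver's own pool with MaxConnections.
    pool, err := pgx.NewConnPool(pgx.ConnPoolConfig{
        ConnConfig: pgx.ConnConfig{
            Host:     "localhost",
            Database: "db",
        },
        MaxConnections: 20, // pool size cap; the default is 5
    })
    if err != nil {
        log.Fatal(err)
    }
    defer pool.Close()
}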
By the way, when using pgx.ConnPool, the driver reuses the OID information it has already received and does not query the database for it on every new connection.
If you do not want to give up database/sql, you can cache the OID information yourself:
github.com/jackc/pgx/stdlib.OpenDB(pgx.ConnConfig{
    CustomConnInfo: func(c *pgx.Conn) (*pgtype.ConnInfo, error) {
        cachedOids = // ... fill the OID map once and reuse it.
        info := pgtype.NewConnInfo()
        info.InitializeDataTypes(cachedOids)
        return info, nil
    },
})
This is a working approach, but using it can be dangerous under two conditions:

- you use custom types in the database, whose OIDs are assigned per database instance;
- the application gets switched to another database instance (for example, a replica populated via logical replication) where those OIDs differ.
When these conditions are met, the cached OIDs become invalid. And we cannot clean them up, because we do not know the moment of switching to the new database.
In the Postgres world, high availability is usually built on physical replication, which copies database instances bit by bit, so problems with OID caching are rarely seen in the wild. (But it is better to check with your DBAs how standby works in your setup.)
In the next major version of the pgx driver, v4, there will be no round trips for OIDs. The driver will rely only on the list of OIDs hardcoded into its code. For custom types, you will need to handle deserialization yourself on the application side: the driver will simply hand over a chunk of memory as a byte array.
Monitoring and logging will help you notice problems before the database falls over.
database/sql provides the DB.Stats() method. The returned snapshot gives you an idea of what is going on inside the driver:
type DBStats struct {
    MaxOpenConnections int

    // Pool status
    OpenConnections int
    InUse           int
    Idle            int

    // Counters
    WaitCount         int64
    WaitDuration      time.Duration
    MaxIdleClosed     int64
    MaxLifetimeClosed int64
}
If you use the pool in pgx directly, the ConnPool.Stat() method gives you similar information:
type ConnPoolStat struct {
    MaxConnections       int
    CurrentConnections   int
    AvailableConnections int
}
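How you ship these numbers to your monitoring is up to you. As a minimal sketch, assuming a database/sql handle and plain logging standing in for a real metrics system (reportPoolStats is a name of my own):

import (
    "context"
    "database/sql"
    "log"
    "time"
)

// reportPoolStats polls the pool statistics every 10 seconds and
// hands them to your metrics system (plain logging is used here).
func reportPoolStats(ctx context.Context, db *sql.DB) {
    ticker := time.NewTicker(10 * time.Second)
    defer ticker.Stop()
    for {
        select {
        case <-ctx.Done():
            return
        case <-ticker.C:
            s := db.Stats()
            log.Printf("pool: open=%d inuse=%d idle=%d waits=%d",
                s.OpenConnections, s.InUse, s.Idle, s.WaitCount)
        }
    }
}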
Logging is just as important, and pgx lets you do it. The driver accepts a Logger interface; by implementing it, you receive all events occurring inside the driver.
type Logger interface {
    // Log a message at the given level with data key/value pairs.
    // data may be nil.
    Log(level LogLevel, msg string, data map[string]interface{})
}
Most likely, you will not even have to implement this interface yourself. pgx ships with a set of adapters for the most popular loggers out of the box, for example uber-go/zap, sirupsen/logrus, rs/zerolog.
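For instance, wiring the driver to a zap logger might look like this sketch (pgx v3 and its zapadapter package assumed; the connection parameters are placeholders):

package main

import (
    "github.com/jackc/pgx"
    "github.com/jackc/pgx/log/zapadapter"
    "go.uber.org/zap"
)

func main() {
    logger, _ := zap.NewProduction()

    // Every event inside the driver is now forwarded to zap.
    conn, err := pgx.Connect(pgx.ConnConfig{
        Host:     "localhost",
        Database: "db",
        Logger:   zapadapter.NewLogger(logger),
        LogLevel: pgx.LogLevelInfo, // how verbose the driver should be
    })
    if err != nil {
        logger.Fatal("connect", zap.Error(err))
    }
    defer conn.Close()
}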
Almost always when working with Postgres you will use a connection pooler, and it will be PgBouncer (or Odyssey, if you are Yandex).

You can read why in the excellent article brandur.org/postgres-connections. In short, when the number of clients exceeds 100, request processing speed begins to degrade. This happens because of how Postgres itself is implemented: a separate process is launched per connection, there is the snapshot-taking machinery, and shared memory is used for interaction; all of this takes its toll.
Here is a benchmark of various connection pooler implementations:

And a throughput benchmark with and without PgBouncer.
As a result, your infrastructure will look like this:
Here Server is the process that handles user requests. It runs in kubernetes in 3 replicas (at least). Separately, on a bare-metal server, stands Postgres, fronted by PgBouncer. PgBouncer itself is single-threaded, so we launch several bouncers and balance traffic between them with HAProxy. As a result, a query travels to the database along this chain: Server → HAProxy → PgBouncer → Postgres.
PgBouncer can work in three modes:

- Session pooling: a connection is assigned to the client for the whole lifetime of its session.
- Transaction pooling: a connection is assigned for the duration of a single transaction. As soon as the transaction finishes, PgBouncer takes the connection back and gives it to another transaction. This mode allows very good utilization of connections.
- Statement pooling: a connection is returned to the pool after each individual statement.

You can see the matrix of which properties are available in each mode. We choose Transaction Pooling, but it has limitations on working with Prepared Statements.
Let's imagine we want to prepare a query and then execute it. At some point we start a transaction in which we send a Prepare request, and we get back the identifier of the prepared statement from the database.
Later, at any other moment in time, we start another transaction. In it we contact the database and want to execute the prepared statement by its identifier with the given parameters.
In Transaction Pooling mode, these two transactions may be executed over different connections, but the Statement ID is valid only within a single connection. We get a prepared statement does not exist error when trying to execute the query.
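A sketch of this failure mode, assuming a database/sql handle talking through PgBouncer in transaction pooling (the query and the city value are made up):

// Prepare may run over connection A...
stmt, err := db.Prepare(`SELECT * FROM places WHERE city = $1`)
if err != nil {
    log.Fatal(err)
}
defer stmt.Close()

// ...while the execution may be routed to connection B, where the
// statement was never prepared: "prepared statement does not exist".
rows, err := stmt.Query("Moscow")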
The most unpleasant part: since the load during development and testing is small, PgBouncer often hands out the same connection and everything works correctly. But as soon as we roll out to production, requests start failing with this error.
Now, find the Prepared Statements in this code:
sql := `select * from places where city = ?`
rows, err := s.db.Query(sql, city)
You will not see them! Query preparation happens implicitly inside Query(). Meanwhile, the preparation and the execution of the query end up in different transactions, and we get everything I described above in full.
What to do?
The first and easiest option is to switch PgBouncer to Session pooling. One connection is allocated per session, all transactions go through that connection, and prepared statements work correctly. But connection utilization in this mode leaves much to be desired, so this option is out.
The second option is to build the query on the client side, substituting parameters into the SQL yourself. I do not want to do this for two reasons:

- possible SQL vulnerabilities: get the escaping wrong and you are open to injection;
- the escaping for every parameter type would have to be written by hand.
Another option is to explicitly wrap each query in a transaction. After all, as long as the transaction lives, PgBouncer does not take the connection away. This works, but besides the verbosity in our code we also get more network calls: Begin, Prepare, Execute, Commit. Four network calls per query in total. Latency grows.
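For illustration, a sketch of this workaround with database/sql; note how each step is its own round trip:

tx, err := db.Begin() // network call 1
if err != nil {
    log.Fatal(err)
}
// Query prepares and executes inside the same transaction, so the
// statement stays on one connection (network calls 2 and 3).
rows, err := tx.Query(`SELECT * FROM places WHERE city = $1`, city)
if err != nil {
    tx.Rollback()
    log.Fatal(err)
}
// ... read rows ...
rows.Close()
if err := tx.Commit(); err != nil { // network call 4
    log.Fatal(err)
}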
But I want it safe, convenient and efficient all at once. And there is such an option! You can explicitly tell the driver that you want to use the Simple Query mode. In this mode there is no preparation, and the whole query goes through in a single network call. The driver escapes each parameter itself (standard_conforming_strings must be enabled at the database level or when establishing the connection):
cfg := pgx.ConnConfig{
    ...
    RuntimeParams: map[string]string{
        "standard_conforming_strings": "on",
    },
    PreferSimpleProtocol: true,
}
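If you stay on the database/sql interfaces, a config like this can, as a sketch, be handed to the driver's stdlib bridge that the OID example above already used:

// The config above becomes a regular *sql.DB (pgx v3 stdlib assumed).
db := stdlib.OpenDB(cfg)
defer db.Close()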
The next set of problems concerns canceling queries on the application side.
Take a look at this code. Where are the pitfalls?
rows, err := s.db.QueryContext(ctx, ...)
Go has a mechanism for controlling the flow of program execution: context.Context. In this code we pass ctx to the driver so that, when the context is canceled, the driver cancels the query at the database level.
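For example, a minimal sketch that gives a query 100 ms to complete (the timeout value is arbitrary):

// Cancel the query at the database level if it runs longer than 100ms.
ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
defer cancel()

rows, err := s.db.QueryContext(ctx, `SELECT * FROM places WHERE city = $1`, city)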
We expect to save resources by canceling queries that no one is waiting for anymore. However, when a query is canceled, PgBouncer version 1.7 sends information into the connection that the connection is ready for use, and only then returns it to the pool. This PgBouncer behavior misleads the driver, which instantly receives ReadyForQuery in response to the next request it sends. In the end, we catch unexpected ReadyForQuery errors.
Starting with PgBouncer version 1.8, this behavior has been fixed. Use an up-to-date version of PgBouncer.
But the most interesting part is how query cancellation actually works. To cancel a query, we have to create a new connection to the database and request cancellation. Postgres creates a separate process for each connection, so we send a command to cancel the current query in a specific process: we create a new connection and pass in it the process ID (PID) we are interested in. But while the cancellation command flies to the database, the query we wanted to cancel may finish on its own.
Postgres will execute the command and cancel the current query in the given process, but that current query may no longer be the one we originally wanted to cancel. This behavior makes it safer, when working with Postgres through PgBouncer, not to cancel queries at the driver level. To achieve this you can set a CustomCancel function, which will not cancel the query even if context.Context is used.
cfg := pgx.ConnConfig{
    ...
    CustomCancel: func(_ *pgx.Conn) error { return nil },
}
Instead of conclusions, I decided to make a checklist for working with Postgres. It should help this article settle in your head.
- Use github.com/jackc/pgx as your driver.
- Cap the size of the connection pool.
- Cache OIDs or use pgx.ConnPool if you are working with pgx version 3.
- Collect connection pool metrics via DB.Stats() or ConnPool.Stat().
- Log what is happening inside the driver.
- Use the Simple Query mode to avoid problems with prepared statements in PgBouncer's transactional mode.
- Update PgBouncer to the latest version.

Source: https://habr.com/ru/post/461935/
to the latest version.Source: https://habr.com/ru/post/461935/
All Articles