Unexpected application behavior around the database leads to a war between DBAs and developers: the DBAs shout "your application is bringing the database down", the developers reply "but everything worked before!" Worst of all, the DBAs and the developers cannot help each other: one side knows nothing about the nuances of the application and the driver, the other knows nothing about the quirks of the infrastructure. It would be nice to avoid that situation.
You have to understand: it is often not enough to skim through go-database-sql.org. It is better to arm yourself with other people's experience. Better still if that experience was paid for in blood and lost money.
My name is Artemy Ryabinkov, and this article is a free interpretation of my talk at the Saints HighLoad 2019 conference.
You can find the minimum necessary information on working with any SQL-like database in Go at go-database-sql.org. If you have not read it, do so.
In my opinion, the strength of Go is its simplicity. It shows, for example, in the convention of writing queries in raw SQL (ORMs are not held in high regard). This is both an advantage and a source of extra difficulties.
So, once you take the standard database/sql package, you will want to extend its interfaces. As soon as that happens, take a look at github.com/jmoiron/sqlx. Let me show a few examples of how this extension can simplify your life.
Using StructScan eliminates the need to manually move data from columns into struct fields.
type Place struct {
    Country       string
    City          sql.NullString
    TelephoneCode int `db:"telcode"`
}

var p Place
err = rows.StructScan(&p)
Using NamedQuery allows you to use struct fields as placeholders in a query.
p := Place{Country: "South Africa"}
sql := `.. WHERE country=:country`
rows, err := db.NamedQuery(sql, p)
Using Get and Select gets rid of the need to manually write loops that fetch rows from the database.
var p Place
var pp []Place

// Get fetches a single row into p.
err = db.Get(&p, ".. LIMIT 1")

// Select fetches all matching rows into pp.
err = db.Select(&pp, ".. WHERE telcode > ?", 50)
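To make the snippets above concrete, here is a minimal self-contained sketch of sqlx in a program. The DSN, the table and the column names are placeholders of my own, not part of the original setup:

package main

import (
    "database/sql"
    "log"

    "github.com/jmoiron/sqlx"
    _ "github.com/lib/pq" // any database/sql driver works underneath
)

// Place mirrors the struct from the examples above.
type Place struct {
    Country       string
    City          sql.NullString
    TelephoneCode int `db:"telcode"`
}

func main() {
    // Connect opens the pool and pings the database in one call.
    db, err := sqlx.Connect("postgres", "postgres://user:pass@localhost/db?sslmode=disable")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    // Select runs the query and scans every row into pp.
    var pp []Place
    err = db.Select(&pp, `SELECT country, city, telcode FROM place WHERE telcode > $1`, 50)
    if err != nil {
        log.Fatal(err)
    }
    log.Printf("found %d places", len(pp))
}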
database/sql is a set of interfaces for working with a database, and sqlx is their extension. For these interfaces to work, they need an implementation. Drivers are responsible for the implementation.
The most popular drivers:

- github.com/lib/pq: a pure Go Postgres driver for database/sql. For a long time this driver was the default standard, but today it has lost its relevance and is no longer developed by its author.
- github.com/jackc/pgx: a PostgreSQL driver and toolkit for Go. Today this is the better choice.

github.com/jackc/pgx is the driver you want to use. Why?

- It implements the database/sql interfaces.
- It supports the types that PostgreSQL implements outside the SQL standard.
- pgx returns human-readable errors, while lib/pq just throws panics. If you do not catch a panic, the program will crash. (Do not use panics in Go; they are not the same as exceptions.)
- With pgx, we get the ability to configure each connection independently.
- It supports the PostgreSQL logical replication protocol.

Typically, we write this kind of loop to fetch data from the database:
rows, err := s.db.QueryContext(ctx, sql)
for rows.Next() {
    err = rows.Scan(...)
}
Inside the driver, data is received into a 4 KB buffer. rows.Next() triggers a network round trip and fills the buffer. If the buffer is too small, we go back to the network for the remaining data. More network round trips means lower processing speed. On the other hand, since the buffer is capped at 4 KB, it does not eat up the entire process memory.
Naturally, one wants to crank the buffer size up to the maximum to reduce the number of network requests and lower the latency of our service. We added that option and measured the expected speedup with synthetic benchmarks:
$ go test -v -run=XXX -bench=. -benchmem
goos: linux
goarch: amd64
pkg: github.com/furdarius/pgxexperiments/bufsize
BenchmarkBufferSize/4KB    5   315763978 ns/op   53112832 B/op   12967 allocs/op
BenchmarkBufferSize/8KB    5   300140961 ns/op   53082521 B/op    6479 allocs/op
BenchmarkBufferSize/16KB   5   298477972 ns/op   52910489 B/op    3229 allocs/op
BenchmarkBufferSize/1MB    5   299602670 ns/op   52848230 B/op      50 allocs/op
PASS
ok   github.com/furdarius/pgxexperiments/bufsize   10.964s
As you can see, there is no big difference in processing speed. Why is that?
It turns out that we are limited by the size of the buffer for sending data inside Postgres itself. That buffer has a fixed size of 8 KB. Using strace, you can see that the OS returns 8192 bytes in each read system call, and tcpdump confirms this with the packet sizes.
Tom Lane (one of the core developers of Postgres) comments on this as follows:
Traditionally, at least, that was the size of pipe buffers in Unix machines, so in principle this is the most optimal chunk size for sending data across a Unix socket.
Andres Freund (a Postgres developer from EnterpriseDB) believes that an 8 KB buffer is not the best implementation option today, and that the behavior should be tested with other sizes and other socket configurations.
We should also remember that PgBouncer has a buffer too, and its size can be configured with the pkt_buf parameter.
Another feature of the pgx (v3) driver: every time it establishes a connection, it sends queries to the database to obtain information about Object IDs (OIDs).
These identifiers were added to Postgres to uniquely identify internal objects: rows, tables, functions, etc.
The driver uses the OIDs to figure out which database column maps into which language primitive. To do this, pgx keeps a table like this (the key is the type name, the value is the Object ID):
map[string]Value{
    "_aclitem": 2,
    "_bool":    3,
    "_int4":    4,
    "_int8":    55,
    ...
}
With this implementation, the driver makes roughly three queries for every connection it establishes, just to build its Object ID table. In normal operation of the database and the application, the connection pool in Go avoids spawning new connections. But at the slightest degradation of the database, the pool on the application side gets exhausted and the number of new connections per unit of time grows dramatically. The OID queries are quite heavy, so the driver can push the database into a critical state.
Here is the moment when such queries flooded one of our databases: 15 transactions per minute in normal mode, a spike up to 6500 transactions during degradation.
What to do?
First and foremost, cap the size of your pool.
For database/sql, this can be done with the DB.SetMaxOpenConns function. If you abandon the database/sql interfaces and use pgx.ConnPool (the connection pool implemented by the driver itself), you can specify MaxConnections (the default is 5). A sketch of both options follows.
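A minimal sketch of both ways to cap the pool, assuming pgx v3; the host, the database name and the cap of 20 are placeholder values:

package main

import (
    "database/sql"
    "log"

    "github.com/jackc/pgx"
    _ "github.com/jackc/pgx/stdlib" // registers the "pgx" driver for database/sql
)

func main() {
    // Option 1: database/sql with a capped pool.
    db, err := sql.Open("pgx", "postgres://user:pass@localhost/db")
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()
    db.SetMaxOpenConns(20) // never open more than 20 connections

    // Option 2: the driver's own pool with MaxConnections.
    pool, err := pgx.NewConnPool(pgx.ConnPoolConfig{
        ConnConfig: pgx.ConnConfig{
            Host:     "localhost",
            Database: "db",
        },
        MaxConnections: 20, // pool size cap; the default is 5
    })
    if err != nil {
        log.Fatal(err)
    }
    defer pool.Close()
}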
By the way, when using pgx.ConnPool, the driver reuses the OID information it has already received and does not query the database for it on every new connection.
If you do not want to give up database/sql, you can cache the OID information yourself:
github.com/jackc/pgx/stdlib.OpenDB(pgx.ConnConfig{
    CustomConnInfo: func(c *pgx.Conn) (*pgtype.ConnInfo, error) {
        cachedOids = // ... fill the OID map once and reuse it.
        info := pgtype.NewConnInfo()
        info.InitializeDataTypes(cachedOids)
        return info, nil
    },
})
This is a working approach, but using it can be dangerous under two conditions:

- you use custom types in the database, whose OIDs are assigned per database instance;
- the application gets switched to another database instance (for example, a replica populated via logical replication) where those OIDs differ.
When these conditions are met, the cached OIDs become invalid. And we cannot clean them up, because we do not know the moment of switching to the new database.
In the Postgres world, high availability is usually built on physical replication, which copies database instances bit by bit, so problems with OID caching are rarely seen in the wild. (But it is better to check with your DBAs how standby works in your setup.)
In the next major version of the pgx driver, v4, there will be no round trips for OIDs. The driver will rely only on the list of OIDs hardcoded into its code. For custom types, you will need to handle deserialization yourself on the application side: the driver will simply hand over a chunk of memory as a byte array.
Monitoring and logging will help you notice problems before the database falls over.
database/sql provides the DB.Stats() method. The returned snapshot gives you an idea of what is going on inside the driver:
type DBStats struct {
    MaxOpenConnections int

    // Pool status
    OpenConnections int
    InUse           int
    Idle            int

    // Counters
    WaitCount         int64
    WaitDuration      time.Duration
    MaxIdleClosed     int64
    MaxLifetimeClosed int64
}
If you use the pool in pgx directly, the ConnPool.Stat() method gives you similar information:
type ConnPoolStat struct {
    MaxConnections       int
    CurrentConnections   int
    AvailableConnections int
}
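How you ship these numbers to your monitoring is up to you. As a minimal sketch, assuming a database/sql handle and plain logging standing in for a real metrics system (reportPoolStats is a name of my own):

import (
    "context"
    "database/sql"
    "log"
    "time"
)

// reportPoolStats polls the pool statistics every 10 seconds and
// hands them to your metrics system (plain logging is used here).
func reportPoolStats(ctx context.Context, db *sql.DB) {
    ticker := time.NewTicker(10 * time.Second)
    defer ticker.Stop()
    for {
        select {
        case <-ctx.Done():
            return
        case <-ticker.C:
            s := db.Stats()
            log.Printf("pool: open=%d inuse=%d idle=%d waits=%d",
                s.OpenConnections, s.InUse, s.Idle, s.WaitCount)
        }
    }
}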
Logging is just as important, and pgx lets you do it. The driver accepts a Logger interface; by implementing it, you receive all events occurring inside the driver.
type Logger interface {
    // Log a message at the given level with data key/value pairs.
    // data may be nil.
    Log(level LogLevel, msg string, data map[string]interface{})
}
Most likely, you will not even have to implement this interface yourself. pgx ships with a set of adapters for the most popular loggers out of the box, for example uber-go/zap, sirupsen/logrus, rs/zerolog.
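For instance, wiring the driver to a zap logger might look like this sketch (pgx v3 and its zapadapter package assumed; the connection parameters are placeholders):

package main

import (
    "github.com/jackc/pgx"
    "github.com/jackc/pgx/log/zapadapter"
    "go.uber.org/zap"
)

func main() {
    logger, _ := zap.NewProduction()

    // Every event inside the driver is now forwarded to zap.
    conn, err := pgx.Connect(pgx.ConnConfig{
        Host:     "localhost",
        Database: "db",
        Logger:   zapadapter.NewLogger(logger),
        LogLevel: pgx.LogLevelInfo, // how verbose the driver should be
    })
    if err != nil {
        logger.Fatal("connect", zap.Error(err))
    }
    defer conn.Close()
}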
Almost always when working with Postgres you will use a connection pooler, and it will be PgBouncer (or Odyssey, if you are Yandex).

You can read why in the excellent article brandur.org/postgres-connections. In short, when the number of clients exceeds 100, request processing speed begins to degrade. This happens because of how Postgres itself is implemented: a separate process is launched per connection, there is the snapshot-taking machinery, and shared memory is used for interaction; all of this takes its toll.
Here is a benchmark of various connection pooler implementations:

And a throughput benchmark with and without PgBouncer.
As a result, your infrastructure will look like this:
Here Server is the process that handles user requests. It runs in kubernetes in 3 replicas (at least). Separately, on a bare-metal server, stands Postgres, fronted by PgBouncer. PgBouncer itself is single-threaded, so we launch several bouncers and balance traffic between them with HAProxy. As a result, a query travels to the database along this chain: Server → HAProxy → PgBouncer → Postgres.
PgBouncer can work in three modes:

- Session pooling: a connection is assigned to the client for the whole lifetime of its session.
- Transaction pooling: a connection is assigned for the duration of a single transaction. As soon as the transaction finishes, PgBouncer takes the connection back and gives it to another transaction. This mode allows very good utilization of connections.
- Statement pooling: a connection is returned to the pool after each individual statement.

You can see the matrix of which properties are available in each mode. We choose Transaction Pooling, but it has limitations on working with Prepared Statements.
Let's imagine we want to prepare a query and then execute it. At some point we start a transaction in which we send a Prepare request, and we get back the identifier of the prepared statement from the database.
Later, at any other moment in time, we start another transaction. In it we contact the database and want to execute the prepared statement by its identifier with the given parameters.
In Transaction Pooling mode, these two transactions may be executed over different connections, but the Statement ID is valid only within a single connection. We get a prepared statement does not exist error when trying to execute the query.
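A sketch of this failure mode, assuming a database/sql handle talking through PgBouncer in transaction pooling (the query and the city value are made up):

// Prepare may run over connection A...
stmt, err := db.Prepare(`SELECT * FROM places WHERE city = $1`)
if err != nil {
    log.Fatal(err)
}
defer stmt.Close()

// ...while the execution may be routed to connection B, where the
// statement was never prepared: "prepared statement does not exist".
rows, err := stmt.Query("Moscow")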
The most unpleasant part: since the load during development and testing is small, PgBouncer often hands out the same connection and everything works correctly. But as soon as we roll out to production, requests start failing with this error.
Now, find the Prepared Statements in this code:
sql := `select * from places where city = ?`
rows, err := s.db.Query(sql, city)
You will not see them! Query preparation happens implicitly inside Query(). Meanwhile, the preparation and the execution of the query end up in different transactions, and we get everything I described above in full.
What to do?
The first and easiest option is to switch PgBouncer to Session pooling. One connection is allocated per session, all transactions go through that connection, and prepared statements work correctly. But connection utilization in this mode leaves much to be desired, so this option is out.
The second option is to build the query on the client side, substituting parameters into the SQL yourself. I do not want to do this for two reasons:

- possible SQL vulnerabilities: get the escaping wrong and you are open to injection;
- the escaping for every parameter type would have to be written by hand.
Another option is to explicitly wrap each query in a transaction. After all, as long as the transaction lives, PgBouncer does not take the connection away. This works, but besides the verbosity in our code we also get more network calls: Begin, Prepare, Execute, Commit. Four network calls per query in total. Latency grows.
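For illustration, a sketch of this workaround with database/sql; note how each step is its own round trip:

tx, err := db.Begin() // network call 1
if err != nil {
    log.Fatal(err)
}
// Query prepares and executes inside the same transaction, so the
// statement stays on one connection (network calls 2 and 3).
rows, err := tx.Query(`SELECT * FROM places WHERE city = $1`, city)
if err != nil {
    tx.Rollback()
    log.Fatal(err)
}
// ... read rows ...
rows.Close()
if err := tx.Commit(); err != nil { // network call 4
    log.Fatal(err)
}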
But I want it safe, convenient and efficient all at once. And there is such an option! You can explicitly tell the driver that you want to use the Simple Query mode. In this mode there is no preparation, and the whole query goes through in a single network call. The driver escapes each parameter itself (standard_conforming_strings must be enabled at the database level or when establishing the connection):
cfg := pgx.ConnConfig{
    ...
    RuntimeParams: map[string]string{
        "standard_conforming_strings": "on",
    },
    PreferSimpleProtocol: true,
}
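If you stay on the database/sql interfaces, a config like this can, as a sketch, be handed to the driver's stdlib bridge that the OID example above already used:

// The config above becomes a regular *sql.DB (pgx v3 stdlib assumed).
db := stdlib.OpenDB(cfg)
defer db.Close()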
The next set of problems concerns canceling queries on the application side.
Take a look at this code. Where are the pitfalls?
rows, err := s.db.QueryContext(ctx, ...)
Go has a mechanism for controlling the flow of program execution: context.Context. In this code we pass ctx to the driver so that, when the context is canceled, the driver cancels the query at the database level.
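For example, a minimal sketch that gives a query 100 ms to complete (the timeout value is arbitrary):

// Cancel the query at the database level if it runs longer than 100ms.
ctx, cancel := context.WithTimeout(context.Background(), 100*time.Millisecond)
defer cancel()

rows, err := s.db.QueryContext(ctx, `SELECT * FROM places WHERE city = $1`, city)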
We expect to save resources by canceling queries that no one is waiting for anymore. However, when a query is canceled, PgBouncer version 1.7 sends information into the connection that the connection is ready for use, and only then returns it to the pool. This PgBouncer behavior misleads the driver, which instantly receives ReadyForQuery in response to the next request it sends. In the end, we catch unexpected ReadyForQuery errors.
Starting with PgBouncer version 1.8, this behavior has been fixed. Use an up-to-date version of PgBouncer.
But the most interesting part is how query cancellation actually works. To cancel a query, we have to create a new connection to the database and request cancellation. Postgres creates a separate process for each connection, so we send a command to cancel the current query in a specific process: we create a new connection and pass in it the process ID (PID) we are interested in. But while the cancellation command flies to the database, the query we wanted to cancel may finish on its own.
Postgres will execute the command and cancel the current query in the given process, but that current query may no longer be the one we originally wanted to cancel. This behavior makes it safer, when working with Postgres through PgBouncer, not to cancel queries at the driver level. To achieve this you can set a CustomCancel function, which will not cancel the query even if context.Context is used.
cfg := pgx.ConnConfig{
    ...
    CustomCancel: func(_ *pgx.Conn) error { return nil },
}
Instead of conclusions, I decided to make a checklist for working with Postgres. It should help this article settle in your head.
- Use github.com/jackc/pgx as your driver.
- Cap the size of the connection pool.
- Cache OIDs or use pgx.ConnPool if you are working with pgx version 3.
- Collect connection pool metrics via DB.Stats() or ConnPool.Stat().
- Log what is happening inside the driver.
- Use the Simple Query mode to avoid problems with prepared statements in PgBouncer's transactional mode.
- Update PgBouncer to the latest version.

Source: https://habr.com/ru/post/461935/
to the latest version.Source: https://habr.com/ru/post/461935/
All Articles