If you are trying to optimize the performance of your PostgreSQL-based application, you probably use the basic tools: EXPLAIN (BUFFERS, ANALYZE), pg_stat_statements, auto_explain, log_min_duration_statement, and so on. Maybe you are also looking at lock contention with log_lock_waits, monitoring your checkpoint behaviour, and so on.
But have you ever thought about network latency? Gamers know all about it, but did you think it mattered for your application server?
Latency matters
Typical client/server round-trip latencies range from 0.01 ms (localhost) through ~0.5 ms on a switched network, 5 ms over WiFi, 20 ms over ADSL and 300 ms on intercontinental routing, and even more for things like satellite and WWAN links.
A trivial SELECT can take around 0.1 ms to execute server-side. A trivial INSERT can take 0.5 ms.
Every time your application executes a query, it has to wait for the server to respond with success/failure and possibly a result set, query metadata, etc. That incurs at least one network round-trip delay.
When you are working mostly with small, simple queries, network latency can be a significant fraction of their execution time if the database and the application are on different hosts.
Many applications, particularly ORM-based ones, are very prone to running large numbers of fairly simple queries. For example, if your Hibernate application fetches an entity with a lazily fetched @OneToMany relationship to 1000 child items, it is probably going to execute 1001 queries thanks to the n+1 query problem, if not more. That means it spends 1000 times your network round-trip latency just waiting. You can use a left join fetch to avoid that ... but then you transfer the parent entity 1000 times in the join and have to deduplicate it.
Likewise, if you populate the database from an ORM, you are probably doing hundreds of thousands of individual INSERTs ... and waiting after every single one of them for the server to confirm that it succeeded.
It is easy to focus on query execution time and try to optimize that, but there is only so much you can do with a trivial INSERT INTO ... VALUES .... Drop some indexes and constraints, make sure it runs inside a transaction, and you are pretty much done.
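For the sake of illustration, here is a minimal psycopg2 sketch of that advice (the items table and the connection string are placeholders, not from the original benchmark). The whole loop runs in a single transaction, but every execute() still costs a network round trip:

    import psycopg2

    conn = psycopg2.connect("dbname=test")   # placeholder DSN
    rows = [(i, "name-%d" % i) for i in range(10000)]

    with conn:                               # one transaction for all 10,000 rows
        with conn.cursor() as cur:
            for row in rows:
                # each execute() is still one client/server round trip
                cur.execute("INSERT INTO items (id, name) VALUES (%s, %s)", row)

    conn.close()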
But what about getting rid of all those network waits? Even on a local network they start to add up over thousands of queries.
COPY
One way to avoid the delays is to use COPY. To use PostgreSQL's COPY support, your application or driver has to produce a CSV-like set of rows and stream them to the server in a continuous sequence. Alternatively, the server can be asked to send your application a CSV-like stream.
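As a rough illustration, with psycopg2 it might look like this (the items table, its columns, and the connection string are made up for the example):

    import io
    import psycopg2

    conn = psycopg2.connect("dbname=test")   # placeholder DSN
    cur = conn.cursor()

    # Stream tab-separated rows to the server: COPY items (id, name) FROM STDIN
    data = io.StringIO("1\tfirst\n2\tsecond\n")
    cur.copy_from(data, "items", columns=("id", "name"))

    # Ask the server to stream the table back: COPY items (id, name) TO STDOUT
    out = io.StringIO()
    cur.copy_to(out, "items", columns=("id", "name"))

    conn.commit()
    conn.close()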
Either way, the application cannot interleave a COPY with other queries, and the copied data has to be loaded directly into a destination table. A common approach is to COPY into a temporary table and then, from there, do an INSERT INTO ... SELECT ..., UPDATE ... FROM ..., DELETE FROM ... USING ..., and so on, using the copied data to change the main tables in a single operation.
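A sketch of that staging pattern, again with psycopg2 and made-up table and column names:

    import io
    import psycopg2

    conn = psycopg2.connect("dbname=test")   # placeholder DSN
    cur = conn.cursor()

    # Stage the incoming data with a single COPY into a temporary table
    cur.execute("CREATE TEMP TABLE items_stage (LIKE items INCLUDING DEFAULTS)")
    cur.copy_from(io.StringIO("1\tfirst\n2\tsecond\n"), "items_stage",
                  columns=("id", "name"))

    # Then apply it to the real table in set-based statements
    cur.execute("""
        UPDATE items
           SET name = s.name
          FROM items_stage s
         WHERE items.id = s.id
    """)
    cur.execute("""
        INSERT INTO items (id, name)
        SELECT s.id, s.name
          FROM items_stage s
         WHERE NOT EXISTS (SELECT 1 FROM items i WHERE i.id = s.id)
    """)

    conn.commit()
    conn.close()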
That is fine if you are hand-writing your SQL, but many frameworks and ORMs do not support it, and it can only directly replace simple INSERTs. Your application, framework, or client driver has to produce the special representation COPY requires, look up the metadata it needs, and so on.
(Notable drivers that do support COPY include libpq, PgJDBC, psycopg2, and the Pg gem ... but that does not at all mean that the frameworks and ORMs built on top of them support it too.)
PgJDBC batch mode
PostgreSQL's JDBC driver already has a solution for this problem. It relies on the batching features of the JDBC API: it sends a batch of queries to the server and then waits for a single confirmation that the entire batch ran successfully.
In theory, anyway. In reality, some implementation limitations mean that at best only a few hundred queries can be batched at a time. The driver can also only batch queries that return result sets if it can work out in advance how big those results will be. Despite these limitations, using Statement.executeBatch() can deliver a very large performance boost to applications that do things like bulk-loading data into remote database instances.
Because this is a standard API, it can be used by applications that target multiple database engines. Hibernate, for example, can use JDBC batching, although it does not do so by default.
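Enabling it is usually just a matter of configuration; as a reminder, a typical set of Hibernate properties looks something like this (the batch size of 50 is only an example value):

    # hibernate.properties (illustrative values)
    hibernate.jdbc.batch_size=50
    hibernate.order_inserts=true
    hibernate.order_updates=true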
libpq and batching
Most (all?) other PostgreSQL drivers have no batching support. PgJDBC implements the PostgreSQL protocol entirely independently, while most other drivers use the libpq C library that ships as part of PostgreSQL.
libpq has an asynchronous, non-blocking API, but the client can still have only one query in flight at a time: you must consume the results of that query before sending the next one.
The PostgreSQL server handles batching just fine, and PgJDBC already takes advantage of it. So I have written batching support for libpq and proposed it for the next version of PostgreSQL. Since it changes only the client, if it is accepted it will still speed things up when connecting to older servers.
I would be very interested in feedback from the authors and power users of libpq-based client drivers and from developers of libpq-based applications. The patch applies cleanly to PostgreSQL 9.6 beta 1 if you would like to try it.
Detailed documentation and a comprehensive example program are available.
Performance
I think a database server hosted on RDS or Heroku Postgres is a good example of where this functionality would be useful; in particular, accessing them from outside their own networks shows clearly how much latency can hurt.
At ~320 ms network latency:
- 500 INSERTs without batching: 167.0 seconds
- 500 INSERTs with batching: 1.2 seconds
... which is roughly 140 times faster. That is hardly surprising: 500 round trips at ~0.32 s each comes to about 160 seconds of pure waiting, which accounts for almost all of the unbatched run time.
Of course, you would not normally put an intercontinental link between your application server and your database server; this setup simply illustrates the impact of latency. Even over a Unix socket on localhost I saw a 50% performance gain for 10,000 INSERTs.
Batching in existing applications
Unfortunately, you cannot simply enable batching automatically for existing applications. They have to use a slightly different interface, where they send a series of queries and only afterwards ask for the results.
It should be fairly easy to adapt applications that already use the asynchronous libpq interface, especially if they use non-blocking mode and a select()/poll()/epoll()/WaitForMultipleObjectsEx loop. Applications that use the synchronous libpq interfaces will need more changes.
Batching in other client drivers
Similarly, client drivers, frameworks, and ORMs will generally need interface and internal changes before batching can be used. If they already use an event loop and non-blocking I/O, they should be fairly easy to modify.
I would be delighted to see Python, Ruby, and other users get access to this functionality, so I am curious to see who is interested. Imagine being able to do this:
    import psycopg2

    conn = psycopg2.connect(...)
    cur = conn.cursor()
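... and then continuing along these purely hypothetical lines (psycopg2 has no such batch API today; batch_queue() and batch_results() are invented names used only for illustration):

    # Hypothetical API: these methods do not exist in psycopg2 today
    cur.batch_queue("INSERT INTO items (id, name) VALUES (%s, %s)", (1, "first"))
    cur.batch_queue("INSERT INTO items (id, name) VALUES (%s, %s)", (2, "second"))
    cur.batch_queue("SELECT count(*) FROM items")

    for result in cur.batch_results():   # one wait for the whole batch of replies
        print(result)

    conn.commit()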
Performing asynchronous batched execution should not have to be difficult from the user's point of view.
COPY is the fastest
Where it is practical, clients should still prefer COPY. Here are some results from my laptop:
    inserting 1000000 rows batched, unbatched and with COPY
    batch insert elapsed:      23.715315s
    sequential insert elapsed: 36.150162s
    COPY elapsed:              1.743593s
    Done.
The batching work delivers a surprisingly large performance boost even over a local Unix socket ... but COPY leaves both INSERT approaches far behind in the dust.
Use COPY.
Picture
The image for this article is the Goldfields Water Supply Scheme pipeline, which runs from Mundaring Weir near Perth in Western Australia to the inland (desert) goldfields. It fits this article because its designer and chief proponent, C. Y. O'Connor, worn down by the length of its construction and a barrage of criticism, committed suicide 12 months before the pipeline was put into operation. Locals often (erroneously) say that he died after the pipeline was opened but the water did not flow, because it took so long to arrive that everyone assumed the project had failed. A week later, the water came.