
Optimizing backup speed using the file system (read ahead)

This article is aimed at engineers and consultants who deal with the performance of operations that read files sequentially. First and foremost, that means backups. It also applies to reading large files from file storage and to some database operations, such as full table scans (depending on how the data is laid out).

The examples are given for the VxFS file system (Symantec). This file system is widely used on server systems and is supported on HP-UX, AIX, Linux and Solaris.

Why is this needed?

The question is how to get maximum speed when reading data sequentially, in a single stream (!), from one large file (backing up a large number of small files is beyond the scope of this article). By sequential reading we mean that data blocks are requested from the physical disks one after another, in order. We also assume that there is no file system fragmentation. This is a reasonable assumption: if a file system holds a few large files that are rarely re-created, they are practically unfragmented. That is a typical situation for databases such as Oracle. Reading from such a file differs little from reading from a raw device.

What is the speed of single-threaded reading?

The fastest modern disks (15K rpm) have an access time (service time) of about 5.5 ms (for queueing theory fans: we assume the wait time is zero).
Let us determine the number of I/O operations per second that the process (the backup) can perform:
1 / 0.0055 ≈ 182 I/O operations per second (IOPS).

If the process performs these operations strictly one after another, each taking 5.5 ms, it will complete 182 of them per second. Suppose the block size is 256 KB. The maximum throughput of such a process is then 182 * 256 = 46545 KB/s (about 46 MB/s). Modest, isn't it? It looks especially modest on systems with hundreds of physical spindles, where we expect a much higher read rate. The question is how to improve it. The disk access time cannot be reduced: it is a technological limitation. Parallelizing the backup is not always possible either. To get around this limitation, file systems implement a read ahead mechanism.
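The arithmetic above is easy to reproduce. Below is a minimal back-of-the-envelope sketch in Python; the service time and block size are the values assumed in this article, not measured ones.

```python
# Single-threaded sequential read: each I/O must finish before the next one starts.
service_time_s = 0.0055      # ~5.5 ms per I/O on a 15K rpm disk (assumed)
block_size_kb = 256          # size of each read request (assumed)

iops = 1 / service_time_s              # ~182 synchronous I/Os per second
throughput_kb_s = iops * block_size_kb

print(f"IOPS: {iops:.0f}")                                # ~182
print(f"Throughput: {throughput_kb_s / 1024:.1f} MB/s")   # ~45.5 MB/s
```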

How does read ahead work?

Modern *nix systems have two types of I/O requests: synchronous and asynchronous. With a synchronous request, the process is blocked until the disk subsystem responds. With an asynchronous request, it is not blocked and can do something else in the meantime. When reading sequentially, we read the data synchronously. With the read ahead mechanism enabled, the file system code issues several asynchronous requests immediately after the synchronous one. Suppose the process requested block number 1000. With read ahead turned on, blocks 1001, 1002, 1003 and 1004 are requested in addition to block 1000. Thus, when block 1001 is requested, we no longer have to wait 5.5 ms for it. Properly tuned read ahead can increase sequential read speed severalfold.
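A toy model of this behaviour is sketched below (plain Python, not VxFS code): each "disk access" costs 5.5 ms, the reader blocks only on the block it actually needs, and the next few blocks are prefetched asynchronously. The names and the prefetch depth are illustrative assumptions.

```python
import time
from concurrent.futures import ThreadPoolExecutor

SERVICE_TIME = 0.0055   # pretend every block read costs 5.5 ms of disk service time
READ_AHEAD = 4          # how many blocks the "file system" prefetches asynchronously
N_BLOCKS = 500

def read_block(n):
    """Stand-in for one physical disk access."""
    time.sleep(SERVICE_TIME)
    return n

def sequential_read(n_blocks):
    pending = {}
    with ThreadPoolExecutor(max_workers=READ_AHEAD) as pool:
        for n in range(n_blocks):
            if n not in pending:                          # miss: issue the synchronous read
                pending[n] = pool.submit(read_block, n)
            for k in range(n + 1, n + 1 + READ_AHEAD):    # prefetch the next blocks asynchronously
                if k < n_blocks and k not in pending:
                    pending[k] = pool.submit(read_block, k)
            pending.pop(n).result()                       # block only on the block we actually need

start = time.time()
sequential_read(N_BLOCKS)
# Strictly sequential reads would take about 500 * 5.5 ms ≈ 2.75 s;
# with a prefetch depth of 4 the reads overlap and finish several times faster.
print(f"elapsed with read ahead: {time.time() - start:.2f} s")
```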

How is it configured?

The key read ahead setting is its size. Looking ahead, I will say that there are two main problems with read ahead: too little of it and too much of it. On VxFS, read ahead is configured with the read_pref_io and read_nstream parameters of the vxtunefs command. When read ahead kicks in on VxFS, 4 blocks of read_pref_io are requested initially. If the process continues to read sequentially, 4 * read_pref_io * read_nstream is read ahead.

Example: let read_pref_io = 256k and read_nstream = 4.

The initial read ahead will then be: 4 * 256 KB = 1024 KB.
If sequential reading continues: 4 * 4 * 256 KB = 4096 KB.

Note that in the latter case, 16 requests with a 256 KB block will hit the disk subsystem almost simultaneously. That is not a small amount and can load the array noticeably for a short time. In general, it is hard to give universal advice on tuning read_pref_io and read_nstream. The right values always depend on the number of disks in the array and the nature of the load. For some workloads, read_pref_io = 256k and read_nstream = 32 (a lot) works fine. Sometimes it is better to turn read ahead off completely. Since the parameters are simple and can be changed on the fly, the easiest approach is to find the optimal values experimentally. The only general advice is to always set read_pref_io to a power of 2, or at least to a multiple of the data block size in the OS cache; otherwise the consequences may be unpredictable.
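To get a feel for how these two parameters scale the prefetch window, here is a small sketch that applies the formulas above; the helper name is mine, and the multipliers simply restate the VxFS behaviour described in this article.

```python
# Hypothetical helper: size of a VxFS-style read ahead window
# for a given read_pref_io / read_nstream pair (sizes in bytes).
def readahead_window(read_pref_io, read_nstream):
    initial = 4 * read_pref_io                    # window when sequential I/O is first detected
    sustained = 4 * read_pref_io * read_nstream   # steady-state window
    in_flight = sustained // read_pref_io         # requests sent almost simultaneously
    return initial, sustained, in_flight

for pref, nstream in [(65536, 1), (262144, 4), (262144, 64)]:
    init, sust, nreq = readahead_window(pref, nstream)
    print(f"read_pref_io={pref // 1024}k, read_nstream={nstream}: "
          f"initial {init // 1024} KB, sustained {sust // 1024} KB, {nreq} requests in flight")
```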

OS buffer cache effect

The data read asynchronously by read ahead has to be stored somewhere in memory. The operating system's file cache is used for this. In some cases a file system may be mounted with the file cache disabled (direct I/O); in that case the read ahead functionality is disabled as well.
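As an aside, on Linux an application reading through the page cache can also hint the kernel's own generic read ahead logic (this is not a VxFS mechanism); with direct I/O the page cache is bypassed, so such hints, and read ahead itself, no longer apply. A minimal sketch, assuming a Linux system and a hypothetical file path:

```python
import os

# Hint the kernel that we will read this file sequentially, so its generic
# read ahead may prefetch more aggressively (Linux page cache; not VxFS tuning).
fd = os.open("/backup/bigfile.dbf", os.O_RDONLY)   # hypothetical path
os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)

CHUNK = 256 * 1024
while True:
    buf = os.read(fd, CHUNK)
    if not buf:
        break
    # ... hand the chunk to the backup stream ...

os.close(fd)
```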

The main problems with read ahead:

1) Insufficient read ahead. The block size requested by the application is larger than the block read ahead. For example, the 'cp' command may read in 1024 KB blocks while read ahead is configured to read 256 KB. The prefetched data is simply not enough to satisfy the application, and another synchronous I/O request is needed. In this case, enabling read ahead brings no speed increase.

2) Excessive read ahead
- Too aggressive read ahead can simply overload the disk subsystem, especially if there are few spindles in the backend. A large number of almost simultaneous requests issued by the host can flood the disk array. In that case, instead of a speedup you will see things slow down.
- Another problem is misses: the file system wrongly detects sequential reading and pulls unneeded data into the cache. This produces spurious I/O operations and creates extra load on the disks.
- Since read ahead data is stored in the file system cache, a large amount of read ahead can push more valuable blocks out of the cache; those blocks then have to be read from disk again.

3) Conflict between file system read ahead and disk array read ahead
Fortunately, this is an extremely rare case. Most modern disk arrays, equipped with cache memory and their own logic, implement a read ahead mechanism of their own at the hardware level. The array detects sequential reads itself, and the controller reads data from the physical disks into the array cache in bulk. This can significantly reduce the response time of the disk subsystem and increase sequential read speed. File system read ahead looks slightly different from ordinary synchronous reading and can confuse the disk array controller: it may fail to recognize the nature of the load and never engage its hardware read ahead. For example, if a disk array is connected over a SAN (Storage Area Network) and there are several paths to it, load balancing may deliver the asynchronous requests to different array ports almost simultaneously. The requests can then be processed by the controller in a different order than they were sent from the server, and the array does not recognize the read as sequential. Solving such problems can be the most time-consuming and laborious part. Sometimes the solution lies in configuration, sometimes it helps to disable one of the read aheads (if possible), and sometimes the code of one of the components has to be changed.

An example of the effect of read ahead

The customer was not satisfied with the database backup time. As a test, a backup of a single 50 GB file was performed. Below are the results of three runs with different file system settings.

Directories ... 0
Regular files ... 1
Objects Total ... 1
Total Size ... 50.51 GB

1. Read ahead disabled (direct I/O)

Run Time ... 0:17:10
Backup Speed ... 71.99 MB/s

2. Default read ahead settings (read_pref_io = 65536, read_nstream = 1)

Run Time ... 0:05:17
Backup Speed ... 163.16 MB/s

3. Greatly increased read ahead (read_pref_io = 262144, read_nstream = 64)

Run Time ... 0:02:27
Backup Speed ... 222.91 MB/s

As the example shows, read ahead made it possible to reduce backup time significantly. Subsequent operation showed that all other tasks on the system ran normally with such a large read ahead (test 3), and no problems caused by excessive read ahead were observed. As a result, these settings were kept.

Source: https://habr.com/ru/post/127703/

