
Hi, Habr! Some time ago I became interested in the question: what is the best way to read data from disk in .NET? Reading a heap of files comes up in all kinds of programs that, right at startup, read configuration, load modules, and so on.
I could not find such a comparison on the Internet (apart from tuning guides for specific configurations).
The results can be viewed on GitHub: SSD, HDD.
Reading methods and the testing algorithm
There are several main ways to do it, each implemented as a separate Scenario* project: synchronous reading parallelized over the files (ScenarioSyncAsParallel), reading whole files in parallel (ScenarioReadAllAsParallel), asynchronous reading without a limit on parallelism (ScenarioAsync, ScenarioAsync2), asynchronous reading with a cap of N parallel operations (ScenarioAsyncWithMaxParallelCountN), and reading from separately started threads (ScenarioNewThread).
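For reference, here is a minimal sketch (illustrative helpers, not the project's actual code) of the two basic primitives the scenarios are built around: synchronous reads parallelized over the files, and fully asynchronous reads.

```csharp
using System.IO;
using System.Linq;
using System.Threading.Tasks;

static class ReadPrimitives
{
    // Synchronous reads, parallelized over the files (ScenarioSyncAsParallel-style).
    public static void ReadSyncAsParallel(string[] files)
    {
        files.AsParallel().ForAll(path =>
        {
            var bytes = File.ReadAllBytes(path); // blocks the worker thread while the disk responds
        });
    }

    // Asynchronous reads: the thread is released while the I/O is in flight (ScenarioAsync-style).
    public static async Task ReadAllAsync(string[] files)
    {
        var tasks = files.Select(async path =>
        {
            using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                               FileShare.Read, 4096, useAsync: true))
            {
                var buffer = new byte[stream.Length];
                await stream.ReadAsync(buffer, 0, buffer.Length);
            }
        });
        await Task.WhenAll(tasks);
    }
}
```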
I tested everything on both an SSD and an HDD (in the first case the machine was a 24-core Xeon with 16 GB of RAM and an Intel SSD; in the second, a Mac Mini MGEM2LL/A with a Core i5, 4 GB of RAM and a 5400-rpm HDD). The systems were chosen so that the results show how best to behave both on relatively modern hardware and on not-so-new machines.
The project can be viewed here; it consists of one main executable, TestsHost, and a set of projects named Scenario*. Each test does the following:
- Runs the scenario exe, which measures the pure reading time.
- Once a second, samples CPU load, memory consumption, disk load and a number of derived parameters (via Performance Counters); a minimal sketch of this sampling is shown after the list.
- Records the result; the test is repeated several times, and the final result is the average time with the largest and smallest values discarded.
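A minimal sketch of that sampling loop (the scenario executable name is illustrative, and TestsHost uses its own counter set), assuming the standard System.Diagnostics counters:

```csharp
using System;
using System.Diagnostics;
using System.Threading;

class CounterSampler
{
    static void Main()
    {
        var cpu = new PerformanceCounter("Processor", "% Processor Time", "_Total");
        var diskQueue = new PerformanceCounter("PhysicalDisk", "Avg. Disk Queue Length", "_Total");

        // The scenario name is illustrative; TestsHost launches each Scenario* exe like this.
        var process = Process.Start("ScenarioAsync2.exe");

        while (!process.HasExited)
        {
            try
            {
                Console.WriteLine("CPU: {0:F1}%  Disk queue: {1:F2}",
                                  cpu.NextValue(), diskQueue.NextValue());
            }
            catch (InvalidOperationException)
            {
                // A counter can fail if the process is already shutting down;
                // in that case all samples for this second are ignored (as described below).
            }
            Thread.Sleep(1000);
        }
    }
}
```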
Preparing for a test is a bit trickier. Before a run:
- We decide on the file size range and count (I chose them so that the total volume exceeded the RAM size, to suppress the influence of the disk cache);
- We search the machine for files of the given size (skipping inaccessible files and a number of special folders described below); a sketch of this collection step follows the list;
- We run one of the tests on the fileset and throw the result away. This is needed to flush the OS cache, remove the influence of previous tests and simply warm up the system.
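A minimal sketch of that collection step (the method and parameter names are illustrative, not the project's actual API):

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class FileSetBuilder
{
    // Walk the tree manually so an inaccessible folder only skips that folder,
    // not the whole enumeration.
    public static List<string> Collect(string root, long minSize, long maxSize, long totalLimit)
    {
        var result = new List<string>();
        var dirs = new Stack<string>();
        dirs.Push(root);
        long total = 0;

        while (dirs.Count > 0 && total < totalLimit)
        {
            var dir = dirs.Pop();
            string[] files, subDirs;
            try
            {
                files = Directory.GetFiles(dir);
                subDirs = Directory.GetDirectories(dir);
            }
            catch (UnauthorizedAccessException) { continue; } // ignore inaccessible folders
            catch (IOException) { continue; }

            foreach (var sub in subDirs)
                dirs.Push(sub);

            foreach (var file in files)
            {
                var size = new FileInfo(file).Length;
                if (size < minSize || size > maxSize)
                    continue;
                result.Add(file);
                total += size;              // stop once the total volume exceeds the RAM size
                if (total >= totalLimit)
                    break;
            }
        }
        return result;
    }
}
```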
And error handling should not be forgotten:
- The program returns exit code 0 only if all files were read successfully.
- Occasionally a whole test fails because the system suddenly starts actively using one of the files. We sigh and restart, adding that file (or folder) to the ignore list. Since I used the Windows and Program Files directories as a source of files realistically scattered over the disk, some of them could be locked for a while.
- Sometimes a single Performance Counter could throw an error, for example because the process had already started to exit. In that case all counters for that second are ignored.
- On large files some tests consistently threw OutOfMemory exceptions; I removed those from the results.
Plus the standard points about load testing:
- Everything is compiled in Release mode in MSVS and launched as a standalone application, without a debugger, etc. There is no extra tuning, because the whole point of the check is how ordinary software can read files faster.
- The antivirus was disabled, system updates and other active programs were stopped. There was no further tuning, for the same reason.
- Each test is a launch of a separate process. The overhead (JIT, process startup cost, etc.) turned out to be within the margin of error, so I kept this isolation.
- Some Performance Counters always returned zero for the HDD / SSD. Since the counter set is baked into the program, I left them in.
- All programs were run as x64; anything that started swapping revealed its memory inefficiency and immediately sank in the statistics because of its long running time.
- Thread priority and other tunings were not used, since I was not trying to squeeze out the maximum (which would depend on many more factors).
- Technologies: .NET 4.6, x64.
Results
As I wrote at the top, the results are on GitHub: SSD, HDD.
SSD drive
Minimum file size (bytes): 2, maximum size (bytes): 25720320, average size (bytes): 40953.1175

Scenario | Time
ScenarioAsyncWithMaxParallelCount4 | 00:00:00.2260000
ScenarioAsyncWithMaxParallelCount8 | 00:00:00.5080000
ScenarioAsyncWithMaxParallelCount16 | 00:00:00.1120000
ScenarioAsyncWithMaxParallelCount24 | 00:00:00.1540000
ScenarioAsyncWithMaxParallelCount32 | 00:00:00.2510000
ScenarioAsyncWithMaxParallelCount64 | 00:00:00.5240000
ScenarioAsyncWithMaxParallelCount128 | 00:00:00.5970000
ScenarioAsyncWithMaxParallelCount256 | 00:00:00.7610000
ScenarioSyncAsParallel | 00:00:00.9340000
ScenarioReadAllAsParallel | 00:00:00.3360000
ScenarioAsync | 00:00:00.8150000
ScenarioAsync2 | 00:00:00.0710000
ScenarioNewThread | 00:00:00.6320000
So, for reading many small files the two winners are the asynchronous scenarios. In fact, in both cases .NET used 31 threads.
The two programs differed only in the presence or absence of an ActionBlock that restricts parallelism (as in ScenarioAsyncWithMaxParallelCount32). It turned out that it is better not to limit reading: more memory is used (in my case about 1.5 times more), but the effective limit ends up at the level of the default settings anyway (since the Thread Pool size depends on the number of cores, etc.). A sketch of the restricted variant is shown below.
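A minimal sketch of the restricted variant (assuming TPL Dataflow, which the ActionBlock mention points to; this is not the project's exact code):

```csharp
using System.IO;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

static class BoundedAsyncReader
{
    public static async Task ReadAllAsync(string[] files, int maxParallel)
    {
        var block = new ActionBlock<string>(async path =>
        {
            using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                               FileShare.Read, 4096, useAsync: true))
            {
                // The whole file is read into memory, which is why very large files
                // can push the unrestricted variants towards OutOfMemory.
                var buffer = new byte[stream.Length];
                await stream.ReadAsync(buffer, 0, buffer.Length);
            }
        },
        new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = maxParallel });

        foreach (var file in files)
            block.Post(file);

        block.Complete();
        await block.Completion;
    }
}
```

Dropping the ExecutionDataflowBlockOptions (or simply awaiting all read tasks with Task.WhenAll) gives the unrestricted variant that won on small files.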
Minimum file size (bytes): 1001, maximum size (bytes): 25720320, average size (bytes): 42907.8608

Scenario | Time
ScenarioAsyncWithMaxParallelCount4 | 00:00:00.4070000
ScenarioAsyncWithMaxParallelCount8 | 00:00:00.2210000
ScenarioAsyncWithMaxParallelCount16 | 00:00:00.1240000
ScenarioAsyncWithMaxParallelCount24 | 00:00:00.2430000
ScenarioAsyncWithMaxParallelCount32 | 00:00:00.3180000
ScenarioAsyncWithMaxParallelCount64 | 00:00:00.5100000
ScenarioAsyncWithMaxParallelCount128 | 00:00:00.7270000
ScenarioAsyncWithMaxParallelCount256 | 00:00:00.8190000
ScenarioSyncAsParallel | 00:00:00.7590000
ScenarioReadAllAsParallel | 00:00:00.3120000
ScenarioAsync | 00:00:00.5080000
ScenarioAsync2 | 00:00:00.0670000
ScenarioNewThread | 00:00:00.6090000
After increasing the minimum file size, I got the following:
- The leaders were the runs where the number of threads was close to the number of processor cores.
- In a number of tests one of the threads was constantly waiting for a lock to be released (see the "Concurrent Queue Length" Performance Counter).
- Synchronous reading from disk is still an outsider.
Minimum file size (bytes): 10007, maximum size (bytes): 62444171, average size (bytes): 205102.2773

Scenario | Time
ScenarioAsyncWithMaxParallelCount4 | 00:00:00.6830000
ScenarioAsyncWithMaxParallelCount8 | 00:00:00.5440000
ScenarioAsyncWithMaxParallelCount16 | 00:00:00.6620000
ScenarioAsyncWithMaxParallelCount24 | 00:00:00.8690000
ScenarioAsyncWithMaxParallelCount32 | 00:00:00.5630000
ScenarioAsyncWithMaxParallelCount64 | 00:00:00.2050000
ScenarioAsyncWithMaxParallelCount128 | 00:00:00.1600000
ScenarioAsyncWithMaxParallelCount256 | 00:00:00.4890000
ScenarioSyncAsParallel | 00:00:00.7090000
ScenarioReadAllAsParallel | 00:00:00.9320000
ScenarioAsync | 00:00:00.7160000
ScenarioAsync2 | 00:00:00.6530000
ScenarioNewThread | 00:00:00.4290000
And the last SSD test: files from 10 KB upwards, fewer of them, but larger. As a result:
- If the number of threads is not limited, the reading time gets closer to that of the synchronous scenarios.
- A limit of about (number of cores) * [2.5 - 5.5] is already desirable (see the snippet below).
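As an illustration (the multiplier is just one value from that range, not a project constant), such a limit can be derived from the core count:

```csharp
// 4 is an illustrative multiplier taken from the 2.5-5.5 range above.
int maxParallelReads = Environment.ProcessorCount * 4;
```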
HDD drive
While with the SSD everything went more or less well, here I had frequent crashes, so I excluded some of the results from the programs that fell over.
Minimum file size (bytes): 1001, maximum size (bytes): 54989002, average size (bytes): 210818.0652

Scenario | Time
ScenarioAsyncWithMaxParallelCount4 | 00:00:00.3410000
ScenarioAsyncWithMaxParallelCount8 | 00:00:00.3050000
ScenarioAsyncWithMaxParallelCount16 | 00:00:00.2470000
ScenarioAsyncWithMaxParallelCount24 | 00:00:00.1290000
ScenarioAsyncWithMaxParallelCount32 | 00:00:00.1810000
ScenarioAsyncWithMaxParallelCount64 | 00:00:00.1940000
ScenarioAsyncWithMaxParallelCount128 | 00:00:00.4010000
ScenarioAsyncWithMaxParallelCount256 | 00:00:00.5170000
ScenarioSyncAsParallel | 00:00:00.3120000
ScenarioReadAllAsParallel | 00:00:00.5190000
ScenarioAsync | 00:00:00.4370000
ScenarioAsync2 | 00:00:00.5990000
ScenarioNewThread | 00:00:00.5300000
For small files the leaders are again the asynchronous reads, although synchronous work also showed a good result. The explanation lies in the load on the disk, namely in the limit on parallel reads: when you force reading in many threads at once, the system runs into a long read queue, and instead of genuinely parallel work, time is spent trying to service many requests simultaneously.
Minimum file size (bytes): 1001, maximum size (bytes): 54989002, average size (bytes): 208913.2665

Scenario | Time
ScenarioAsyncWithMaxParallelCount4 | 00:00:00.6880000
ScenarioAsyncWithMaxParallelCount8 | 00:00:00.2160000
ScenarioAsyncWithMaxParallelCount16 | 00:00:00.5870000
ScenarioAsyncWithMaxParallelCount32 | 00:00:00.5700000
ScenarioAsyncWithMaxParallelCount64 | 00:00:00.5070000
ScenarioAsyncWithMaxParallelCount128 | 00:00:00.4060000
ScenarioAsyncWithMaxParallelCount256 | 00:00:00.4800000
ScenarioSyncAsParallel | 00:00:00.4680000
ScenarioReadAllAsParallel | 00:00:00.4680000
ScenarioAsync | 00:00:00.3780000
ScenarioAsync2 | 00:00:00.5390000
ScenarioNewThread | 00:00:00.6730000
For medium-sized files asynchronous reading continued to show the best result, except that it is now desirable to limit the number of threads to an even lower value.
Minimum file size (bytes): 10008, maximum size (bytes): 138634176, average size (bytes): 429888.6019

Scenario | Time
ScenarioAsyncWithMaxParallelCount4 | 00:00:00.5230000
ScenarioAsyncWithMaxParallelCount8 | 00:00:00.4110000
ScenarioAsyncWithMaxParallelCount16 | 00:00:00.4790000
ScenarioAsyncWithMaxParallelCount24 | 00:00:00.3870000
ScenarioAsyncWithMaxParallelCount32 | 00:00:00.4530000
ScenarioAsyncWithMaxParallelCount64 | 00:00:00.5060000
ScenarioAsyncWithMaxParallelCount128 | 00:00:00.5810000
ScenarioAsyncWithMaxParallelCount256 | 00:00:00.5540000
ScenarioReadAllAsParallel | 00:00:00.5850000
ScenarioAsync | 00:00:00.5530000
ScenarioAsync2 | 00:00:00.4440000
The leaders are again the asynchronous reads with a limit on the number of parallel operations, and the recommended thread count has become even smaller. Parallel synchronous reading consistently started to fail with OutOfMemory.
With further growth in file size, the scenarios without a limit on parallel reads fell with OutOfMemory more and more often. Since the results were no longer stable from run to run, I considered further testing of that kind pointless.
Summary
What can be taken away from these tests?
- In almost all cases asynchronous reading was faster than synchronous reading.
- As file sizes grow, it makes sense to limit the number of parallel reads, otherwise reading slows down and the risk of OOM increases.
- In no case was there a radical performance gain, at most a factor of 2-3, so it may not be worth rewriting an old legacy application for asynchronous reading.
- However, asynchronous file access will at the very least reduce the likelihood of crashes and increase speed.