Speeding up C/C++ file I/O without much effort

Foreword

There is a simple and very useful utility called BDelta, and it so happens that it took root in our production process a long time ago (I could not establish which version we use, but it is definitely not the latest one available). We use it for its intended purpose: building binary patches. A look at the repository is a little saddening: the project was effectively abandoned long ago and much of it is outdated (a former colleague of mine made some edits there, but that too was a long time ago). So I decided to resurrect it: I forked the project, threw out what I do not plan to use, moved the build to CMake, inlined the "hot" micro-functions, moved large arrays off the stack (along with the variable-length arrays, which frankly infuriated me), ran the profiler once more, and found that about 40% of the time is spent in fwrite...

So what's up with fwrite?

In this code, fwrite is called a million times with a small buffer (in my particular test case: building a patch between two similar 300 MB files, with the input data entirely in memory). Obviously this will be slow, so I wanted to do something about it. I had no desire to implement custom data sinks or asynchronous I/O; I wanted a simpler solution. The first thing that came to mind was to increase the buffer size:

setvbuf(file, nullptr, _IOFBF, 64 * 1024);

However, this did not improve the result significantly (fwrite now accounted for about 37% of the time), so the problem is evidently not frequent writes to disk. Looking under the hood of fwrite, you can see that the FILE structure is locked and unlocked around the actual write, roughly like this (pseudo-code; all of the analysis was done with Visual Studio 2017):

    size_t fwrite(const void *buffer, size_t size, size_t count, FILE *stream)
    {
        size_t retval = 0;
        _lock_str(stream);          /* lock stream */
        __try {
            retval = _fwrite_nolock(buffer, size, count, stream);
        }
        __finally {
            _unlock_str(stream);    /* unlock stream */
        }
        return retval;
    }

According to the profiler, _fwrite_nolock accounts for only 6% of the time; the rest is locking overhead. In my particular case thread safety is obvious overkill, so I sacrificed it and replaced the fwrite call with _fwrite_nolock. The arguments do not even need to change. The upshot: this simple change cut the cost of writing the result several-fold, and in the original version that cost was almost half of the total run time. By the way, glibc on Linux offers a similar function, fwrite_unlocked (a non-standard extension rather than part of POSIX itself). Generally speaking, the same applies to fread. So with a pair of #define directives you can get a fairly cross-platform solution that avoids the locks when they are not needed (which is quite often the case).
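The pair of #define directives mentioned above might look like the following minimal sketch. The macro names FWRITE_NOLOCK and FREAD_NOLOCK and the helper roundtrip_without_locks are hypothetical and chosen only for illustration; the fallback to plain fwrite / fread on other C libraries is likewise an assumption, not something the original project does.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical macro names: pick the lock-free stdio calls where they exist.
#if defined(_MSC_VER)
#define FWRITE_NOLOCK _fwrite_nolock
#define FREAD_NOLOCK _fread_nolock
#elif defined(__GLIBC__)
#define FWRITE_NOLOCK fwrite_unlocked
#define FREAD_NOLOCK fread_unlocked
#else
// Assumed fallback: use the ordinary locking calls on other C libraries.
#define FWRITE_NOLOCK std::fwrite
#define FREAD_NOLOCK std::fread
#endif

// Hypothetical helper: writes n bytes one per call, reads them back one per
// call, and returns true only if every byte matches. Safe only while no
// other thread touches the same FILE.
bool roundtrip_without_locks(const char* path, std::size_t n) {
    std::FILE* out = std::fopen(path, "wb");
    if (!out) return false;
    bool ok = true;
    for (std::size_t i = 0; i < n && ok; ++i) {
        const unsigned char b = static_cast<unsigned char>(i);
        ok = FWRITE_NOLOCK(&b, 1, 1, out) == 1;
    }
    ok = (std::fclose(out) == 0) && ok;
    if (!ok) return false;

    std::FILE* in = std::fopen(path, "rb");
    if (!in) return false;
    for (std::size_t i = 0; i < n && ok; ++i) {
        unsigned char b = 0;
        ok = FREAD_NOLOCK(&b, 1, 1, in) == 1 &&
             b == static_cast<unsigned char>(i);
    }
    std::fclose(in);
    return ok;
}
```

Because the macros take the same arguments as fwrite and fread, existing call sites only need the function name swapped.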

fwrite, _fwrite_nolock, setvbuf

Let's step away from the original project and test one specific case: writing a large file (512 MB) in extremely small portions of 1 byte each. Test system: AMD Ryzen 7 1700, 16 GB of RAM, 3.5" 7200 rpm HDD with 64 MB cache, Windows 10 1809; a 32-bit binary built with optimizations enabled and the runtime library linked statically.

Sample code for the experiment:

    #include <chrono>
    #include <cstdio>
    #include <inttypes.h>
    #include <memory>

    // MSVC spells the lock-free variant differently.
    #ifdef _MSC_VER
    #define fwrite_unlocked _fwrite_nolock
    #endif

    using namespace std::chrono;

    int main()
    {
        std::unique_ptr<FILE, int(*)(FILE*)> file(fopen("test.bin", "wb"), fclose);
        if (!file)
            return 1;

        // The stdio buffer size is the parameter varied between runs.
        constexpr size_t TEST_BUFFER_SIZE = 256 * 1024;
        if (setvbuf(file.get(), nullptr, _IOFBF, TEST_BUFFER_SIZE) != 0)
            return 2;

        auto start = steady_clock::now();

        // Write the whole file one byte per call.
        const uint8_t b = 77;
        constexpr size_t TEST_FILE_SIZE = 512 * 1024 * 1024;
        for (size_t i = 0; i < TEST_FILE_SIZE; ++i)
            fwrite_unlocked(&b, sizeof(b), 1, file.get());

        auto end = steady_clock::now();
        auto interval = duration_cast<microseconds>(end - start);
        // Cast so that %lld matches on platforms where the count is a long.
        printf("Time: %lld\n", static_cast<long long>(interval.count()));
        return 0;
    }

The parameter we vary is TEST_BUFFER_SIZE, and for a couple of cases we replace fwrite_unlocked with fwrite. Let's start with fwrite without explicitly setting the buffer size (comment out setvbuf and the associated code): time 27048906 µs, write speed 18.93 MB/s. Now let's set the buffer size to 64 KB: time 25037111 µs, speed 20.44 MB/s. Now let's test _fwrite_nolock without calling setvbuf: 7262221 µs, speed 70.5 MB/s!
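The speeds quoted above follow from the measured times by simple arithmetic: the file size divided by the elapsed time. A minimal sketch of that conversion, assuming 1 MB means 1024 * 1024 bytes as in the test program (the function name throughput_mb_per_s is hypothetical):

```cpp
#include <cstdint>

// Hypothetical helper: converts a byte count and an elapsed time in
// microseconds into MB/s, with 1 MB taken as 1024 * 1024 bytes.
double throughput_mb_per_s(std::uint64_t bytes, std::int64_t microseconds) {
    const double megabytes = static_cast<double>(bytes) / (1024.0 * 1024.0);
    const double seconds = static_cast<double>(microseconds) / 1e6;
    return megabytes / seconds;
}
```

For example, throughput_mb_per_s(512ull * 1024 * 1024, 27048906) comes out at roughly 18.93, matching the first measurement.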

Let's experiment further with the buffer size set through setvbuf. The data were obtained by averaging 5 runs; I did not bother estimating the errors. In my opinion, 93 MB/s when writing 1 byte at a time to an ordinary HDD is a very good result. All it takes is to choose the optimal buffer size (256 KB was just right in my case) and to replace fwrite with _fwrite_nolock / fwrite_unlocked (if you do not need thread safety, of course).
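A buffer-size sweep of this kind could be automated along the following lines. This is only a sketch under stated assumptions: the names time_byte_writes and sweep_buffer_sizes, the list of sizes, and the fflush inside the timed region are illustrative choices, not the code that produced the numbers in this article.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <memory>

#if defined(_MSC_VER)
#define FWRITE_NOLOCK _fwrite_nolock
#elif defined(__GLIBC__)
#define FWRITE_NOLOCK fwrite_unlocked
#else
#define FWRITE_NOLOCK std::fwrite  // assumed fallback on other C libraries
#endif

// Hypothetical helper: writes total_bytes one byte per call through a stdio
// buffer of buffer_size bytes; returns elapsed microseconds or -1 on error.
long long time_byte_writes(const char* path, std::size_t buffer_size,
                           std::size_t total_bytes) {
    std::unique_ptr<std::FILE, int (*)(std::FILE*)> file(
        std::fopen(path, "wb"), std::fclose);
    if (!file) return -1;
    // setvbuf has to be called before any other operation on the stream.
    if (std::setvbuf(file.get(), nullptr, _IOFBF, buffer_size) != 0) return -1;

    const unsigned char b = 77;
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < total_bytes; ++i)
        if (FWRITE_NOLOCK(&b, 1, 1, file.get()) != 1) return -1;
    // Flush so the last partially filled buffer is included in the timing.
    if (std::fflush(file.get()) != 0) return -1;
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start)
        .count();
}

// Hypothetical driver: prints one timing per buffer size.
void sweep_buffer_sizes(const char* path, std::size_t total_bytes) {
    const std::size_t sizes_kib[] = {1, 4, 16, 64, 256, 1024};
    for (std::size_t kib : sizes_kib) {
        const long long us = time_byte_writes(path, kib * 1024, total_bytes);
        std::printf("%zu KiB buffer: %lld us\n", kib, us);
    }
}
```

Averaging several runs per size, as was done for the measurements here, helps smooth out disk and cache effects.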
The same applies to fread under similar conditions.

Now let's see how things look on Linux. The test configuration: AMD Ryzen 7 1700X, 16 GB of RAM, 3.5" 7200 rpm HDD with 64 MB cache, openSUSE 15, GCC 8.3.1, an x86-64 binary, and ext4 on the test partition. In this test fwrite without an explicitly set buffer size reaches 67.6 MB/s; with a 256 KB buffer the speed rises to 69.7 MB/s. The same measurements for fwrite_unlocked give 93.5 and 94.6 MB/s, respectively. Varying the buffer size from 1 KB to 8 MB showed that a larger buffer does increase the write speed, but the difference in my case was only 3 MB/s, and I saw no difference at all between a 64 KB and an 8 MB buffer. From the data obtained on this Linux machine, the following conclusions can be drawn:

fwrite_unlocked is faster than fwrite, but the difference in write speed is not as large as on Windows.
The buffer size has a much smaller effect on the write speed of fwrite / fwrite_unlocked on Linux than on Windows.

Overall, the proposed method is effective on both Windows and Linux (although to a significantly lesser extent on Linux).

Afterword

The purpose of this article was to describe a technique that is simple and effective in many cases (I had never come across the _fwrite_nolock / fwrite_unlocked functions before; they are not very popular, and undeservedly so). I do not claim that the material is novel, but I hope the article will be useful to the community.

Source: https://habr.com/ru/post/444036/
