How slow are iostreams?

I / O streams in the standard C ++ library are easy to use, type safe, resilient to resource leaks, and allow simple error handling. However, behind them the reputation of "slow". There are several reasons for this, such as the extensive use of dynamic allocation and virtual functions. In general, flows are one of the most ancient parts of the standard library (they began to be used around 1988), and many decisions in them are now perceived as "controversial." However, they are widely used, especially when you need to write some simple program that works with text data.

Performance issue iostreams is not idle. In particular, the problem of console I / O performance can be encountered in sports programming systems, where even using a good algorithm, one can not go through time only because of input / output. I also encountered this problem when processing scientific data in text format.

Today in the comments at the post there was a discussion about the slowness of iostreams. In particular, freopen writes

It's funny to look at your optimization, located next door with reading through cin :)

aesamson gives this recommendation

You can replace it with getchar_unlocked () for * nix or getchar () for all others.
getchar_unlocked> getchar> scanf> cin, where ">" means faster.

')
In this post I will dispel and confirm some myths and give a couple of recommendations.

All measurements in this post are for Ubuntu 14.10 system with GCC 4.9.1 compiler, compiled with keys

g++ -Wall -Wextra -std=c++11 -O3

The launch was carried out on a laptop with an Intel Core2 Duo P8600 processor (2.4 GHz).

Formulation of the problem

In sports programming, as in the UNIX-way, input data is usually fed to the input stream. So, the task:

There are many non-negative integers on the input stream (stdin), one per line. The program should output the maximum of the input numbers.

Form input data

 seq 10000000 > data

In the data file, we recorded 10 million consecutive integers, with a total volume of 76 megabytes.
We will run the program so

 time ./a.out < data

So, we proceed.

1. `scanf`

We solve the problem using the good old scanf.

 int max_scanf() { int x; int max = -1; while (scanf("%d", &x) == 1) { max = std::max(x, max); } return max; }

When using scanf, it is important not to forget to always check the return value - this is the number of actually read and filled arguments (GCC with -Wall reminds about this). In our case, with successful reading, the return value should be 1.

Main function

 int main() { std::cout << max_scanf() << std::endl; return 0; }

Work time: 1.41 s

2. Naive `std::cin`

Now we will solve the problem in the simplest way with the help of iostreams:

 int max_iostream(std::istream & f) { int x; int max = -1; while(f >> x) max = std::max(x, max); return max; }

Work time: 4.41 s
Wow! Streams were slower than scanf 3 times! That is, it turns out that iostream are really worthless in terms of speed?

3. Fast `std::cin`

In fact, to correct the situation, it is enough to add one single line to the program. At the very beginning of the main function, we insert:

 std::ios::sync_with_stdio(false);

What does it mean?
In order to mix iostreams and stdio in the program, this synchronization was introduced. By default, when working with standard streams ( std::cin, std::cout, std::cerr ...), the buffer is flushed after each I / O operation, so that the data is not mixed. If we assume to use only iostream, then we can disable this synchronization. More details can be read at cppreference .
Work time: 1.33 s
It is quite another matter! Moreover, it is faster than scanf! That is, not everything is so bad. The advantages of iostreams include ease of use, type safety and easier error handling.

All subsequent variants using std :: cin will use this optimization.

4. Naive `std::istringstream`

In addition to input from a file, the standard library also provides classes for input from a string with the same interface. Let's see how slow it is. We will read from the input stream one line at a time, and then parse it with std::istringstream :

 int max_iostream_iss(std::istream & f) { int x; int max = -1; std::string line; while (std::getline(f, line)) { std::istringstream iss(line); if(! (iss >> x)) break; max = std::max(x, max); } return max; }

Work time: 7.21 s
So slow!

5. Reusing `std::istringstream`

It may seem surprising, but the slowest thing in istringstream is creating it. And we create anew for each input line. Let's try to reuse the same object:

 int max_iostream_iss_2(std::istream & f) { int x; int max = -1; std::string line; std::istringstream iss(line); while (std::getline(f, line)) { iss.clear(); //    iss.str(line); //    if(! (iss >> x)) break; max = std::max(x, max); } return max; }

Note that you need 2 calls, clear, to reset the status flags, and str, to set a new buffer from which to read.

Work time: 2.16 s
This is another matter. This is expected to be slower than reading directly from std::cin (data passes 2 times through the thread classes), but not catastrophic.

6. We want even faster! (getchar / getchar_unlocked)

What if the performance is still not enough? Use lower-level I / O and a custom parser. In the comments on the above topic, aesamson gave a sample code that implements the simplest parser of integers (probably taken from StackOverflow). getchar_unlocked is the read-safe version of getchar for reading from the input stream. I added the omission of extra characters and the simplest handling of the end of the file:

 bool read_int_unlocked(int & out) { int c = getchar_unlocked(); int x = 0; int neg = 0; for (; !('0'<=c && c<='9') && c != '-'; c = getchar_unlocked()) { if (c == EOF) return false; } if (c == '-') { neg = 1; c = getchar_unlocked(); } if (c == EOF) return false; for (; '0' <= c && c <= '9' ; c = getchar_unlocked()) { x = x*10 + c - '0'; } out = neg ? -x : x; return true; } int max_getchar_unlocked() { int x; int max = -1; while(read_int_unlocked(x)) max = std::max(x, max); return max; }

Work time: getchar 0.82 s, getchar_unlocked 0.28 s!
Very good result! And you can see how great the slowdown is due to locks for the sake of thread safety.
But this approach has disadvantages - it is necessary to write parsers for all data types used (and this is not so easy even for floating-point numbers), the complexity of error handling, and thread safety in the case of getchar_unlocked . Alternatively, you can try using a parser generator, for example, re2c , boost::Spirit::Qi , etc. (a lot of them).

7. C ++ 11: `std::stoi`

Thanks to Lol4t0 for recalling the std::stoi/std::stol/std::stoll functions in the C ++ 11 std::stoi/std::stol/std::stoll . We will read one line with getline, and then parse it with stol. The code will look like this:

 int max_stoi(std::istream & f) { int max = -1; std::string line; while (std::getline(f, line)) { int x = std::stoi(line); max = std::max(x, max); } return max; }

Work time: 1.04 seconds
This is the fastest standard way to read integers. (And for floating point numbers there are similar functions stof / stod).

8. Bonus: Reading in big blocks + Boost :: Spirit

Let's try to write the fastest option. We will read the input data in large blocks and then parse with Boost :: Spirit :: Qi , which is declared as a generator of very fast parsers. This is a compile-time generator: we describe a grammar in C ++ in notation close to BNF, and a parser is generated during compilation using metaprogramming magic.

Code (attention, boost and metaprogramming detected!)

 #include <boost/spirit/include/qi.hpp> #include <boost/spirit/include/phoenix_core.hpp> #include <boost/spirit/include/phoenix_operator.hpp> #include <boost/spirit/include/phoenix_statement.hpp> template <typename Iterator> Iterator max_parser(Iterator first, Iterator last, int& max) { namespace qi = boost::spirit::qi; namespace ascii = boost::spirit::ascii; namespace phoenix = boost::phoenix; using qi::int_; using qi::_1; using ascii::space; using phoenix::ref; using phoenix::if_; using qi::eoi; using qi::lexeme; bool r = qi::phrase_parse(first, last, // Begin grammar ( *lexeme[int_ >> (!eoi)][if_(_1 > ref(max))[ref(max) = _1]] ) , // End grammar space); return first; } int max_spirit(std::istream & f) { size_t chunk_size = 1 << 20; std::unique_ptr<char[]> buffer(new char[2*chunk_size]); char * middle = buffer.get() + chunk_size; char * begin = middle; int max = -1; while(true) { f.read(middle, chunk_size); if (f.gcount() == 0) break; char * end = middle + f.gcount(); char * last = max_parser(begin, end, max); if (last < middle) break; // copy the remaining data just before the middle: begin = middle - (end - last); std::copy(last, end, begin); } return max; }

Work time: 0.18 s
This is a record!

Results and Tips

Working hours:

No	Method	GCC 4.9.1	clang 3.5.0 + libc ++	GCC 100M *
one	scanf	1.41	1.48
2	std :: cin	4.41	13.30
3	std :: cin and std :: ios :: sync_with_stdio (false)	1.33	13.24
four	std :: istringstream	7.21	9.16
five	std :: istringstream with reuse	2.16	7.92
6a	getchar	0.82	0.84	9.14
6b	getchar_unlocked	0.28	0.26	2.94
7	std :: getline + std :: stoi	1.04	3.53	10.8
eight	Big block + Boost :: Spirit	0.18	1.67	1.90

* - Measurements on a file with 100 million numbers (file size is 848 megabytes).

Recommendations:

In C ++ 11, the fastest standard way to read numbers from a stream is std::getline + std::stol (in combination with sync_with_stdio(false) if the standard stream is std::cin ). This method is much faster than scanf and second only to getchar methods.
To speed up std::cin/std::cout , you can use std::ios::sync_with_stdio(false); In this case, the speed will be comparable or better than scanf. (Just make sure you don't mix stream and stdio operations on the same stream.)
Istringstream has a very slow constructor. Therefore, the performance can be seriously raised if you reuse the stream object.
Greater performance can be achieved using getchar_unlocked (or just getchar if thread safety is needed) and a custom parser.
Even better performance can be achieved if you read the data in large chunks and then work exclusively in memory.

Attention! The results are valid only on a specific system and can vary greatly on other systems! In particular, I quickly tried clang + libc ++ and got much worse thread performance (whereas with libstdc ++, both clang and gcc gave almost identical results). Be sure to test performance when applying tips! (And ideally, report the results on your system in the comments so that others can use it). The full code is available here .

Update 1. On the advice of Lol4t0 method number 7 is added.
Update 2. The runtime on clang + libc ++ has been added to the table (version 3.5.0, performed on the same system). It can be seen that the performance of the threads is very bad, and besides the trick with turning off synchronization does not work. As a result, stoi is 2 times slower than scanf.
Update 3. Added option number 8: reading in large blocks and parsing with Boost :: Spirit. And this is a champion!

Source: https://habr.com/ru/post/246257/

All Articles

How slow are iostreams?

Formulation of the problem

1. scanf

2. Naive std::cin

3. Fast std::cin

4. Naive std::istringstream

5. Reusing std::istringstream