📜 ⬆️ ⬇️

Fun starts or C ++ and STL: who is faster?

Formulation of the problem


We are interested in the speed of various standard C ++ tools for performing similar operations on a large number of elements (loops, STL algorithms, iterators, pointers, etc.). For simplicity, we will consider the original problem of calculating the sum of a large number of integers. ( Link for those who do not like to read, and loves to watch.)

Prepare


First, we write the code that will evaluate the speed of the cycle:

#include <iostream> #include <sys/time.h> #include <iomanip> using namespace std; const int COUNT = 1000 * 1000 * 1000; int main () { struct timespec tm1, tm2; clock_gettime(CLOCK_REALTIME, &tm1); // Our code to estimate clock_gettime(CLOCK_REALTIME, &tm2); double t1 = 1000.0 * tm1.tv_sec + tm1.tv_nsec / (1000.0 * 1000); double t2 = 1000.0 * tm2.tv_sec + tm2.tv_nsec / (1000.0 * 1000); cout << "t=" << setprecision(5) << t2 -t1 << " ms" << endl; return 0; }; 

Not very accurate, but simple and straightforward.

With sequential enumeration of elements in STL, the fastest container is a vector, since its elements are arranged sequentially. So create a vector of size COUNT and fill it with (almost) random integer values:
')
 #include <vector> #include <algorithm> … int RandomNumber () { static int s = 0; return (s++ % 100); } int main () { vector<int> vec(COUNT); generate(vec.begin(), vec.end(), RandomNumber); ... // Our code to estimate 

Accumulate


The most kosher way to solve our problem is to use the accumulate function:

 #include <numeric> … // Our code to estimate accumulate(vec.begin(), vec.end(), 0, [](int sum, int i) { return sum + i; }); 

Compile and run:

 $ g++ -std=c++11 main.cpp -o test && ./test $ duration 349.29 mseconds 

Not a little ...

for_each


Let's try for_each?

 // Our code to estimate int sum = 0; for_each(vec.begin(), vec.end(), [&sum](int i) { sum += i; }); 

We look:

 $ g++ -std=c++11 main.cpp -o test && ./test $ duration 417.64 mseconds 

Interestingly, if you use an object with a state, it will turn out a little faster: passing by reference to the lambda bit slows us down :

 class Sum { int sum; public: Sum() : sum(0) {} inline void operator()(int i) { sum += i; } inline operator int() { return sum; } }; … // Our code to estimate for_each(vec.begin(), vec.end(), Sum()); 

Drumroll…

 $ g++ -std=c++11 main.cpp -o test && ./test $ duration 371.64 mseconds 

You can also make a lambda with a state and get a result comparable to accumulate , but the game is not worth the trouble: the result still does not suit us.

for


Let's return to the good old cycles:

 // Our code to estimate int sum = 0; for(auto i = vec.begin(); i != vec.end(); i++) { sum += *i; }; 

So what?

 $ g++ -std=c++11 main.cpp -o test && ./test $ duration 520.66 mseconds 

Well, it is clear why, change a little code:

 auto end = vec.end(); for(auto i = vec.begin(); i != end; i++) { 

Compile and run again:

 $ g++ -std=c++11 main.cpp -o test && ./test $ duration 403.68 mseconds 

Already better! We also know that the prefix increment is slightly faster than postfix :

 for(auto i = vec.begin(); i != end; ++i) { 

This is because a temporary object + copy is used in the postfix-increment operation.

Compile and run:

 $ g++ -std=c++11 main.cpp -o test && ./test $ duration 327.05 mseconds 

The best result for today! But it's not over yet ...
Why was the cycle a little faster ? Basically, because there is no function call, and there are no unnecessary copies of an object (in our case, an integer) amount.

iterator or not


We will refuse iterators in general:

 for(int i = 0; i < COUNT; ++i) { sum += vec[i]; }; 

Compile and run:

 $ g++ -std=c++11 main.cpp -o test && ./test $ duration 110.52 mseconds 

This is the result.

Here we won at once on two operations - the increment of the inta is faster than the increment operator of the iterator, and the dereferencing operator is slower than access by index.

Arrays


What happens if we use a regular array? More precisely, we will take elements by the offset of the pointer, and not by the index (replacing a vector with an array is meaningless - the vector stores its data in an array):

 int *array = &vec[0]; for(int i = 0; i < COUNT; ++i) { sum += array[i]; }; 

And it was worth it:

 $ g++ -std=c++11 main.cpp -o test && ./test $ duration 81.219 mseconds 

Access by signpost is much faster!
And if you iterate directly on the index, without indices, it will be released much faster:

 auto end = &vec[vec.size () - 1]; for(int *i = &vec[0]; i != end; ++i) { sum += *i; }; 

 $ g++ -std=c++11 main.cpp -o test && ./test $ duration 24.494 mseconds 


Hardcore


What else can we do with the loop? Accurately - "register":

 register int *array = &vec[0]; register int sum = 0; for(register int i = 0; i != COUNT; ++i) { sum += array[i]; }; 

And what do we get?

 $ g++ -std=c++11 main.cpp -o test && ./test $ duration 10.456 mseconds 

The result is good, but it should be remembered that the abuse of the register modifier negates it : the compiler stops paying attention to it when there are not enough registers, but there are always not enough) So the above result is more of academic interest than practical.

You can experiment with algorithms, for example, this change:

 sum += array[i] + array[++i]; 

Speed ​​up our code in two more:

 $ g++ -std=c++11 main.cpp -o test && ./test $ duration 4.8562 mseconds 

UPDATE : and lead to undefined behavior! Thank you alexac for the comment.
The compiler optimization also went beyond the scope of the article, we do not use them. But this is the topic of a separate article ...

Conclusion


The speed of our cycle has increased by an order ! At the same time, we did not even think about multithreaded algorithms (and algorithms in general). We also did not use anything other than native C ++: we acted strictly according to the standard and did not use any third-party libraries.

image

So, if the speed of our cycle is important to us - the STL algorithms are not suitable . Iterators are flexible, but not too fast - the fastest access is through normal pointers.

Source: https://habr.com/ru/post/260025/


All Articles