
Learning to benchmark correctly (iterators included)

I downloaded the example from the previous post, and from run to run the timings jittered by up to 1.5 times, from 0.76 to 1.09 seconds. How to evaluate the results of a benchmark like that is unclear. The problem is familiar: I ran into it and solved it just yesterday. In short, CPU throttling is to blame, along with a strange affinity setting in the code. Under the cut: the (successful) struggle and some discussion.

So: the previous example about iterators, VS 2005, 4 runs. The machine is completely idle, with no torrents or the like running in the background; even Winamp is turned off. (Although, by the way, for CPU-bound workloads Winamp and even torrents do not add much jitter, 1-2% at most.) The results range from 0.758 to 1.087, which is almost 1.5 times. On top of that, a different variant wins on different runs, which is unacceptable for a race. ;)

x = 3256681784 iterator++. Total time : 0.795557
x = 3256681784 ++iterator. Total time : 0.892076

x = 3256681784 iterator++. Total time : 1.08741
x = 3256681784 ++iterator. Total time : 1.0848

x = 3256681784 iterator++. Total time : 0.898355
x = 3256681784 ++iterator. Total time : 0.758123

x = 3256681784 iterator++. Total time : 0.906159
x = 3256681784 ++iterator. Total time : 0.861794


What's going on? We look into the code: it times things with QPC, and right before that it calls SetThreadAffinityMask. Something smells wrong here. The processor is a C2D, which can run different cores at different frequencies, and when a core is idle the OS takes advantage of that and lowers its frequency. If you calibrate the timer (QueryPerformanceFrequency) on one core and read the counter (QueryPerformanceCounter) on another, or on the same core but after its frequency has gone up, you get garbage.

We change exactly one character: the second parameter of SetThreadAffinityMask becomes 1 instead of 0.

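For illustration, here is a minimal sketch of the pinning call with its result checked. The helper name pin_current_thread is hypothetical and not from the original example, and the non-Windows branch (sched_setaffinity) is only there so the sketch builds on Linux too. As far as I know, both systems refuse a mask with no bits set, so with 0 the thread is never actually pinned.

```cpp
#include <cassert>
#include <cstdint>

#ifdef _WIN32
#include <windows.h>
#else
#include <sched.h>   // sched_setaffinity, cpu_set_t (Linux-specific)
#endif

// Hypothetical helper: pin the calling thread to the CPUs whose bits are set
// in `mask` (bit 0 = CPU 0). Returns false if the OS rejects the request.
bool pin_current_thread(std::uint64_t mask) {
#ifdef _WIN32
    // SetThreadAffinityMask returns the previous mask, or 0 on failure.
    return SetThreadAffinityMask(GetCurrentThread(),
                                 static_cast<DWORD_PTR>(mask)) != 0;
#else
    cpu_set_t set;
    CPU_ZERO(&set);
    for (int cpu = 0; cpu < 64; ++cpu) {
        if (mask & (std::uint64_t{1} << cpu)) {
            CPU_SET(cpu, &set);
        }
    }
    // pid 0 means "the calling thread".
    return sched_setaffinity(0, sizeof(set), &set) == 0;
#endif
}

// Sanity check: an empty mask names no CPU, so pinning must be refused.
void check_empty_mask_is_rejected() {
    assert(!pin_current_thread(0));
}
```

Calling pin_current_thread(1) before the timed section corresponds to the one-character fix. Checking the return value is the cheap way to notice that the affinity call silently did nothing.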
x = 3256681784 iterator++. Total time : 0.751778
x = 3256681784 ++iterator. Total time : 0.685859

x = 3256681784 iterator++. Total time : 0.737615
x = 3256681784 ++iterator. Total time : 0.686026

x = 3256681784 iterator++. Total time : 0.736503
x = 3256681784 ++iterator. Total time : 0.688713

x = 3256681784 iterator++. Total time : 0.772983
x = 3256681784 ++iterator. Total time : 0.68895


Much better. The first test now jitters from 0.736 to 0.772, i.e. by 5% instead of 50%. The second goes from 0.686 to 0.689, i.e. by 0.4%. Where does this difference come from?

Hypothesis: by the time the second test starts, the cache has warmed up. We read the code carefully. However, there are 5000 iterations, and a cold cache would change something only for the first one. The hypothesis is wrong; we set it aside.

Hypothesis: the first test runs first. Perhaps the core's frequency is ramped up while that test is already running. Well, let's warm up the processor before all the tests. Since the compiler is very smart and strives to precompute constants and optimize them away, especially unused ones, we recall the magic volatile modifier. Rebuild, oops, no help. Then we remember affinity. We pin the whole process to one core, not just the piece that is being timed (otherwise there is a danger of warming up the wrong core). Rebuild, oops, victory! In total, we put these 4 lines at the very beginning of the program.

  SetProcessAffinityMask(GetCurrentProcess(), 1);
  volatile int zomg = 1;
  for ( int i=1; i<1000000000; i++ )
    zomg *= i;


And enjoy the result.

x = 3256681784 iterator++. Total time : 0.687585
x = 3256681784 ++iterator. Total time : 0.685685

x = 3256681784 iterator++. Total time : 0.687524
x = 3256681784 ++iterator. Total time : 0.68579

x = 3256681784 iterator++. Total time : 0.686004
x = 3256681784 ++iterator. Total time : 0.688326

x = 3256681784 iterator++. Total time : 0.688472
x = 3256681784 ++iterator. Total time : 0.685775


Bingo. The jitter is under 1%, which is quite normal. And now, finally, there is no difference between the two variants, just as the theory says there should be. (In a release build both iterators should boil down to a plain pointer walk; in a debug build with SECURE_SCL turned on they turn, of course, into utter mayhem.)
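To put a number on "under 1%", here is a small sketch that computes the run-to-run spread for the four final runs. It assumes the spread is measured as (max - min) / min, which matches the percentages quoted earlier but is my reading, not something the original code does. The helper name relative_spread is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: relative spread (max - min) / min of a set of timings.
double relative_spread(const std::vector<double>& seconds) {
    assert(!seconds.empty());
    const auto range = std::minmax_element(seconds.begin(), seconds.end());
    const double lo = *range.first;
    const double hi = *range.second;
    return (hi - lo) / lo;
}

// Timings in seconds, copied from the four final runs above.
const std::vector<double> post_increment = {0.687585, 0.687524, 0.686004, 0.688472};
const std::vector<double> pre_increment  = {0.685685, 0.685790, 0.688326, 0.685775};
```

Both relative_spread(post_increment) and relative_spread(pre_increment) come out below 0.01, and the two ranges overlap, so neither variant can be called the winner.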

On Linux there is exactly the same problem; there it comes from the CPU frequency governor. Switching the governor to something like performance helps, and so do games with affinity and the like.
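Before trusting numbers on Linux, it is worth checking which governor is active. Here is a minimal sketch that reads it from sysfs. It assumes the usual cpufreq path under /sys/devices/system/cpu, which may be missing in containers, VMs, or on systems without a cpufreq driver. The helper names governor_path and read_governor are hypothetical.

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Assumed sysfs location of the cpufreq governor for a given CPU index.
std::string governor_path(int cpu) {
    assert(cpu >= 0);
    return "/sys/devices/system/cpu/cpu" + std::to_string(cpu) +
           "/cpufreq/scaling_governor";
}

// Returns the governor name (e.g. "performance"), or an empty string when
// the file is missing or unreadable.
std::string read_governor(int cpu) {
    std::ifstream file(governor_path(cpu));
    std::string name;
    std::getline(file, name);
    return name;
}
```

If read_governor(0) returns anything other than "performance", expect the same kind of jitter as in the first runs above. Changing the governor normally requires root.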

May your benchmarks be correct.

Source: https://habr.com/ru/post/113682/

