Getting usable benchmark numbers is half the battle; the other half is interpreting them correctly, learning something new, and being able to apply it. The 100x difference between the debug and release builds surprised me, so I decided to dig deeper. Along the way I found out what actually goes on in a debug build; looked for differences between Visual Studio 2005 and 2008 (found none); figured out how to speed up the debug build 3 times in a couple of minutes (by blocking a stab in the back); got results with the "take it and run it" method that differ from the author's by 3.5 times (hellish x64 power in action!); and, for laughs, measured a bad, unsuitable non-vector against the good one (the bad one turned out to be up to 100 times faster). Details under the cut.
I will start with links to the sources, otherwise they are hard to find.
The original post,
their test code,
my test code.
Having dealt with the jitter, I returned to the original question: where does a slowdown of more than 100 times come from? Someone else's test is fine, but your own is more familiar, so I wrote one and ran it. I was too lazy to create a project; compiling from the command line is much faster.
cl2005 /O2 /EHsc 1.cpp
std it++ res=49995000, 28.5 msec
std ++it res=49995000, 28.6 msec
my res=49995000, 19.0 msec
cl2005 /EHsc 1.cpp
std it++ res=49995000, 534.2 msec
std ++it res=49995000, 437.6 msec
my res=49995000, 69.9 msec
Oops. The results differ, though: those guys get a 30x slowdown without optimization, while I get only 15 to 20 times. For them post-increment is 3 times slower than pre-increment; for me it is about 20%. What am I doing wrong? Is the difference between compilers and their bundled libraries really that fierce? Well, let's install 2008 and check.
cl2008 /O2 /EHsc 1.cpp
std it++ res=49995000, 64.3 msec
std ++it res=49995000, 63.4 msec
my res=49995000, 19.0 msec
cl2008 /EHsc 1.cpp
std it++ res=49995000, 732.1 msec
std ++it res=49995000, 678.8 msec
my res=49995000, 70.0 msec
Hmm. There really are differences, only in the opposite direction: the release build with VS 2005 was more than twice as fast (28.5 against 64.3 msec). The improvements are obvious: apparently SCL in Visual Studio 2008 has become that many times more secure, because with _SECURE_SCL 0 the speed is the same. (Looking ahead, most of the other tests also come out almost the same.) What else am I doing wrong? The compiler is now the same, std::vector seems to be the same, so the conclusion is unambiguous: I am compiling it wrong. That is, the compiler switches differ. We look into the solution, and there is quite a sprawling command line there.
/Od /D "WIN32" /D "_DEBUG" /D "_CONSOLE" /D "_UNICODE" /D "UNICODE" /Gm /EHsc /RTC1 /MDd /Fo"Debug\\" /Fd"Debug\vc90.pdb" /W3 /nologo /c /ZI /TP /errorReport:prompt
Revolutionary instinct and brute force let us throw out the unimportant switches and keep the important ones. There are three important switches: a) /MDd, b) /RTC1, c) /ZI. The results look like this.
cl2008 /Od /EHsc /MDd 1.cpp
std it++ res=49995000, 6026.1 msec
std ++it res=49995000, 1953.9 msec
my res=49995000, 66.5 msec
cl2008 /Od /EHsc /MDd /RTC1 1.cpp
std it++ res=49995000, 7572.0 msec
std ++it res=49995000, 2385.3 msec
my res=49995000, 101.8 msec
cl2008 /Od /EHsc /MDd /RTC1 /ZI 1.cpp
std it++ res=49995000, 18722.0 msec
std ++it res=49995000, 7131.8 msec
my res=49995000, 511.9 msec
There, now everything is just like other people's: walking the vector crawls like a dog, and post-increment has started to lag too. Nice! Now we can figure out what is wrong. A couple of minutes of reading cl /? reveals three simple, understandable things (one per switch) which are nonetheless new (read: well-forgotten old) and therefore surprising.
/MDd links the debug runtime library (which does not matter) and automatically defines _DEBUG (which does). Because of that, SCL switches on an additional package of checks inside itself, on top of those enabled by _SECURE_SCL, and these slow post-increment down 8 times and pre-increment 3 times.
/RTC1 enables checks for stack corruption and for uninitialized variables. If you step into the disassembly in the debugger, you can see five calls to different functions in the inner loop (++, !=, * and two destructors). Every call has to be checked: what if it goes and smashes the stack! To that end, a little under 256 bytes of marker fill are written to the stack on each (each!) call of any overloaded STL operator. Bye-bye to another 20% of performance that was not exactly top-notch to begin with.
/ZI, however, beats them all and takes the pot home. This is Edit and Continue. It is probably convenient to fix the program and resume it on the fly (I would not know, I do not use it). The convenience has a price, and the price is another 3x slowdown.
In total, post-increment in debug is an epic 291 times (!!!) slower than in release. Which immediately raises the question: why do I get as much as 291 times when those guys get only 95?! I build the original test, run out of patience waiting for it to finish, reduce Count by a factor of 10 and multiply the time by 10 in my head. Relative to the x86 release build, it comes out as follows.
it++, x86, release: 1.00
++it, x86, release: 1.00
it++, x86, debug: 275.7
++it, x86, debug: 101.9
it++, x64, release: 0.87779
++it, x64, release: 0.87753
it++, x64, debug: 83.2849
++it, x64, debug: 27.1557
Judging by the numbers, my x86 pays a 13% income tax in release, and in debug it gets robbed outright, left with nothing but underpants, socks and slippers. A bit insulting, but never mind. I am not giving up XP anyway: Windows 7 is hard on me, too Aero-transparent, plus I am waiting for the second service pack.
From the results it is clear who is to blame for the hellish slowness of the debug build. _SECURE_SCL eats a bit over 2 times, turning off optimization costs 5 times, the checks under _DEBUG devour from 3 times (++it) to 8 times (it++), /RTC sprinkles on another 1.2 times, and /ZI finishes with a final chord of 3 times. Total: 2 * 5 * 8 * 1.2 * 3 = 288. Grain by grain, and the whole code is covered in ... checks, with a slowdown of 300 (three hundred) times. Of the eternal three questions, only one now remains: what to do?
In general there is a lot you can do, but it takes a long time. In 2 minutes, however, you can do 2 useful things. First, once you make the strong-willed decision not to use Edit and Continue (if it works on your project at all) and generate an ordinary PDB with /Zi, everything starts running 3 times faster, if not all 5. Second, turning off the /RTC checks entirely is, of course, uncool. But! Reading MSDN turns up an interesting pragma, runtime_checks. That is, you can switch these checks off selectively for particularly hot functions in the project, or, say, for the whole STL. We wrap the section with the external includes in two lines and enjoy.
#pragma runtime_checks("",off)
...
#pragma runtime_checks("",restore)
We get back our honest 6 seconds instead of 18 with it++, and 2 seconds instead of 7 with ++it. The conclusion that ++it is the one to use is convincingly confirmed. The debug build got about 3 times faster. Profit! We try to apply what we have learned to the original test, and it turns out similarly. At least on good old x86.
vanilla
it++, x86, debug: 275.7
++it, x86, debug: 101.9
#pragma runtime_checks, /Zi instead of /ZI
it++, x86, debug: 102.2
++it, x86, debug: 30.0
The bad, crooked, unsuitable and primitive non-vector, which cannot do anything at all, shows its honest 0.066 sec, and that is in debug. That is 90 times faster than the it++ variant, or 30 times faster than the ++it variant. About the 100 times at the beginning of the post: I rounded. It has only one check, the validity of the index on access (which, however, covers roughly 99% of the checks you need from a vector); the tricks with pre- and post-increment are absent along with the iterators. This means nothing. The author is not hinting at anything. The benchmark is clearly synthetic. The vector was written in just 3 minutes, and its functionality, compared to the STL, is accordingly nonexistent. The speed of a debug build is not necessarily important; an extra check may be important; development speed matters to everyone, always; and the profiler will always tell you the truth.
Remember _SECURE_SCL, _DEBUG, the compiler switches and #pragma runtime_checks. The effect is, ahem, stunning. (Personally, the 300x result killed me on the spot. And the 3 to 5 times difference from /ZI instead of /Zi then ate the corpse. "I knew about it, but I never guessed.")
Correct benchmarks to you all. And quick debugging.