Exceptions and performance

I decided to do a little research into how support for C++ exceptions affects the overall performance of the code.

My work experience includes several years of development for various embedded systems, where performance has to be kept in mind constantly when writing code (real-time systems that process large amounts of data; there is never “plenty” of processor speed or memory there). Accordingly, programmers in this environment usually know quite well what overhead each feature of the C++ language does or does not incur. For example, namespace support costs nothing at all; RTTI adds a section with the names of the classes / structures and their type_info (the final binary grows, but code generation is not affected); and so on. How exception support is implemented (not on all platforms, though) is what we will look at here.

Tools used: an ancient front end from EDG for converting C++ code to C code, and Artistic Style for formatting the resulting C files (otherwise they are impossible to read). The EDG front end deserves a special mention: it is the very front end that is built into the Intel compilers, Texas Instruments compilers, and so on. Since supporting all the features of C++ is a very difficult task (compared to implementing all the features of C), on some platforms C++ code is translated into equivalent C code, and that code is then fed to the C compiler. The front end used here is not the newest, but it is good enough for understanding what is going on.
So, let's take a fairly simple piece of test code (the names and constants are chosen so that they are easy to find in the processed listing):

    struct AAAAA
    {
        int a;
        virtual void process();
        AAAAA() { a = 1234; }
        virtual ~AAAAA() {}
    };

    struct BBBBB : AAAAA
    {
        virtual void process();
        BBBBB() { a = 5678; }
        virtual ~BBBBB() {}
    };

    // forward declaration
    int bar();

    int foo()
    {
        BBBBB b1;
        b1.a = bar();
        b1.process();

        BBBBB b2;
        b2.a = bar();
        b2.process();

        return b1.a + b2.a;
    }

Everything is very simple: two inline constructors / destructors, a pair of virtual functions, and a pair of calls to an external function.

Here is the resulting code without exception support (the output was processed by AStyle). It is a wall of text, but it is needed:

    #line 1 "1.cpp"
    struct __T9639768;
    struct AAAAA;
    #line 9
    struct BBBBB;
    struct __T9639768 {
        short d;
        short i;
        void (*f)();
    };
    #line 1
    struct AAAAA {
        int a;
        struct __T9639768 *__vptr;
    };
    #line 9
    struct BBBBB {
        struct AAAAA __b_AAAAA;
    };
    #line 17
    extern int bar__Fv(void);
    extern int foo__Fv(void);
    #line 10
    extern void process__5BBBBBFv(struct BBBBB *const);
    extern struct __T9639768 __vtbl__5AAAAA[3];
    extern struct __T9639768 __vtbl__5BBBBB[3];
    #line 19
    int foo__Fv(void)
    {
        auto int __T9722792;
        auto struct BBBBB b1;
        auto struct BBBBB b2;
    #line 21
        {
            {
                ((b1.__b_AAAAA).__vptr) = __vtbl__5AAAAA;
                ((b1.__b_AAAAA).a) = 1234;
            }
            ((b1.__b_AAAAA).__vptr) = __vtbl__5BBBBB;
            ((b1.__b_AAAAA).a) = 5678;
        }
        ((b1.__b_AAAAA).a) = (bar__Fv());
        process__5BBBBBFv((&b1));
        {
            {
                ((b2.__b_AAAAA).__vptr) = __vtbl__5AAAAA;
                ((b2.__b_AAAAA).a) = 1234;
            }
            ((b2.__b_AAAAA).__vptr) = __vtbl__5BBBBB;
            ((b2.__b_AAAAA).a) = 5678;
        }
        ((b2.__b_AAAAA).a) = (bar__Fv());
        process__5BBBBBFv((&b2));
        {
            __T9722792 = ((((b1.__b_AAAAA).a)) + (((b2.__b_AAAAA).a)));
            {
                ((b2.__b_AAAAA).__vptr) = __vtbl__5BBBBB;
                {
                    {
                        ((b2.__b_AAAAA).__vptr) = __vtbl__5AAAAA;
                    }
                }
            }
            {
                ((b1.__b_AAAAA).__vptr) = __vtbl__5BBBBB;
                {
                    {
                        ((b1.__b_AAAAA).__vptr) = __vtbl__5AAAAA;
                    }
                }
            }
            return __T9722792;
        }
    }

You can clearly see how each object is “constructed”, how the vtable and inheritance are implemented, that the constructors / destructors are still inline, and that the C compiler still has all the information it needs to optimize this code effectively. Also note that the resulting C code takes 72 lines and approximately 1.6 kB.
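To make the vtable mechanics in that listing easier to follow, here is a minimal hand-written C++ sketch of the same idea. It is an illustration under assumptions, not the front end's actual output: all names (VtblEntry, construct_B, call_process) are hypothetical, the table entry is reduced to a single function pointer (the generated one also carries the d and i fields), and process returns int only so that the dispatch result can be observed.

```cpp
#include <cassert>

// Simplified vtable entry: the generated struct also has `d` and `i` fields.
struct VtblEntry { int (*f)(void*); };

// Base object: its data plus a pointer to the table of virtual functions.
struct A { int a; const VtblEntry* vptr; };
// Derived object: the base sub-object is simply its first member.
struct B { A base; };

static int process_A(void* self) { return static_cast<A*>(self)->a; }
static int process_B(void* self) { return static_cast<B*>(self)->base.a * 2; }

static const VtblEntry vtbl_A[1] = {{process_A}};
static const VtblEntry vtbl_B[1] = {{process_B}};

// "Inlined" construction, in the same order as in the listing:
// first the base part, then the derived part overwrites vptr and a.
void construct_B(B& b) {
    b.base.vptr = vtbl_A;
    b.base.a = 1234;
    b.base.vptr = vtbl_B;
    b.base.a = 5678;
}

// A virtual call is an indirect call through the object's table.
int call_process(A& obj) {
    assert(obj.vptr != nullptr);
    return obj.vptr[0].f(&obj);
}
```

After construct_B(b), call_process(b.base) goes through vtbl_B even though the caller only sees an A. Everything here is plain assignments and one indirect call, which is why the C compiler can still optimize it freely.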

Now the same source is translated with exception support.

The full result is a C equivalent of 253 lines and 8.5 kB. I will not post all of it here; I will confine myself to the main function (the former int foo()), with comments on a few points:

    int foo__Fv(void)
    {
        /* static table of cleanup regions: one entry per local object with a destructor */
        static struct __T9641460 __T9653776[2] = {
            {((void (*)())__dt__5BBBBBFv),((unsigned short)0U),((unsigned short)65535U),((unsigned char)0U)},
            {((void (*)())__dt__5BBBBBFv),((unsigned short)1U),((unsigned short)0U),((unsigned char)0U)}
        };
        /* addresses of the objects constructed so far */
        auto void *__T9731464[2];
        auto int __T9733536;
        auto struct
    #line 20
        __T9643156 __T9734356;
        auto struct BBBBB b1;
        auto struct BBBBB b2;
        /* push this function's entry onto the EH stack */
        (__T9734356.next) = __curr_eh_stack_entry;
        __curr_eh_stack_entry = (&__T9734356);
        (__T9734356.kind) = ((unsigned char)1U);
        (((__T9734356.variant).function).regions) = ((struct __T9641460 *)__T9653776);
        (((__T9734356.variant).function).obj_table) = ((void **)__T9731464);
        (((
    #line 25
        __T9734356.variant).function).saved_region_number) = __eh_curr_region;
        /* 65535: nothing is constructed yet */
        __eh_curr_region = ((unsigned short)65535U);
    #line 21
        /* the constructor is now a real call; afterwards b1 is recorded and region 0 entered */
        __ct__5BBBBBFv((&b1));
        (((void **)__T9731464)[0U]) = ((void *)(&b1));
        __eh_curr_region = ((unsigned short)0U);
        ((b1.__b_AAAAA).a) = (bar__Fv());
        process__5BBBBBFv((&b1));
        /* the same for b2, region 1 */
        __ct__5BBBBBFv((&b2));
        (((void **)__T9731464)[1U]) = ((void *)(&b2));
        __eh_curr_region = ((unsigned short)1U);
        ((b2.__b_AAAAA).a) = (bar__Fv());
        process__5BBBBBFv((&b2));
        {
            __T9733536 = ((((b1.__b_AAAAA).a)) + (((b2.__b_AAAAA).a)));
            /* the region number is stepped back before each destructor call */
            __eh_curr_region = ((unsigned short)0U);
            __dt__5BBBBBFv((&b2), 2);
            __eh_curr_region = ((unsigned short)65535U);
            __dt__5BBBBBFv((&b1), 2);
            {
                /* restore the saved region and pop the EH stack entry */
                __eh_curr_region = ((((__T9734356.variant).function).saved_region_number));
                __curr_eh_stack_entry =
    #line 29
                ((__T9734356.next));
                return __T9733536;
            }
        }
    }

Major changes:

constructors are no longer inline (calls to __ct__5BBBBBFv appeared);
the situation with destructors is similar (__dt__5BBBBBFv);
the code now keeps track of which objects have already been constructed (or already destroyed) and which have not yet, because it needs to know which destructors to call if an exception occurs;
the code of the constructors / destructors has become more complicated (functions __dt__5BBBBBFv / __ct__5BBBBBFv, see the full listing).
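The bookkeeping described in the third point can be sketched as a small simulation. This is only my reading of the listing, not the EDG runtime: the names (Region, unwind, simulate_throw) are hypothetical, and the meaning of the two numeric fields in each table entry (object-table slot and "next region to unwind to") is an assumption based on the values that appear there.

```cpp
#include <cassert>
#include <string>
#include <vector>

// 65535 in the listing: "no region", i.e. nothing left to destroy.
constexpr unsigned short kNoRegion = 0xFFFF;

struct Region {
    void (*destroy)(void*);    // destructor to run for this region
    unsigned short obj_index;  // slot in the object table (assumed meaning)
    unsigned short next;       // region to continue with (assumed meaning)
};

static std::vector<std::string> g_log;

struct Obj { const char* name; };

static void destroy_obj(void* p) {
    g_log.push_back(std::string("~") + static_cast<Obj*>(p)->name);
}

// Walk the region chain from the current region, destroying every
// object that had been recorded as fully constructed.
static void unwind(const Region* regions, void** objs, unsigned short region) {
    while (region != kNoRegion) {
        const Region& r = regions[region];
        assert(objs[r.obj_index] != nullptr);
        r.destroy(objs[r.obj_index]);
        region = r.next;
    }
}

// Returns the destructor calls made if a "throw" happens after
// `constructed` objects (0, 1 or 2) have been constructed.
std::vector<std::string> simulate_throw(int constructed) {
    static const Region regions[2] = {
        {destroy_obj, 0, kNoRegion},
        {destroy_obj, 1, 0},
    };
    Obj b1{"b1"}, b2{"b2"};
    void* objs[2] = {nullptr, nullptr};
    unsigned short region = kNoRegion;
    g_log.clear();
    if (constructed >= 1) { objs[0] = &b1; region = 0; }
    if (constructed >= 2) { objs[1] = &b2; region = 1; }
    unwind(regions, objs, region);
    return g_log;
}
```

With this model, simulate_throw(2) destroys b2 and then b1, simulate_throw(1) destroys only b1, and simulate_throw(0) destroys nothing. The point is that the stores to the object table and to the region number happen on every call of the function, whether or not anything is ever thrown.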

The real trouble is in the first two points: the logic of the constructors / destructors has become so complicated that the front end moves them into separate functions. It is clear why: embedding them inline would greatly increase the code size of every function in which objects like BBBBB are used. But the consequence is that the C compiler's optimizer generates significantly less efficient code (there are now additional calls and checks in the code).

That is: merely enabling exception support increased the size of the final binary and slowed down every function in which objects are constructed (with the exception of the most trivial ones).

Actually, this is the main reason why exception support is turned off by default in embedded development: you have to pay for it even if you never actually use it.

PS: none of this means, of course, that “exceptions are bad!” or that you should “use return codes instead of exceptions!”. Every tool is simply good for its own task.
PPS: error handling in embedded development does, of course, exist and is actively used. It usually does not rely on C++ exceptions, but that is a topic for a separate article.

Source: https://habr.com/ru/post/104172/
