C++ exception handling under the hood, or how exceptions work in C++

From the translator

High-level languages have conquered the world, and in the Ruby/Python/JS world developers can only lecture that in C++ you should not use this or that feature. Exceptions, for example, because they are slow and generate a lot of extra code. But as soon as you ask "and what code do they generate, exactly?", all you get in response is mumbling. And really, how do they work? Well, we can compile with g++ and the -S flag and see what comes out. The surface is not hard to understand, but the gaps that remained kept me from sleeping. Fortunately, a ready-made article turned up.

On Habr there are several articles, some detailed and some less so (but still good), on how exceptions work in C++. However, none of them goes truly deep, so I decided to fill this gap, since good material is available. If you care how exceptions work in C++, using gcc as the example, stock up on Pocket or Evernote and some free time, and read on.


PS A few words about the translation:

The translation stays very close to the text, but in places I allowed myself to change whole paragraphs.
I never figured out how to translate some terms, for example landing pad and call site.
The work turned out to be much larger than it seemed. By the end I was even confusing the translation with the original, and some lines were written at 4 am. If some words or whole sentences are incoherent, I apologize and will try to fix everything soon.
Here the code is an integral part of the article, so I will not hide anything under a spoiler.
As always, send spelling, punctuation and minor errors by private message. Factual errors, inaccuracies and omissions go in the comments.

C++ exceptions under the hood

Everyone knows that exception handling is hard. There are reasons for this in every layer of an exception's "life cycle": it is hard to write exception-safe code, exceptions can be thrown from unexpected places, a poorly designed exception hierarchy can be hard to understand, they are slow because of all the voodoo magic under the hood, and they are dangerous, since improperly throwing an error can lead to an unforgiving call to std::terminate. Despite all this, the battle over whether or not to use exceptions in programs still continues. This is probably due to a shallow understanding of how they work.

First you need to ask yourself: how does this all work? This is the first article in a long series I am writing about how exceptions are implemented under the hood in C++ (for gcc on x86, though it should apply to other platforms as well). These articles will explain the process of throwing and catching an error in detail, but for the impatient, here is a brief summary of everything about throwing exceptions in gcc/x86:

When we write a throw statement, the compiler translates it into a pair of calls to libstdc++ functions, which allocate the exception and start the stack unwinding process by calling into libstdc.
For each catch block, the compiler writes some special information after the method body: a table of the exceptions the method can catch, as well as a cleanup table (more on the cleanup table later).
While the stack is being unwound, a special function supplied by libstdc++ (called the "personality routine") is called; for each function on the stack it checks which exceptions that function can catch.
If nobody can catch the exception, std::terminate is called.
If a matching handler is found, the unwinding starts again from the top of the stack.
During this second pass over the stack, the personality routine is asked to clean up resources for each method.
The routine checks the cleanup table of the current method. If there is something to clean up, it "jumps" into the current stack frame and runs the cleanup code, which calls the destructor of each object allocated in the current scope.
When the unwinding reaches the stack frame that can handle the exception, it "jumps" into the matching catch block.
After the catch block has finished, a cleanup function is called to free the memory occupied by the exception.
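The order of events in the summary above can be observed from ordinary C++ code. The following is a minimal sketch, not part of the original article: the names trace_log, Guard, thrower, middle and run_demo are made up for illustration. It records when the throw happens, when the destructors of intermediate frames run, and when the handler finally gets control.

```cpp
#include <string>
#include <utility>
#include <vector>

// All names here (trace_log, Guard, thrower, middle, run_demo) are
// hypothetical and exist only for this illustration.
static std::vector<std::string> trace_log;

// An object whose destructor leaves a trace, so cleanup becomes visible.
struct Guard {
    std::string name;
    explicit Guard(std::string n) : name(std::move(n)) {}
    ~Guard() { trace_log.push_back("cleanup " + name); }
};

struct Exception {};

void thrower() {
    Guard g("thrower");
    trace_log.push_back("throw");
    throw Exception();
}

// This frame has no catch block, only an object that needs cleanup.
void middle() {
    Guard g("middle");
    thrower();
    trace_log.push_back("never reached");
}

// Runs the scenario and returns the recorded sequence of events.
std::vector<std::string> run_demo() {
    trace_log.clear();
    try {
        middle();
    } catch (const Exception&) {
        trace_log.push_back("catch");
    }
    return trace_log;
}
```

Calling run_demo() yields the sequence "throw", "cleanup thrower", "cleanup middle", "catch": the destructors of the frames between the throw and the handler run before the catch block does.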

* Here it is one big article split into parts, so from now on "series of articles" is replaced with simply "article" to avoid clutter.

Even now it looks complicated, and we have not even started; this was just a short and inaccurate description of the complexity needed to handle exceptions.

To explore all the details under the hood, in the next part we will start implementing our own mini-version of libstdc++. Not all of it, only the error-handling parts. In reality not even all of that, only the bare minimum needed to implement a throw/catch block. We will also need a little assembly, but only a little. It will take a lot of patience, unfortunately.

If you are too curious, you can start here. This is the full specification of what we will implement in the following parts. I will try to make this article instructive and simpler, so that next time it is easier for you to start with your own ABI (application binary interface).

Notes (disclaimer):
I am in no way an expert on the voodoo magic that happens when an exception is thrown. In this article I will try to uncover the secret and find out how it works. Some details and subtleties will not be accurate. Please let me know if something is wrong.

Translator's note: this also applies to the translation.

C++ exceptions under the hood: a tiny ABI

If we try to understand why exceptions are so complicated and how they work, we can either drown in tons of manuals and documentation, or try to handle exceptions on our own. In fact, I was surprised by the lack of quality information on the topic (translator's note: so was I, by the way): everything that can be found is either too detailed or too simplistic. Of course, there are specifications (the best documented is the ABI for C++, but there are also CFI, DWARF and libstdc), but reading documentation is not enough if you really want to understand what is going on inside.

Let's start with the obvious: reinventing the wheel! We know that plain C has no exceptions, so let's try linking a C++ program with the plain C linker and see what happens! I started with something simple like this:

    #include "throw.h"

    extern "C" {
        void seppuku() {
            throw Exception();
        }
    }

Do not forget extern "C", otherwise g++ will helpfully mangle the name of our small function and make it impossible to link with our plain C program. Of course, we need a header file to link (no pun intended) the C++ and C worlds:

    struct Exception {};

    #ifdef __cplusplus
    extern "C" {
    #endif

        void seppuku();

    #ifdef __cplusplus
    }
    #endif

And a very simple main:

    #include "throw.h"

    int main()
    {
        seppuku();
        return 0;
    }

What happens if we try to compile and link this Frankencode?

    > g++ -c -o throw.o -O0 -ggdb throw.cpp
    > gcc -c -o main.o -O0 -ggdb main.c

Note: you can download all the source code for this project from my git repository.

So far, so good. Both g++ and gcc are happy in their own little worlds. Chaos begins as soon as we try to link them together:

    > gcc main.o throw.o -o app
    throw.o: In function `foo()':
    throw.cpp:4: undefined reference to `__cxa_allocate_exception'
    throw.cpp:4: undefined reference to `__cxa_throw'
    throw.o:(.rodata._ZTI9Exception[typeinfo for Exception]+0x0): undefined reference to `vtable for __cxxabiv1::__class_type_info'
    collect2: ld returned 1 exit status

And of course, gcc complains about missing C++ symbols. These are very C++-specific symbols. Look at the last error line: a vtable for cxxabiv1 is missing. cxxabi, defined in libstdc++, refers to the application binary interface for C++. We now know that error handling is done with the help of the standard C++ library, through an interface defined by the C++ ABI.

The C++ ABI defines a standard binary format with which we can link objects together into one program. If we compile .o files with two different compilers that use different ABIs, we cannot combine them into one application. The ABI may also define other standards, such as the interface for unwinding the stack or throwing an exception. In this case, the ABI defines an interface (not necessarily a binary format, just an interface) between C++ and the other libraries in our application that provide stack unwinding. In other words, the ABI defines the C++-specific pieces that let our application talk to non-C++ libraries: this is what allows exceptions thrown from other languages to be caught in C++, among many other things.

In any case, the linker errors are the starting point and the first layer in analyzing how exceptions work under the hood: the interface we need to implement is cxxabi. In the next chapter, we will start on our own mini-ABI, defined exactly like the C++ ABI.

C++ exceptions under the hood: an ABI to appease the linker

On our journey to understand exceptions, we discovered that all the heavy lifting is implemented in libstdc++, as specified by the C++ ABI. From the linker errors we deduced that, in order to handle errors, we have to ask the C++ ABI for help. We created a C++ program that throws an error, linked it with a plain C program, and found that the compiler somehow translates our throw statements into something that calls several libstdc++ functions, which actually throw the exception.

Still, we want to understand exactly how exceptions work, so let's try to implement our own mini-ABI that provides a mechanism for throwing errors. For this we only need a bit of RTFM, but the full interface can be found here for LLVM. Recall which specific functions are missing:

    > gcc main.o throw.o -o app
    throw.o: In function `foo()':
    throw.cpp:4: undefined reference to `__cxa_allocate_exception'
    throw.cpp:4: undefined reference to `__cxa_throw'
    throw.o:(.rodata._ZTI9Exception[typeinfo for Exception]+0x0): undefined reference to `vtable for __cxxabiv1::__class_type_info'
    collect2: ld returned 1 exit status

__cxa_allocate_exception

The name is self-explanatory, I suppose. __cxa_allocate_exception takes a size_t and allocates enough memory to hold the exception while it is being thrown. This is more complicated than it seems: while an error is being handled, some kind of magic happens to the stack, so allocating the exception (translator's note: forgive me for this word, but I will use it from time to time) on the stack is a bad idea. Allocating on the heap is, in general, also a bad idea: where would we allocate memory for an exception signaling that memory has run out? Static allocation is a bad idea as well, since we need it to be thread-safe (otherwise two threads throwing exceptions at the same time would be catastrophic). Given these constraints, most implementations seem to allocate memory in thread-local storage (heap) and fall back to an emergency storage (presumably static) if memory runs out. We, of course, will not worry about the scary details, so we can simply use a static buffer.
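The "heap first, emergency storage as a fallback" idea from the paragraph above can be sketched as follows. This is only an illustration under stated assumptions, not the libstdc++ implementation: the namespace sketch, the functions allocate_exception, free_exception and is_emergency, and the heap_available switch (used to simulate running out of memory) are all hypothetical, and the sketch deliberately ignores thread safety.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical sketch; none of these names come from a real runtime.
namespace sketch {

constexpr std::size_t kEmergencySize = 256;

// Static storage reserved for the case when the heap is exhausted.
alignas(std::max_align_t) static unsigned char emergency_buffer[kEmergencySize];
static bool emergency_in_use = false;  // not thread-safe, illustration only

// heap_available is a made-up switch that lets us simulate out-of-memory.
void* allocate_exception(std::size_t thrown_size, bool heap_available = true) {
    if (heap_available) {
        if (void* p = std::malloc(thrown_size)) return p;
    }
    // Heap failed: fall back to the single emergency slot if it is free.
    if (!emergency_in_use && thrown_size <= kEmergencySize) {
        emergency_in_use = true;
        return emergency_buffer;
    }
    return nullptr;  // nothing left; a real runtime has to give up here
}

void free_exception(void* p) {
    if (p == emergency_buffer) {
        emergency_in_use = false;  // release the emergency slot
    } else {
        std::free(p);
    }
}

bool is_emergency(const void* p) { return p == emergency_buffer; }

}  // namespace sketch
```

With heap_available set to false, the first allocation lands in the emergency buffer and a second one fails until the first is released, which is the situation a real runtime has to guard against with more than one slot and proper locking.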

__cxa_throw

This function does all the throwing magic! According to the ABI, once the exception has been created, __cxa_throw should be called. This function is responsible for starting the stack unwinding. An important effect: __cxa_throw is never expected to return. It either transfers control to the matching catch block to handle the exception or calls (by default) std::terminate, but it never returns.

vtable for __cxxabiv1::__class_type_info

Strange... __class_type_info is clearly some kind of RTTI (run-time type information), but which exactly? For now this is not easy to answer, and it is not terribly important for our mini-ABI; let's leave it for an appendix that will come after we finish analyzing the process of throwing an exception. For now, let's just say that this is the entry point the ABI defines to find out at runtime whether two types are the same or not. This is the function that is called to determine whether a given catch block can handle this error. For now we will focus on the basics: we need to give the linker an address for it (i.e. defining it is not enough, we also need to instantiate it), and it must have a vtable (yes, it must have a virtual method).
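The question "are these two types the same or not" can be asked from standard C++ through typeid, which is a reasonable mental model for what the runtime has to decide for each catch block. The helper handler_matches below is hypothetical and models only an exact match; real handler matching also has to consider base classes, pointers and cv-qualifiers.

```cpp
#include <typeinfo>

struct Exception {};
struct Fake_Exception {};

// Hypothetical helper: would a handler declared for handler_type accept an
// exception of thrown_type? Only the exact-match case is modeled here.
bool handler_matches(const std::type_info& thrown_type,
                     const std::type_info& handler_type) {
    return thrown_type == handler_type;
}
```

For example, handler_matches(typeid(Exception), typeid(Exception)) is true, while handler_matches(typeid(Exception), typeid(Fake_Exception)) is false.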

A lot happens in these functions, but let's try to implement the simplest exception thrower: one that calls exit when an exception is thrown. Our application is almost complete, but some ABI functions are missing, so let's create mycppabi.cpp. Reading our ABI specification, we can write down the signatures for __cxa_allocate_exception and __cxa_throw:

    #include <unistd.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unwind.h>

    namespace __cxxabiv1 {
        // The linker wants an instance that has a vtable, hence the virtual method.
        struct __class_type_info {
            virtual void foo() {}
        } ti;
    }

    #define EXCEPTION_BUFF_SIZE 255
    char exception_buff[EXCEPTION_BUFF_SIZE];

    extern "C" {

    void* __cxa_allocate_exception(size_t thrown_size)
    {
        // %zu is the correct conversion for size_t
        printf("alloc ex %zu\n", thrown_size);
        if (thrown_size > EXCEPTION_BUFF_SIZE) printf("Exception too big");
        return &exception_buff;
    }

    void __cxa_free_exception(void *thrown_exception);

    void __cxa_throw(
              void* thrown_exception,
              struct type_info *tinfo,
              void (*dest)(void*))
    {
        printf("throw\n");
        // __cxa_throw never returns
        exit(0);
    }

    } // extern "C"

Let me remind you: you can find the source in my github repository.

If we now compile mycppabi.cpp and link it with the other two .o files, we get a working binary that prints "alloc ex 1\nthrow" and then exits. Very simple, yet an amazing feat: we threw an exception without calling libstdc++. We wrote a (very, very small) part of the C++ ABI!

Another important piece of wisdom gained from creating our own mini-ABI: the throw keyword is compiled into two function calls into libstdc++. There is no voodoo magic here, it is a simple transformation. We can even disassemble our function to verify it. Run g++ -S throw.cpp:

    seppuku:
    .LFB3:
        [...]
        call    __cxa_allocate_exception
        movl    $0, 8(%esp)
        movl    $_ZTI9Exception, 4(%esp)
        movl    %eax, (%esp)
        call    __cxa_throw
        [...]

Even more magic: when throw is translated into these two calls, the compiler does not even know how the exception will be handled. Since libstdc++ defines __cxa_throw and its friends, and libstdc++ is dynamically linked at runtime, the exception handling method could be chosen when the application is first started.
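Since throw is just two library calls, the same two calls can be written by hand against the real runtime. The sketch below assumes a toolchain whose <cxxabi.h> declares abi::__cxa_allocate_exception and abi::__cxa_throw (as the GCC and Clang runtimes on Linux do). The helper names throw_by_hand and catch_code are hypothetical, and passing a null destructor is only valid here because Exception is trivially destructible.

```cpp
#include <cxxabi.h>
#include <new>
#include <typeinfo>

struct Exception { int code; };

// Hand-written stand-in for `throw Exception{code};`.
[[noreturn]] void throw_by_hand(int code) {
    // Step 1: ask the runtime for storage for the exception object.
    void* storage = abi::__cxa_allocate_exception(sizeof(Exception));
    // Construct the exception object inside that storage.
    new (storage) Exception{code};
    // Step 2: hand it to the runtime together with its type info.
    // The destructor argument is null because Exception needs no cleanup.
    abi::__cxa_throw(storage,
                     const_cast<std::type_info*>(&typeid(Exception)),
                     nullptr);
}

// Catches the hand-thrown exception with an ordinary catch block.
int catch_code(int code) {
    try {
        throw_by_hand(code);
    } catch (const Exception& e) {
        return e.code;
    }
    return -1;  // not reached: throw_by_hand never returns normally
}
```

An ordinary catch block cannot tell the difference: catch_code(7) returns 7, exactly as it would with a regular throw statement.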

We are already seeing progress, but we still have a long way to go. For now our ABI can only throw exceptions. Can we extend it to catch errors? Well, let's see how to do that in the next chapter!

C++ exceptions under the hood: catching what we throw

In this article we have slightly lifted the veil of secrecy around throwing exceptions by watching compiler and linker errors, but we are still far from understanding anything about catching them. Let's summarize what we have found out so far:

A throw statement is translated by the compiler into two calls: __cxa_allocate_exception and __cxa_throw.
__cxa_allocate_exception and __cxa_throw live in libstdc++.
__cxa_allocate_exception allocates memory for the new exception.
__cxa_throw does some preparation and forwards the exception to _Unwind, a set of functions that live in libstdc and perform the real stack unwinding (the ABI defines the interface of these functions).

So far it has been quite simple, but catching exceptions is a bit more complicated, especially because it requires a little reflection (which lets a program analyze its own code). Let's use our old method: add some catch blocks to our code, compile it and see what happens:

    #include "throw.h"
    #include <stdio.h>

    // A fake exception type that is never thrown
    struct Fake_Exception {};

    void raise() {
        throw Exception();
    }

    // What happens when a try block has no matching catch
    void try_but_dont_catch() {
        try {
            raise();
        } catch(Fake_Exception&) {
            printf("Running try_but_dont_catch::catch(Fake_Exception)\n");
        }

        printf("try_but_dont_catch handled an exception and resumed execution");
    }

    // And what happens when it has one
    void catchit() {
        try {
            try_but_dont_catch();
        } catch(Exception&) {
            printf("Running try_but_dont_catch::catch(Exception)\n");
        } catch(Fake_Exception&) {
            printf("Running try_but_dont_catch::catch(Fake_Exception)\n");
        }

        printf("catchit handled an exception and resumed execution");
    }

    extern "C" {
        void seppuku() {
            catchit();
        }
    }

As before, we have a seppuku function that connects the C and C++ worlds, only this time we added several function calls to make our stack more interesting. We also added a few try/catch blocks, so now we can analyze how libstdc++ handles them.
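Before looking at what our mini-ABI is missing, it helps to know what the same call chain does with the full C++ runtime. The sketch below is a variation of the code above, not the author's file: printf is replaced with a log vector so the result can be inspected, raise is renamed raise_it, and the names events and run_catchit are hypothetical.

```cpp
#include <string>
#include <vector>

struct Exception {};
struct Fake_Exception {};

// Hypothetical event log that replaces the printf calls of the original.
static std::vector<std::string> events;

void raise_it() { throw Exception(); }

// The only handler here is for a type that is never thrown.
void try_but_dont_catch() {
    try {
        raise_it();
    } catch (Fake_Exception&) {
        events.push_back("try_but_dont_catch: caught Fake_Exception");
    }
    events.push_back("try_but_dont_catch: resumed");
}

// This frame does have a handler for Exception.
void catchit() {
    try {
        try_but_dont_catch();
    } catch (Exception&) {
        events.push_back("catchit: caught Exception");
    } catch (Fake_Exception&) {
        events.push_back("catchit: caught Fake_Exception");
    }
    events.push_back("catchit: resumed");
}

// Runs the chain and returns the recorded events.
std::vector<std::string> run_catchit() {
    events.clear();
    catchit();
    return events;
}
```

With the standard runtime, run_catchit() records only "catchit: caught Exception" followed by "catchit: resumed": the exception passes straight through try_but_dont_catch, whose handler does not match and whose remaining code is skipped.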

And again we get linker errors about missing ABI functions:

    > g++ -c -o throw.o -O0 -ggdb throw.cpp
    > gcc main.o throw.o mycppabi.o -O0 -ggdb -o app
    throw.o: In function `try_but_dont_catch()':
    throw.cpp:12: undefined reference to `__cxa_begin_catch'
    throw.cpp:12: undefined reference to `__cxa_end_catch'
    throw.o: In function `catchit()':
    throw.cpp:20: undefined reference to `__cxa_begin_catch'
    throw.cpp:20: undefined reference to `__cxa_end_catch'
    throw.o:(.eh_frame+0x47): undefined reference to `__gxx_personality_v0'
    collect2: ld returned 1 exit status

We again see a bunch of interesting things. We expected the calls to __cxa_begin_catch and __cxa_end_catch: we do not know yet what they are, but we can assume they are the catch-side counterpart of __cxa_allocate_exception / __cxa_throw. __gxx_personality_v0 is something new, and it will be the main topic of the following parts.

What does the personality function do? (Translator's note: I did not come up with a better name; tell me in the comments if you have ideas.) We already said something about it in the introduction, but next time we will look at it in much more detail, together with our two new friends: __cxa_begin_catch and __cxa_end_catch.

C++ exceptions under the hood: the magic around __cxa_begin_catch and __cxa_end_catch

After studying how exceptions are thrown, we are now on the way to studying how they are caught. In the previous chapter we added a try-catch block to our sample application to see what the compiler does, and got linker errors, just like the last time, when we looked at what happens if we add a throw statement. Here is what the linker prints:

    > g++ -c -o throw.o -O0 -ggdb throw.cpp
    > gcc main.o throw.o mycppabi.o -O0 -ggdb -o app
    throw.o: In function `try_but_dont_catch()':
    throw.cpp:12: undefined reference to `__cxa_begin_catch'
    throw.cpp:12: undefined reference to `__cxa_end_catch'
    throw.o: In function `catchit()':
    throw.cpp:20: undefined reference to `__cxa_begin_catch'
    throw.cpp:20: undefined reference to `__cxa_end_catch'
    throw.o:(.eh_frame+0x47): undefined reference to `__gxx_personality_v0'
    collect2: ld returned 1 exit status

Let me remind you that you can get the code from my git repository.

In theory (our theory, of course), a catch block is translated into a __cxa_begin_catch / __cxa_end_catch pair from libstdc++, but also into something new called the personality function, about which we still know nothing.

Let's test our theory about __cxa_begin_catch and __cxa_end_catch. Compile throw.cpp with the -S flag and analyze the assembly. There is a lot of interesting stuff in there; we will trim it down to the essentials:

    _Z5raisev:
        call    __cxa_allocate_exception
        call    __cxa_throw

Everything is going great: we got the same definition for raise() as before, it just throws an exception:

    _Z18try_but_dont_catchv:
        .cfi_startproc
        .cfi_personality 0,__gxx_personality_v0
        .cfi_lsda 0,.LLSDA1

This is the definition of try_but_dont_catch(), with its name mangled by the compiler. There is something new here: a reference to __gxx_personality_v0 and to something else called LSDA. These look like minor details, but they are actually very important:

The linker uses these directives for the CFI specification. CFI stands for call frame information, and here is its full specification. It is used mainly for unwinding the stack.
LSDA (language specific data area) is a language-specific area that the personality function uses to learn which exceptions this function can handle.
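The mangled labels in these listings, such as _Z5raisev and _Z18try_but_dont_catchv, can be turned back into readable names with abi::__cxa_demangle from <cxxabi.h>, which the GCC and Clang runtimes provide. The wrapper demangle below is a hypothetical convenience helper, not part of the article's code.

```cpp
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Hypothetical helper: returns the demangled form of a symbol name,
// or the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    // With null buffer arguments, __cxa_demangle allocates the result
    // with malloc, so it has to be released with free.
    char* text = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    if (status != 0 || text == nullptr) return mangled;
    std::string result(text);
    std::free(text);
    return result;
}
```

For instance, demangle("_Z18try_but_dont_catchv") gives "try_but_dont_catch()", and demangle("_ZTI9Exception") gives "typeinfo for Exception", the same wording the linker used in its error messages.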

We will talk about CFI and LSDA in the next chapter; do not forget about them, but for now let's move on.

        [...]
        call    _Z5raisev
        jmp     .L8

Another easy one: just call raise and then jump to L8; L8 does a normal return from the function. If raise does not return normally, execution (somehow, we do not yet know how!) should not continue at the next instruction but go to the exception handlers (called landing pads in ABI terms, more on that later).

        cmpl    $1, %edx
        je      .L5

    .LEHB1:
        call    _Unwind_Resume
    .LEHE1:

    .L5:
        call    __cxa_begin_catch
        call    __cxa_end_catch

At first glance this piece is a bit complicated, but in reality it is simple. Most of the magic happens here: first we check whether we can handle this exception. If not, we call _Unwind_Resume; if we can, we call __cxa_begin_catch and __cxa_end_catch, after which the function should continue normally, so L8 will be executed (L8 is right below our catch block):
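The decision made by this listing can be written down as a tiny C++ model. It is only a reading aid under stated assumptions: the function landing_pad is hypothetical, its selector parameter stands for the value the listing compares against 1 in %edx, and the assumption that 1 denotes the catch(Fake_Exception&) handler of this function is ours, not something the listing states.

```cpp
#include <string>

// Hypothetical model of the landing pad of try_but_dont_catch.
// selector stands for the value tested by `cmpl $1, %edx`; we assume
// that 1 identifies this function's catch(Fake_Exception&) handler.
std::string landing_pad(int selector) {
    if (selector == 1) {
        // je .L5: the handler matches, so run it between begin/end catch
        // and then continue with the normal function epilogue at .L8.
        return "begin_catch; handler; end_catch; continue at .L8";
    }
    // No match: this frame cannot handle the exception, keep unwinding.
    return "_Unwind_Resume";
}
```

Read this way, the landing pad is just a small switch: either the handler body runs and the function returns normally, or control goes back to the unwinder.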

    .L8:
        leave
        .cfi_restore 5
        .cfi_def_cfa 4, 4
        ret
        .cfi_endproc

Just a normal function return ... with some CFI garbage in it.

That is all for error handling; however, we still do not know how __cxa_begin_catch / __cxa_end_catch work. We do have an idea that this pair forms what is called a landing pad: the place in a function where the exception handlers are located. What we do not know yet is how landing pads are found. The unwinder must somehow go through all the calls on the stack, check whether any call (stack frame, to be precise) has a valid try block with a landing pad that can handle this exception, and continue execution there.

That is no small feat, and we will find out how it works in the next chapter.

C++ exceptions under the hood: gcc_except_table and the personality function

Earlier we found out that throw is translated into __cxa_allocate_exception / __cxa_throw, and that a catch block is translated into __cxa_begin_catch / __cxa_end_catch, as well as into something called CFI (call frame information) used to find landing pads, i.e. the error handlers.

What we do not know so far is how _Unwind finds out where these landing pads are. When an exception is thrown through a bunch of functions on the stack, the CFI lets the unwinder find out which function is currently executing, but it also has to find out which of a function's landing pads can handle this exception (and, by the way, we are ignoring functions with multiple try/catch blocks!).

To find out where the landing pads are, something called gcc_except_table is used. This table can be found (mixed with CFI garbage) after the end of the function:

    .LFE1:
        .globl  __gxx_personality_v0
        .section    .gcc_except_table,"a",@progbits
        [...]
    .LLSDACSE1:
        .long   _ZTI14Fake_Exception

This .gcc_except_table section is where all the information for locating landing pads is stored; we will talk about it later, when we analyze the personality function. For now, let's just say that LSDA stands for language specific data area, and it is the place the personality function checks for a function's landing pads (it is also used to run destructors while unwinding the stack).

To summarize: for each function with at least one catch block, the compiler translates that block into a pair of calls to __cxa_begin_catch / __cxa_end_catch. Then the personality function, called by __cxa_throw, reads the gcc_except_table of each method on the stack to find something called the LSDA. The personality function then checks whether the LSDA contains a block that handles this exception, and whether there is any cleanup code (which runs destructors when needed).
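The search described above can be modeled with plain data structures. The sketch below is a toy simulation, not the real unwinder: FrameInfo and unwind are hypothetical names, a list of type names stands in for the LSDA of each frame, and a boolean stands in for its cleanup table. It first looks for a frame that can handle the exception and only then runs cleanups on the way to it.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>
#include <vector>

// Toy model; every name here is hypothetical.
struct FrameInfo {
    std::string function;                // name of the function owning the frame
    std::vector<std::string> catchable;  // stands in for the LSDA's handled types
    bool has_cleanup;                    // stands in for the cleanup table
};

// stack[0] is the innermost frame, i.e. the one that threw.
std::vector<std::string> unwind(const std::vector<FrameInfo>& stack,
                                const std::string& thrown) {
    std::vector<std::string> log;

    // Phase 1: search for a frame whose table lists the thrown type.
    std::size_t handler = stack.size();
    for (std::size_t i = 0; i < stack.size(); ++i) {
        const auto& types = stack[i].catchable;
        if (std::find(types.begin(), types.end(), thrown) != types.end()) {
            handler = i;
            break;
        }
    }
    if (handler == stack.size()) {
        log.push_back("std::terminate");  // nobody can catch it
        return log;
    }

    // Phase 2: run cleanups in every frame below the handler, then land.
    for (std::size_t i = 0; i < handler; ++i) {
        if (stack[i].has_cleanup) log.push_back("cleanup in " + stack[i].function);
    }
    log.push_back("landing pad in " + stack[handler].function);
    return log;
}
```

Given a stack of raise, try_but_dont_catch (listing only Fake_Exception), catchit (listing Exception and Fake_Exception) and seppuku, throwing "Exception" ends in the landing pad of catchit, while throwing a type nobody lists ends in std::terminate.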

We can also draw an interesting conclusion: if we use the nothrow specifier (or an empty throw specifier), the compiler can omit the gcc_except_table for that method. Given the way gcc implements exceptions, this does not affect performance much, but it does affect code size. What about catch blocks? If an exception is thrown from a method declared with the nothrow specifier, the LSDA is not generated and the personality function does not know what to do. When the personality function does not know what to do, it calls the default error handler, which in most cases means that throwing from a nothrow method ends in std::terminate.

Now that we have an idea of what the personality function does, can we implement it? Well, let's see!


Source: https://habr.com/ru/post/279111/
