Continuing the translation of a series of articles about exception handling in C++.
C++ exceptions under the hood: finding the right landing pad
This is chapter 15 of our long story. We have already learned quite a lot about how exceptions work. We even have a working personality function that uses a bit of reflection to determine where the catch block is located (the landing pad, in exception terminology). In the last chapter we wrote a personality function that can handle exceptions, but it always installs only the first landing pad (that is, the first catch block). Let's improve our personality function by adding the ability to choose the correct landing pad in functions with several catch blocks.
In TDD (test-driven development) fashion, let's first write a test for our ABI. We'll extend our test program, throw.cpp, with several try/catch blocks:
#include <stdio.h>
#include "throw.h"

struct Fake_Exception {};

void raise() {
    throw Exception();
}

void try_but_dont_catch() {
    try {
        printf("Running a try which will never throw.\n");
    } catch(Fake_Exception&) {
        printf("Exception caught... with the wrong catch!\n");
    }

    try {
        raise();
    } catch(Fake_Exception&) {
        printf("Caught a Fake_Exception!\n");
    }

    printf("try_but_dont_catch handled the exception\n");
}

void catchit() {
    try {
        try_but_dont_catch();
    } catch(Fake_Exception&) {
        printf("Caught a Fake_Exception!\n");
    } catch(Exception&) {
        printf("Caught an Exception!\n");
    }

    printf("catchit handled the exception\n");
}

extern "C" {
    void seppuku() {
        catchit();
    }
}
Before running it, try to predict what this test will do. Focus on try_but_dont_catch: the first try/catch block never throws, and the second one throws an exception it does not catch. Since our ABI is still a bit dumb, the first catch block will handle the exception thrown in the second block. But what happens after that first catch runs? Execution continues right after the end of the first try/catch, that is, right before the second try/catch block. That block throws again, the first handler catches it again, and so on. An infinite loop! We have, once again, built a very convoluted while (true).
We'll use what we know about the start/length fields of the LSDA call site table to select the correct landing pad. To do this, we need to know the IP (instruction pointer) at the point where the exception was thrown. We can get it with an _Unwind function we already know, _Unwind_GetIP. To understand what _Unwind_GetIP returns, consider an example:
void f1() {}
void f2() { throw 1; }
void f3() {}

void foo() {
L1:
    try { f1(); } catch(...) {}
L2:
    try { f2(); } catch(...) {}
L3:
    try { f3(); } catch(...) {}
}
In this case, our personality function will be called while looking for the catch block for f2, and the stack will look like this:
+------------------------------+
|  IP: f2   stack frame: f2    |
+------------------------------+
|  IP: L3   stack frame: foo   |
+------------------------------+
Note that the IP will point to L3, even though the exception was thrown in L2. That's because the IP points to the next instruction that would have been executed (the return address of the call). This also means we have to subtract one if we want the IP where the exception was thrown. Otherwise, the value returned by _Unwind_GetIP may fall outside the range of the try block. Let's return to our personality function:
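To make the off-by-one concrete, here is the range check from the personality function pulled out into a standalone helper and exercised with made-up addresses. `call_site_contains` is a hypothetical name, not part of any ABI, and the call site range is treated as half-open, [start, start + len):

```cpp
#include <cassert>
#include <cstdint>

// Returns true if the call that raised the exception lies inside the call
// site [func_start + cs_start, func_start + cs_start + cs_len).
// `ip` is what _Unwind_GetIP would return: the address of the instruction
// AFTER the call, so we step back one byte before testing the range.
bool call_site_contains(uintptr_t ip, uintptr_t func_start,
                        uintptr_t cs_start, uintptr_t cs_len) {
    uintptr_t throw_ip = ip - 1;
    uintptr_t try_start = func_start + cs_start;
    uintptr_t try_end = try_start + cs_len;
    return throw_ip >= try_start && throw_ip < try_end;
}
```

For a call site covering 0x1010 up to 0x1018, a call whose return address is exactly 0x1018 is still matched, because we test 0x1017. Testing the raw IP would wrongly reject it.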
_Unwind_Reason_Code __gxx_personality_v0 (
                             int version,
                             _Unwind_Action actions,
                             uint64_t exceptionClass,
                             _Unwind_Exception* unwind_exception,
                             _Unwind_Context* context)
{
    if (actions & _UA_SEARCH_PHASE)
    {
        printf("Personality function, lookup phase\n");
        return _URC_HANDLER_FOUND;
    } else if (actions & _UA_CLEANUP_PHASE) {
        printf("Personality function, cleanup\n");

        // Find the instruction where the exception was thrown: the IP points
        // to the next instruction, so step back by one
        uintptr_t throw_ip = _Unwind_GetIP(context) - 1;

        // Pointer to the beginning of the raw LSDA
        LSDA_ptr lsda = (uint8_t*)_Unwind_GetLanguageSpecificData(context);

        // Read the LSDA header
        LSDA_Header header(&lsda);

        // Read the LSDA call site (CS) header
        LSDA_CS_Header cs_header(&lsda);

        // Calculate where the CS table ends
        const LSDA_ptr lsda_cs_table_end = lsda + cs_header.length;

        // Loop through each entry in the CS table
        while (lsda < lsda_cs_table_end)
        {
            LSDA_CS cs(&lsda);

            // No landing pad: this entry can't handle the exception, move on
            if (not cs.lp) continue;

            uintptr_t func_start = _Unwind_GetRegionStart(context);

            // Calculate the IP range covered by this call site; if this
            // landing pad can handle the current exception, the throw IP
            // must fall inside [try_start, try_end)
            uintptr_t try_start = func_start + cs.start;
            uintptr_t try_end = func_start + cs.start + cs.len;

            // Check whether this is the right landing pad for the try block
            if (throw_ip < try_start) continue;
            if (throw_ip >= try_end) continue;

            // We found a landing pad for this exception; resume execution there
            int r0 = __builtin_eh_return_data_regno(0);
            int r1 = __builtin_eh_return_data_regno(1);

            _Unwind_SetGR(context, r0, (uintptr_t)(unwind_exception));
            // Note: the type selector is hardcoded for now; we'll fix that later
            _Unwind_SetGR(context, r1, (uintptr_t)(1));

            _Unwind_SetIP(context, func_start + cs.lp);
            break;
        }

        return _URC_INSTALL_CONTEXT;
    } else {
        printf("Personality function, error\n");
        return _URC_FATAL_PHASE1_ERROR;
    }
}
} // extern "C" (opened earlier in the file)
As usual, the current version of the sample code is available via the link.
Run it again and voilà, no more infinite loop! A few simple changes let us choose the right landing pad. Next, we'll teach our personality function to select the correct stack frame instead of always taking the first one.
C++ exceptions under the hood: finding the right catch block in the landing pad
We have already written a personality function that can handle functions with more than one landing pad. Now we'll try to determine which block can handle a given exception. In other words, we need to find out which catch block we should jump into.
Of course, finding out which block can handle an exception is not easy. But did you really expect anything else? The main problems right now are:
- First and foremost: where and how can we find the exception types a catch block accepts?
- Even if we can find the catch type, how do we handle catch(...)?
- For a landing pad with several catch blocks, how can we know all the possible catch types?
- And consider this example:
struct Base {};
struct Child : public Base {};

void foo() { throw Child(); }

void bar()
{
    try { foo(); }
    catch(const Base&) { /* ... */ }
}
We must check not only whether the current landing pad accepts the thrown type itself, but also whether it accepts any of that type's base classes!
Let's simplify the task: we'll work only with landing pads that have a single catch block, and assume there is no inheritance. Even so, how do we find the types a landing pad accepts?
This information lives in a part of .gcc_except_table we haven't analyzed yet: the action table. Let's disassemble throw.cpp and look at what follows the call site table for our try_but_dont_catch function:
.LLSDACSE1:
    .byte   0x1
    .byte   0
    .align 4
    .long   _ZTI14Fake_Exception
.LLSDATT1:
It doesn't look like much, but there is a promising pointer to something whose name contains the name of our exception. Let's look at the definition of _ZTI14Fake_Exception:
_ZTI14Fake_Exception:
    .long   _ZTVN10__cxxabiv117__class_type_infoE+8
    .long   _ZTS14Fake_Exception
    .weak   _ZTS9Exception
    .section .rodata._ZTS9Exception,"aG",@progbits,_ZTS9Exception,comdat
    .type   _ZTS9Exception, @object
    .size   _ZTS9Exception, 11
We found something very interesting! Can you recognize it? This is the std::type_info for the Fake_Exception struct!
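You can inspect the same object from ordinary C++ code. This sketch assumes GCC or Clang on an Itanium-ABI platform such as Linux, where typeid(T).name() returns the string that the _ZTS symbol holds. Other ABIs, such as MSVC's, use different names. abi::__cxa_demangle is the demangler exported by the Itanium C++ ABI runtime through <cxxabi.h>. The helper names raw_type_name and demangled are illustrative.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <cxxabi.h>
#include <string>
#include <typeinfo>

struct Fake_Exception {};

// Raw name stored in the type_info object; with GCC/Clang on the Itanium
// ABI this is the length-prefixed mangled name "14Fake_Exception".
const char* raw_type_name() {
    return typeid(Fake_Exception).name();
}

// Demangles a type name with abi::__cxa_demangle; falls back to the input
// if demangling fails. The returned buffer is malloc'ed, so we free it.
std::string demangled(const char* mangled) {
    int status = 0;
    char* buf = abi::__cxa_demangle(mangled, nullptr, nullptr, &status);
    std::string out = (status == 0 && buf) ? buf : mangled;
    std::free(buf);
    return out;
}
```

So raw_type_name() yields "14Fake_Exception", matching the _ZTS14Fake_Exception symbol, and demangling it gives back Fake_Exception.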
Now we know there is a way to get a pointer to a kind of reflection information about our exception. Can we find it programmatically? Let's dig further.
C++ exceptions under the hood: reflecting on exception types and reading .gcc_except_table
We now know that we can get a lot of information about an exception by reading the LSDA (language-specific data area) in .gcc_except_table. The personality function has to read this data to determine the correct landing pad.
Last time we set our ABI implementation aside and dove into the assembly for .gcc_except_table to see how to find the exception types we can handle. We found that part of the table contains a list of types with the information we need. We'll read this information during the cleanup phase, but first let's recall the definition of our LSDA header:
struct LSDA_Header {
    uint8_t start_encoding;
    uint8_t type_encoding;
    uint8_t type_table_offset;
};
The last field is new to us: it holds the offset to the type table. Let's also recall the definition of each call site entry:
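struct LSDA_CS {
    // Offset into the function from which we could handle a throw
    uint8_t start;
    // Length of the block that might throw
    uint8_t len;
    // Landing pad
    uint8_t lp;
    // Offset into the action table + 1 (0 means "no action")
    uint8_t action;
};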
Look at the last field, action. It is an offset into the action table, so we can find the action for a specific call site (CS). The trick is that for landing pads containing catch blocks, the action holds an index into the type table. We can locate the type table itself using the offset from the LSDA header. Enough talk, let's look at the code:
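// Pointer to the beginning of the raw LSDA
LSDA_ptr lsda = (uint8_t*)_Unwind_GetLanguageSpecificData(context);

// Read the LSDA header
LSDA_Header header(&lsda);

// The type table offset is relative to the end of the header we just read.
// Its entries are pointers (4-byte absolute pointers in the 32-bit x86
// disassembly above), so treat it as an array of const void*
const void** types_table_start = (const void**)(lsda + header.type_table_offset);

// Read the LSDA CS header
LSDA_CS_Header cs_header(&lsda);

// Calculate where the CS table ends
const LSDA_ptr lsda_cs_table_end = lsda + cs_header.length;

// The action table starts right after the CS table
const LSDA_ptr action_tbl_start = lsda_cs_table_end;

// Read the first call site
LSDA_CS cs(&lsda);

// cs.action is offset + 1; cs.action == 0 would mean "no action"
// (here we assume there is one)
const size_t action_offset = cs.action - 1;
const LSDA_ptr action = action_tbl_start + action_offset;

// For a landing pad with a catch, the first byte of the action record
// is the type index
int type_index = action[0];

// types_table_start actually points at the END of the table, so we index it
// with -1 * type_index. The entry points to the std::type_info of the type
// this catch block accepts
const void* catch_type_info = types_table_start[ -1 * type_index ];
const std::type_info *catch_ti = (const std::type_info *) catch_type_info;

// If everything went right, this prints the (mangled) name of Fake_Exception
printf("%s\n", catch_ti->name());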
This code looks complicated because of the several levels of indirection before we reach the type_info structure, but it doesn't really do anything difficult. It just reads the .gcc_except_table layout we found while disassembling.
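To see the indirection without a live LSDA, here is the same lookup run against hand-built tables. This is a hypothetical sketch: first_catch_type is our name, and the type table is modeled as an array of const std::type_info* whose end pointer we pass in, mirroring the negative indexing. The action table bytes are copied from the try_but_dont_catch listing and use the article's one-byte-field simplification.

```cpp
#include <cassert>
#include <cstdint>
#include <typeinfo>

struct Fake_Exception {};  // stand-in for the type from throw.cpp

// Returns the std::type_info accepted by the first catch clause of a call
// site, or nullptr if the call site has no action (cs_action == 0) or the
// first action is not a catch clause.
const std::type_info* first_catch_type(const uint8_t* action_tbl_start,
                                       uint8_t cs_action,
                                       const std::type_info* const* types_table_end) {
    if (cs_action == 0) return nullptr;              // stored as offset + 1
    const uint8_t* action = action_tbl_start + (cs_action - 1);
    int type_index = action[0];                      // 1-based type index
    if (type_index <= 0) return nullptr;             // cleanup/filter, not a catch
    return types_table_end[-type_index];             // table is indexed backwards
}
```

With the action bytes 0x1, 0 and a one-entry table holding &typeid(Fake_Exception), a call site whose action field is 1 resolves to Fake_Exception's type_info.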
Finding the exception type is a big step in the right direction. However, our personality function is getting a bit cluttered. Most of the complexity of reading the LSDA can be swept under the rug into a separate function, and doing so should cost very little.
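For reference, here is one way the constructors that parse these structs might look, with each one reading its fields and advancing the LSDA pointer. This is a hypothetical sketch: it follows the article's one-byte-per-field simplification, which breaks once any uleb128 value exceeds 127, and the usage below reads from a hand-made byte array rather than a real LSDA.

```cpp
#include <cassert>
#include <cstdint>

typedef const uint8_t* LSDA_ptr;

// Simplified LSDA header: each field is assumed to fit in one byte.
struct LSDA_Header {
    explicit LSDA_Header(LSDA_ptr* lsda) {
        const uint8_t* p = *lsda;
        start_encoding = p[0];
        type_encoding = p[1];
        type_table_offset = p[2];
        *lsda = p + 3;  // advance past the bytes we consumed
    }
    uint8_t start_encoding;
    uint8_t type_encoding;
    uint8_t type_table_offset;
};

// Header of the call site table: encoding byte plus table length in bytes.
struct LSDA_CS_Header {
    explicit LSDA_CS_Header(LSDA_ptr* lsda) {
        const uint8_t* p = *lsda;
        encoding = p[0];
        length = p[1];
        *lsda = p + 2;
    }
    uint8_t encoding;
    uint8_t length;
};

// One call site entry, again one byte per field.
struct LSDA_CS {
    explicit LSDA_CS(LSDA_ptr* lsda) {
        const uint8_t* p = *lsda;
        start = p[0];
        len = p[1];
        lp = p[2];
        action = p[3];
        *lsda = p + 4;
    }
    uint8_t start;
    uint8_t len;
    uint8_t lp;
    uint8_t action;
};
```

Reading the header first and then the call site header leaves the pointer at the first call site entry, which is the order the personality function relies on.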
Next, we'll learn how to compare the type a catch block handles with the type that was thrown.
C++ exceptions under the hood: getting the correct stack frame
The latest version of our personality function knows where to find the information on whether an exception can be handled. It only works for a single catch block per try/catch, though, and without inheritance. To make it useful, we first need to check whether the thrown exception's type matches the one we can handle.
Of course, first we need to know the type of the exception. To get it, we have to record it when __cxa_throw is called:
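void __cxa_throw(void* thrown_exception,
                 std::type_info *tinfo,
                 void (*dest)(void*))
{
    __cxa_exception *header = ((__cxa_exception *) thrown_exception - 1);

    // Save the type info in the exception header that _Unwind_ will pass
    // back to us; otherwise we won't know the type while unwinding
    header->exceptionType = tinfo;

    _Unwind_RaiseException(&header->unwindHeader);
}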
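Now we can read the exception's type in our personality function and check whether the types match. Note that name() returns a const char*, so == compares pointers, not string contents. In our setup that is enough, because the compiler emits each type's name string only once (the _ZTS symbols we saw earlier), but strcmp would be the more robust choice: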
// Get the type of the exception the landing pad can handle
const void* catch_type_info = lsda.types_table_start[ -1 * type_index ];
const std::type_info *catch_ti = (const std::type_info *) catch_type_info;

// Get the type of the original exception being thrown
__cxa_exception* exception_header = (__cxa_exception*)(unwind_exception+1) - 1;
std::type_info *org_ex_type = exception_header->exceptionType;

printf("%s thrown, catch handles %s\n",
            org_ex_type->name(),
            catch_ti->name());

// Check whether the thrown exception has the same type
// as the one the landing pad can handle
if (org_ex_type->name() != catch_ti->name())
    continue;
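Both tricks in this snippet can be tried in isolation. The sketch below uses hypothetical stand-ins (Mini_Cxa_Exception, Unwind_Exception_Stub) instead of the real __cxa_exception and _Unwind_Exception. It keeps only the property that matters: the unwind header is the last member of the exception header, so stepping past it and then back one whole header recovers the start. types_match shows the name comparison, with strcmp as a fallback in case the name strings were not merged.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <typeinfo>

// Hypothetical stand-in for _Unwind_Exception; the real one is in <unwind.h>.
struct Unwind_Exception_Stub {
    unsigned long exception_class;
    void* private_data[2];
};

// Hypothetical, trimmed-down stand-in for __cxa_exception. As in the real
// header, the unwind header is the LAST member.
struct Mini_Cxa_Exception {
    const std::type_info* exceptionType;
    Unwind_Exception_Stub unwindHeader;
};

static_assert(offsetof(Mini_Cxa_Exception, unwindHeader) +
                  sizeof(Unwind_Exception_Stub) == sizeof(Mini_Cxa_Exception),
              "unwindHeader must be the last member, with no trailing padding");

// Same trick as in the personality function: step past the unwind
// header, then back by one whole exception header.
Mini_Cxa_Exception* header_from_unwind(Unwind_Exception_Stub* ue) {
    return reinterpret_cast<Mini_Cxa_Exception*>(ue + 1) - 1;
}

// name() returns const char*, so == compares pointers; strcmp compares the
// actual strings when the pointers differ.
bool types_match(const std::type_info& thrown, const std::type_info& caught) {
    return thrown.name() == caught.name() ||
           std::strcmp(thrown.name(), caught.name()) == 0;
}
```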
See the latest changes in the git repository.
Of course, we have a problem. Can you spot it? Exceptions are handled in two phases. If we say in the first phase that we will handle an exception, we can't change our minds in the second phase and say we won't. I don't know whether _Unwind handles this case; there is no documentation about it, and it is most likely unspecified behavior. So just claiming that we handle everything is not enough.
Although we taught the personality function to figure out which landing pad can handle an exception, we have been lying to _Unwind about which exceptions we can handle: our ABI claims it handles all of them. The truth is that we don't know whether we can handle a given exception. That's easy to fix; we can do something like this:
_Unwind_Reason_Code __gxx_personality_v0 (...)
{
    printf("Personality function, searching for handler\n");

    // ...

    foreach (call site entry in lsda)
    {
        if (call site entry.not_good()) continue;

        // We found a landing pad for this exception

        // In the search phase, tell _Unwind_ that we can handle it
        if (actions & _UA_SEARCH_PHASE) return _URC_HANDLER_FOUND;

        // Otherwise we must be in _UA_CLEANUP_PHASE
        /* set up registers and the landing pad IP */
        return _URC_INSTALL_CONTEXT;
    }

    return _URC_CONTINUE_UNWIND;
}
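The key point of this pseudocode is that each frame's answer has to be consistent across both phases. Here is that decision isolated in a small hypothetical helper (frame_decision is our name) using the real constants from <unwind.h>. It deliberately ignores cleanup-only frames, which come up later in the series.

```cpp
#include <cassert>
#include <unwind.h>

// What the personality function reports for ONE frame, given whether this
// frame has a catch clause matching the thrown type. A frame that claims the
// exception in the search phase must install its landing pad in the cleanup
// phase; a frame that can't handle it tells _Unwind_ to keep going.
_Unwind_Reason_Code frame_decision(_Unwind_Action actions, bool can_handle) {
    if (!can_handle)
        return _URC_CONTINUE_UNWIND;  // not ours: keep looking further up
    if (actions & _UA_SEARCH_PHASE)
        return _URC_HANDLER_FOUND;    // phase 1: only report that we'd catch it
    return _URC_INSTALL_CONTEXT;      // phase 2: jump to the landing pad
}
```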
What happens if we run this personality function? A crash! Who would have doubted it. Remember our crashing function? This is the code that should catch our exception:
void catchit() {
    try {
        try_but_dont_catch();
    } catch(Fake_Exception&) {
        printf("Caught a Fake_Exception!\n");
    } catch(Exception&) {
        printf("Caught an Exception!\n");
    }

    printf("catchit handled the exception\n");
}
Unfortunately, our personality function checks only the first type the landing pad can handle. If we remove the Fake_Exception catch block and try again, everything finally works! Our personality function can now select the correct catch block in the correct frame, provided each try block has a single catch.
In the next chapter we will improve it again!
C++ exceptions under the hood: choosing the correct catch in the landing pad
Chapter 19 on C++ exceptions: we've written a personality function that can read the LSDA and choose the right landing pad and the right stack frame to handle an exception, but it still has trouble finding the right catch clause. To finish a working personality function, we have to check the exception types across the entire action table in .gcc_except_table.
Remember the action table? Let's look at it again, but now with several catch blocks:
# Call site table
.LLSDACSB2:
    # Call site 1
    .uleb128 ip_range_start
    .uleb128 ip_range_len
    .uleb128 landing_pad_ip
    .uleb128 (action_offset+1) => 0x3

    # Rest of call site table

# Action table start
.LLSDACSE2:
    # Action 1
    .byte   0x2
    .byte   0

    # Action 2
    .byte   0x1
    .byte   0x7d

    .align 4
    .long   _ZTI9Exception
    .long   _ZTI14Fake_Exception
.LLSDATT2:
# Types table start
To read all the exception types supported by the landing pad in this example (this is the LSDA for the catchit function, by the way), we need to do something like this:
- Get the action offset from the call site table (remember that we read offset + 1, and 0 means no action).
- Follow the offset to action 2 and read type index 1. The type table is indexed in reverse: we have a pointer to its end, so we access entries with -1 * index.
- Go to types_table[-1] to get the type_info for Fake_Exception.
- Fake_Exception is not the exception that was thrown, so we read the offset to the next action (0x7d).
- Reading 0x7d as a signed LEB128 (sleb128) value gives -3, that is, three bytes back from the position where we read the offset.
- Read the type with index 2.
- Get the type_info for Exception, which this time matches the thrown exception, so we can install the landing pad!
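The walk above can be sketched against hand-built tables copied from the catchit listing. This is a hypothetical sketch. The type table is modeled as an array of const std::type_info*, whereas the real one holds encoded pointers to the _ZTI symbols. Exception and Fake_Exception are empty stand-ins, and exception-specification filters (negative indices) are ignored.

```cpp
#include <cassert>
#include <cstdint>
#include <typeinfo>

// Stand-ins for the exception types used in throw.cpp
struct Exception {};
struct Fake_Exception {};

// Decodes a signed LEB128 value at *p and advances *p past it.
int64_t read_sleb128(const uint8_t** p) {
    uint64_t result = 0;
    unsigned shift = 0;
    uint8_t byte;
    do {
        byte = *(*p)++;
        if (shift < 64) result |= uint64_t(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);
    if (shift < 64 && (byte & 0x40))
        result |= ~uint64_t(0) << shift;   // sign-extend negative values
    return int64_t(result);
}

// Follows the action chain of one call site (cs_action is offset + 1,
// 0 means "no action") and returns the 1-based type index of the first
// catch clause that accepts `thrown`, or 0 if none does.
int find_matching_type_index(const uint8_t* action_tbl_start, unsigned cs_action,
                             const std::type_info* const* types_table_end,
                             const std::type_info& thrown) {
    if (cs_action == 0) return 0;
    const uint8_t* p = action_tbl_start + (cs_action - 1);
    for (;;) {
        int64_t type_index = read_sleb128(&p);
        const uint8_t* next_field = p;       // the next offset is relative to this byte
        int64_t next_offset = read_sleb128(&p);
        if (type_index > 0) {
            const std::type_info* catch_ti = types_table_end[-type_index];
            // A null entry stands for catch(...), which accepts anything
            if (catch_ti == nullptr || *catch_ti == thrown)
                return int(type_index);
        }
        if (next_offset == 0) return 0;      // end of the chain, no match
        p = next_field + next_offset;
    }
}
```

With the bytes from the listing, a thrown Exception resolves to type index 2 after following the -3 link back to action 1, exactly as in the steps above.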
It looks complicated because, once again, there are many levels of indirection, but you can see the final code in the repository. At the link you'll also find a bonus: a personality function that reads the type table to determine which catch block we need (if the type is null, the block handles every exception). There is a funny side effect: we can only handle exceptions thrown from C++ code.
We now know how exceptions are thrown, how the stack is unwound, how the personality function selects the correct stack frame, and which catch block to choose inside the landing pad. One problem remains: running destructors. Next, we'll change our personality function to support RAII.
C++ exceptions under the hood: running destructors during unwinding
Our mini-ABI can now handle almost all the basic features of exception handling, but it still can't run destructors. That is a very important part if we want to write safe code. We know that the information needed to run destructors is stored in .gcc_except_table, so let's look at the assembly a bit more.
# Call site table
.LLSDACSB2:
    # Call site 1
    .uleb128 ip_range_start
    .uleb128 ip_range_len
    .uleb128 landing_pad_ip
    .uleb128 (action_offset+1) => 0x3

    # Rest of call site table

# Action table start
.LLSDACSE2:
    # Action 1
    .byte   0
    .byte   0

    # Action 2
    .byte   0x1
    .byte   0x7d

    .align 4
    .long   _ZTI14Fake_Exception
.LLSDATT2:
# Types table start
In a normal landing pad, when the action has a type index greater than 0, we use that index into the type table to search for the right catch block. When the index is 0, we need to run cleanup code instead. Even if a landing pad can't handle the exception, it can still perform cleanup during unwinding. Of course, the landing pad must call _Unwind_Resume once cleanup is done so that unwinding continues.
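Here is a hedged sketch of that decision for a single call site. A positive filter is a catch clause, 0 is a cleanup, and a call site that has a landing pad but no action record (action field 0) is cleanup-only. classify_call_site and PadAction are illustrative names. The tables are hand-built from the listing above, with the type table modeled as an array of const std::type_info*.

```cpp
#include <cassert>
#include <cstdint>
#include <typeinfo>

struct Fake_Exception {};  // stand-in for the type from throw.cpp

// Decodes a signed LEB128 value at *p and advances *p past it.
int64_t read_sleb128(const uint8_t** p) {
    uint64_t result = 0;
    unsigned shift = 0;
    uint8_t byte;
    do {
        byte = *(*p)++;
        if (shift < 64) result |= uint64_t(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);
    if (shift < 64 && (byte & 0x40))
        result |= ~uint64_t(0) << shift;  // sign-extend
    return int64_t(result);
}

enum class PadAction { None, Catch, Cleanup };

// Decides what one call site's landing pad does for the thrown type.
PadAction classify_call_site(unsigned cs_lp, unsigned cs_action,
                             const uint8_t* action_tbl_start,
                             const std::type_info* const* types_table_end,
                             const std::type_info& thrown, int* type_index_out) {
    if (cs_lp == 0) return PadAction::None;         // no landing pad at all
    if (cs_action == 0) return PadAction::Cleanup;  // landing pad with cleanup only
    bool saw_cleanup = false;
    const uint8_t* p = action_tbl_start + (cs_action - 1);
    for (;;) {
        int64_t filter = read_sleb128(&p);
        const uint8_t* next_field = p;              // next offset is relative to here
        int64_t next = read_sleb128(&p);
        if (filter > 0) {
            const std::type_info* ti = types_table_end[-filter];
            if (ti == nullptr || *ti == thrown) {   // null entry: catch(...)
                *type_index_out = int(filter);
                return PadAction::Catch;
            }
        } else if (filter == 0) {
            saw_cleanup = true;  // destructors must run, then _Unwind_Resume
        }
        if (next == 0) break;
        p = next_field + next;
    }
    return saw_cleanup ? PadAction::Cleanup : PadAction::None;
}
```

In the search phase only Catch would justify reporting a handler. In the cleanup phase both Catch and Cleanup mean installing the landing pad, and a cleanup pad ends by calling _Unwind_Resume.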
I've uploaded the newest version of the code to my GitHub repository, but I have bad news. Remember how we cheated by treating uleb128 as a char? Once we started adding code for destructors, the offsets in .gcc_except_table became large (by "large" I mean greater than 127), and our trick no longer works.
For the next version, we'll have to rewrite our LSDA reader so that it decodes uleb128 values correctly.
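Decoding uleb128 properly takes only a few lines. Here is a minimal sketch (read_uleb128 is our name). A full LSDA reader would call something like it for every uleb128 field instead of reading a single byte:

```cpp
#include <cassert>
#include <cstdint>

// Decodes an unsigned LEB128 value at *p and advances *p past it.
// Each byte carries 7 bits of payload, least significant group first;
// the high bit says whether another byte follows.
uint64_t read_uleb128(const uint8_t** p) {
    uint64_t result = 0;
    unsigned shift = 0;
    uint8_t byte;
    do {
        byte = *(*p)++;
        if (shift < 64) result |= uint64_t(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);
    return result;
}
```

With it, the bytes 0x80 0x01 decode to 128. The one-byte shortcut would instead read 0x80 as 128 and leave the pointer on the 0x01, desynchronizing every field after it.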
Despite that, we've reached our goal! We wrote a mini-ABI that can correctly handle exceptions without any help from libcxxabi!
Of course, there is still work to do, for example handling foreign (non-C++) exceptions and supporting compatibility between compilers and linkers. Maybe some other time...
C++ exceptions under the hood: results
After 20 chapters on low-level exception handling, it's time to take stock! What have we learned about how exceptions are thrown and how they are caught?
Leaving aside the scary details of reading .gcc_except_table, which are probably the biggest part of this series, we can conclude:
- The C++ compiler actually does very little of the exception handling work; most of the magic happens in libstdc++.
- Here are the few things the compiler does:
  - It creates CFI information for stack unwinding.
  - It creates something called .gcc_except_table, with information about landing pads (try/catch blocks). This is a form of reflection.
  - When we write a throw, the compiler translates it into a couple of calls to libstdc++ that allocate the exception and then start unwinding.
- When an exception is thrown at runtime, __cxa_throw delegates stack unwinding to libstdc++.
- During unwinding, a special function supplied by libstdc++ (the personality function, or personality routine) is called. It checks each function on the stack to see whether it can handle the exception.
- If no matching catch block is found, std::terminate is called.
- If one is found, unwinding starts again from the top of the stack.
- During this second pass, cleanup is performed.
- The personality function checks .gcc_except_table for the current function. If the table contains cleanup actions, the personality function "jumps" into the current stack frame to run that function's cleanup code.
- As soon as the unwinder reaches a stack frame (think: a function) that can handle the exception, it jumps into the appropriate catch block.
- After the catch block finishes, the memory occupied by the exception is freed.
Having studied in detail how exceptions are handled, we can now say why it is so hard to write exception-safe code.
At first glance, exceptions may seem nice and simple, but as soon as we dig a little deeper we run into a pile of complexity. The program literally starts examining itself (reflection), which is unusual for C++ applications.
Even in high-level terms, once an exception is thrown we can no longer rely on our understanding of how code normally executes. Usually it runs linearly, with small branches in the form of if and switch statements. With exceptions everything is different: code runs in an unintuitive order, the stack is unwound, functions are interrupted midway, and the usual rules stop applying. The instruction pointer jumps into each landing pad, the stack is unwound outside our control, and, in general, a lot of magic happens under the hood. As a result, exceptions are difficult because they break our understanding of a program's natural flow of execution. This doesn't mean we must never use them. It just means we always need to be careful when working with them!