Exceptions and the associated stack promotion is one of the most enjoyable techniques in C ++. Exception handling is intuitively consistent with the block structure of the program. Externally, exception handling seems very logical and natural.
Careful use of stack objects allows you to create very efficient and safe code, where, unlike systems with garbage collection, reference locality is preserved, which makes it possible to reduce the number of calls to the system behind the memory, reduce its fragmentation, more efficiently use the cache memory.
However, in C ++, exceptions are traditionally considered literally as exceptions to error recovery. It is difficult to say whether this is a cause or a consequence of the fact that the implementation of exception handling by compilers is extremely expensive. Let's try to figure out why.
')
Continued
here .
How are things.
There are two approaches to implementing exception handling:
- The first can be characterized by the words - “let the loser pay”. The compiler tries to minimize costs in cases where exceptions do not occur. Ideally, the program does not bear any additional load, all the necessary information is located aside from the code in a form suitable for promotion.
- The second strategy is “gradually pay everything and always.” In other words, the program incurs certain costs in the process of maintaining the relevance of the information necessary for the correct promotion of the stack. In this case, in case of an exception, promotion of the stack is cheaper.
A few examples:
- GCC / SJLJ . SJLJ is short for setjmp / longjmp. It refers rather to the first approach, but thanks to the control transfer scheme, it also has a fixed fee for each try . In essence, this implementation has absorbed the worst that is inherent in both approaches. Up until the fourth version, this was the main exception handling option.
As the name implies, control is transferred via a call to longjmp, each try generates a call to setjmp. The corresponding buffer is allocated in the stack for each try block.
At the beginning of each function, a prolog is created that registers the current frame in the context stack. Similarly, an epilogue is created that removes the current context from the top of the context stack. Next to each function, an auxiliary code is created to clear the resources.
More precisely, the compiler each return from a function, potentially capable of completing an exception, enters the tree as a key, the value is a pointer to the cleanup code that must be taken at this point to clear the function context. The linker assembles the pieces of these trees for each link module into a single project tree (very roughly). This miracle is called LSDA (language specific data area) and it is located in the ".gcc_except_table" section.
When an exception occurs, based on the type_info of the exception being thrown, a block is searched for that can handle the exception. Starting from the current context and up to the handler context (using navigation through call frames), the addresses of the code that (depending on the scope of local variables) must be executed in this place are retrieved and executed. Then control is transferred.
There is a prejudice that this method is very expensive. Already due to the fact that for each try-block setjmp is called, which is not cheap. In fact, you need to fully save the state of the processor, where there may be dozens of registers. Whereas at the time of the exception, the contents of most of these registers are already useless. In fact, the compiler does quite rationally. He deploys setjmp, and saves only useful registers (he already has this information). The author doubts that the cost of setjmp is so high.
But what really catches your eye is a voluminous auxiliary code, especially in non-trivial cases. The compiler, like YACC, lists all the states of the stack automaton. And, though, the optimizer of opportunities cleans redundancy and the trivial code, what remains is more than enough.
- GCC / DW2 . This is just an example of the first approach to exception handling. DW2 means DWARF2 (now already 3) - the storage format of auxiliary, including debugging information in the executable file. After all, debugging information is also needed so that at any time you can find out the value of any variable, including in the frames of the previous (upper) calls. Therefore, the compiler in the process of generating code postpones information about what it allocates in the stack, in which registers it allocates variables when it stores them ... In fact, this format is not identical to DWARF, although it is very close to it. Standard version for the fourth version of GCC.
Conceptually, at each address of the program code information is stored on how to get into the higher-level call frame. In practice, due to the bulk of this information, it is compressed, in fact, calculated using the interpretation of the byte code. This bytecode is executed when an exception occurs. All this is located in the sections ".eh_frame" and ".eh_frame_hdr".
Yes, among other things, the DWARF interpreter is a great backdoor, with which, by changing the byte code, you can catch the exception and send it for processing to your heart’s content.
GCC / DW2 uses almost the same LSDA section as GCC / SJLJ.
As we see, the costs associated with the promotion of the stack (in the absence of exceptions) are practically absent. However, the cost of initiating an exception is high. In addition, it is impossible not to note the strong integration of the architecture-dependent part of the compiler and its rather high-level layers.
- MS VC ++ . This compiler implements a second processing strategy.
- For each function that can potentially throw exceptions, the compiler creates as a stack variable a structure from a pointer to the previous similar structure, addresses of the handler function and auxiliary data. The address of this structure is entered in FS: [0], which is the top of the stack of these structures. The FS register in Win32 is used as the Thread Information Block ( TIB ) (GS in Win64). A handler function is also created (with its own data set) and an epilogue that restores FS: [0] in the event of a successful completion.
- The compiler creates a table of structures — an element for each try block in a function. Each try block has an index of the beginning and end in this table (the nested block has a nested interval) corresponding to a certain state, the compiler itself keeps track of the relevance of this index. In this way, the compiler implements a stack of try-blocks.
- A catch-table table is created for each try-block. For each type of exception, the type_info table of all base classes in the hierarchy of this type of exception is created.
- For each function, an unwind table is created, each element of which contains a pointer to the function, which releases some resource and the number of the previous element. There can be several chains in the table, depending on the scopes of objects with destructors. At the time of the exception, by the index of the current state, which was mentioned above, you can find the necessary chain and call all the necessary destructors.
- For the x64 version, the auxiliary stack structures were transferred to .pdata whenever possible, probably in MS they consider the first strategy more promising.
- When an exception is triggered, the main work is carried out by the operating system, where control falls through SYSENTER.
This method has the same drawbacks as SJLJ - extensive auxiliary code and low portability. - The process of raising an exception and selecting the appropriate catch block looks about the same everywhere:
- When an exception is raised, its handle is created, which contains a copy of the object, its type_info, a pointer to the destructor
- Going up the stack of try blocks and clearing all registered stack objects, (navigating this stack is different everywhere, but the essence is the same), look through the lists of catch blocks and look for suitable ones.
- If a suitable catch block is found, the exception object becomes a local variable, and we call this block. If the catch block accepts an exception by value, not by reference, a copy is created.
- If there was no exception call, we kill the object - the exception
- "Mistress on the note":
some_exception exc("oioi"); throw exc;
spawns an extra copy / destructor constructor
throw *new some_exception("oioi");
leaks memory
catch(some_exception exc) ...
again an extra call to the constructor and destructor
catch(const some_exception *exc) ...
the exception will fly by, if you do not throw a pointer
throw some_exception("oioi"); ... catch (const some_exception &exc)....
low cost
Details can be found
here ,
here and
here .
What if ...
And, it would seem, all that matters is to call in the right order destructors whose bodies already exist. How did it happen that a simple, in general, problem has such viscous, heavy and, moreover, independently developed solutions? It is difficult to say, it happened so historically.
Let's try to sketch a solution, trying to keep it simple and, whenever possible, architecturally independent.
- First of all, we choose a strategy - this will be the second option.
- Transfer control - setjmp / longjmp
- We create a structure, all descendants of which have the ability to self-register for possible promotion.
struct unw_item_t { unw_item_t (); virtual ~unw_item_t (); void unreg(); unw_item_t *prev_; };
- And another one whose scope is the try block
struct jmp_buf_splice { jmp_buf_splice (); ~jmp_buf_splice (); jmp_buf buf_; jmp_buf_splice *prev_; unw_item_t objs_; };
- For simplicity, we will only throw const char * exceptions using
extern int throw_slice (const char *str);
- Several macros to simulate a try block
- Now we will show the bodies of the jmp_buf_splice class members:
static jmp_buf_splice *root_slice_ = NULL; jmp_buf_splice::jmp_buf_splice () { objs_ = NULL; prev_ = root_slice_; root_slice_ = this; } jmp_buf_splice::~jmp_buf_splice () { root_slice_ = prev_; }
Here is an option for single-threaded implementation. If there are multiple threads, instead of root_slice_, we will need to use TLS, similarly to how GCC does, for example. - It's time for members of the unw_item_t class:
unw_item_t::unw_item_t () { if (NULL != root_slice_) { prev_ = root_slice_->objs_; root_slice_->objs_ = this; } } unw_item_t::~unw_item_t () { unreg(); } unw_item_t::unreg () { if (NULL != root_slice_ && (prev_ != reinterpret_cast<unw_item_t *>(~0))) { root_slice_->objs_ = prev_; prev_ = reinterpret_cast<unw_item_t *>(~0); } }
- Now consider the process of exclusion exception and promotion of the stack:
static int pop_slice () { jmp_buf_splice *sl = root_slice_; assert (NULL != sl); root_slice_ = sl->prev_; return 0; } int throw_slice (const char *str, bool popstate) { if (NULL == str) return -1; jmp_buf_splice *sl = root_slice_; unw_item_t *obj = root_slice_->objs_; while (NULL != obj) { unw_item_t *tmp = obj; obj = obj->prev_; tmp->~unw_item_t (); } if (popstate) pop_slice (); longjmp (sl->buf_, int(str)); return 0; }
Service class - analog std :: auto_ptr: template<typename cl> class deleter_t : public unw_item_t { public: deleter_t (cl *obj){ptr_ = obj;}; virtual ~deleter_t () {delete ptr_;}; private: cl *ptr_; deleter_t (); deleter_t (const deleter_t &); deleter_t &operator= (const deleter_t &); };
Service class - array: template<typename cl> class vec_deleter_t : public unw_item_t { public: vec_deleter_t (cl *obj){ptr_ = obj;}; virtual ~ vec_deleter_t () {delete [] ptr_;}; private: cl *ptr_; vec_deleter_t (); vec_deleter_t (const vec_deleter_t &); vec_deleter_t &operator= (const vec_deleter_t &); };
- Examples
Test class class _A { public: _A():val_(++cnt_){printf ("A::A(%d)\n",val_);} _A(int i):val_(i){printf ("A::A(%d)\n",val_);} virtual ~_A(){printf ("A::~A(%d)\n",val_);} static int cnt_; }; int _A::cnt_ = 0; class A : public unw_item_t, _A {};
Example 1 A a(1); TRY_BLOCK { A b(2); THROW_IN_BLOCK("error\n"); std::cerr << "notreached\n"; } CATCH_BLOCK_FIN { std::cerr << __exc; } FIN_BLOCK;
A :: A (1)
A :: A (2)
A :: ~ A (2)
error
A :: ~ A (1)
Example 2 A a(1); TRY_BLOCK { A b(2); TRY_BLOCK { A c(3); THROW_IN_BLOCK("error\n"); std::cerr << "notreached\n"; } CATCH_BLOCK_FIN { std::cerr << "." << __exc; RETHROW_IN_BLOCK; } FIN_BLOCK; std::cerr << "notreached\n"; } CATCH_BLOCK_FIN { std::cerr << ".." << __exc; } FIN_BLOCK;
A :: A (1)
A :: A (2)
A :: A (3)
A :: ~ A (3)
.error
A :: ~ A (2)
..error
A :: ~ A (1)
Example 3 TRY_BLOCK { vec_deleter_t<_A> da(new _A[3]); TRY_BLOCK { THROW_IN_BLOCK("error\n"); std::cerr << "notreached\n"; } CATCH_BLOCK_FIN { std::cerr << "." << __exc; RETHROW_IN_BLOCK; } FIN_BLOCK; std::cerr << "notreached\n"; } CATCH_BLOCK_FIN { std::cerr << ".." << __exc; } FIN_BLOCK;
A :: A (1)
A :: A (2)
A :: A (3)
.error
A :: ~ A (3)
A :: ~ A (2)
A :: ~ A (1)
..error
Restrictions
This solution has many disadvantages:
But still.
Despite the described limitations, the described method has inherent advantages:
- Simplicity. A few dozen lines of code - and everything works.
- Transparency concept.
- Easy portability. No dependence on architecture.
Is it possible to eliminate the disadvantages of this method, while retaining its advantages? Yes and no. Using only C ++ tools, it is impossible to do this.
What the author tends.
In the order of technical nonsense, we will think about how to modify the compiler in order to correctly implement the above scheme?
What was missing from the above solution? Knowledge of how the object was spawned.
For example, if an object is built on memory allocated from a common heap and can migrate between threads, it can in no case be registered in a thread-dependent stack. It is not necessary to register an object aggregated into another object anywhere.
And with an object of the same type, but on a stack memory, it is necessary to do this. Of course, it is possible to give a pointer to this stack object to another thread, but it is difficult to imagine in what situation this could be useful.
So:
By the way.
- An object can know that it is a stack. For this, this must be within the stack stack of the current thread.
- So you can take an object off the hook? I can not imagine why this may be necessary, but this possibility exists.
- If you can remove, then you can and plant. Allocate memory in the stack through alloca , forcibly call the constructor and connect it to the stack promotion mechanism.
- For architectures with separate data and control stacks, exception handling can be implemented very effectively using a control stack instead of a list.
PS: Special thanks to Alexander Artyushin for informative discussion.