This article shows some mechanisms that let you write fast (inlined at compile time) and easily scalable code for dispatching calls to various objects using only standard C++ facilities.
About the task
Some time ago I needed to implement a small module that, depending on user-supplied (run-time) information, would perform various actions inside the program kernel. The main requirements were maximum performance (optimizability) of the code, no third-party dependencies, and easy extension when new functionality is added.
For simplicity and readability, the code examples show only the key, most complex mechanisms. The machine code examples were produced by the Microsoft compiler with O2 optimization.
The first steps
The C-style solution would be to simply use function pointers whose values are set while the data is being processed. However, besides the pointers themselves, some additional information had to be stored for each function. So, to keep the solution as general as possible, I settled on an abstract class with the required set of fields and methods.
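For comparison, the C-style variant mentioned above might look roughly like this. It is only a sketch: the `Action` record, its `name` field, and the `pick` helper are hypothetical names introduced for illustration and do not come from the original module.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical C-style record: a function pointer plus extra per-function data.
struct Action {
    const char* name;       // additional information kept next to the pointer
    int (*fn)(int, int);    // the function chosen at run time
};

static int add(int a, int b) { return a + b; }
static int mul(int a, int b) { return a * b; }

static const Action actions[] = { { "add", add }, { "mul", mul } };

// Select an action by name, i.e. by information known only at run time.
const Action* pick(const char* name) {
    for (const Action& a : actions)
        if (std::strcmp(a.name, name) == 0) return &a;
    return nullptr;
}

void example() {
    const Action* act = pick("mul");
    assert(act != nullptr && act->fn(2, 3) == 6);
    assert(pick("unknown") == nullptr);
}
```

Every extra field has to be added to the record by hand, and nothing ties the stored data to the function it describes, which is part of the motivation for the class-based design.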
The object-oriented approach lets you abstract away from low-level implementation details and work with broader concepts, which in my opinion makes both the code and the overall structure of the program easier to understand.
A simple example of this class:
struct MyObj {
    using FType = int( *)(int, int);
    virtual int operator() ( int a, int b ) = 0;
    virtual ~MyObj() = default;
};
The key element here is the virtual operator "()". The virtual destructor is needed for the obvious reasons, and FType simply defines the signature of the main method in terms of its argument and return types.
With such a class, manipulating function pointers is replaced by working with pointers to MyObj. The pointers can conveniently be stored in lists or, say, tables, and all that remains is to initialize them correctly. The main difference is that objects can have state and can use inheritance. This greatly expands and simplifies adding ready-made functionality from external libraries to the code.
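As a minimal sketch of this idea, the table below stores MyObj pointers and is indexed with a run-time value. The classes `Add` and `ScaledAdd`, the `table` array, and the `dispatch` function are hypothetical names used only for this illustration.

```cpp
#include <cassert>

struct MyObj {
    using FType = int (*)(int, int);
    virtual int operator()(int a, int b) = 0;
    virtual ~MyObj() = default;
};

// Stateless implementation.
struct Add : MyObj {
    int operator()(int a, int b) override { return a + b; }
};

// Stateful implementation: something a bare function pointer cannot carry.
struct ScaledAdd : MyObj {
    int scale;
    explicit ScaledAdd(int s) : scale(s) {}
    int operator()(int a, int b) override { return scale * (a + b); }
};

Add addObj;
ScaledAdd scaledObj(10);

// The table is filled once; the index comes from run-time (user) data.
MyObj* table[] = { &addObj, &scaledObj };

int dispatch(unsigned index, int a, int b) {
    return (*table[index])(a, b);   // virtual call selects the action
}

void example() {
    assert(dispatch(0, 1, 2) == 3);
    assert(dispatch(1, 1, 2) == 30);
}
```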
This approach has another important advantage: call dispatch is handed over to the virtual call mechanism, which gives some protection against errors and against optimization problems at the compiler level.
Inlining
The most important step toward optimal performance is writing inlinable code. In essence, this requires that the executed sequence of instructions depend as little as possible on run-time data. The compiler can then insert the body of a function at the call site instead of jumping (calling) to an address, and/or discard unneeded pieces of code. The same criteria make it possible to produce machine code that avoids long jumps and frequent reloading of the processor cache, but that is another story.
Unfortunately, in our case there is an obvious obstacle: the action is chosen based on user data. That choice has been handed to the virtual call mechanism, so the next step is to make sure everything else gets inlined. To do this, use inheritance and move the calls to third-party functionality inside the overridden methods, where they can be inlined and optimized successfully.
Inheritance
The first step is to derive directly from the abstract class. The simplest way is to override the operator “manually” in the derived class. For example:
struct : public MyObj {
    int operator()( int a, int b ) override { return a + b; }
} addObj;
In this case, the optimized call to the virtual method leads straight to the addition of two numbers. With O2 optimization, MSVS produces approximately the following machine code for the call* (register preparation, argument pushing):
push dword ptr [b]
mov eax,dword ptr [esi]
mov ecx,esi
push dword ptr [a]
call dword ptr [eax]
and the following code for the overridden method itself:
push ebp
mov ebp,esp
mov eax,dword ptr [a]
add eax,dword ptr [b]
pop ebp
ret 8
* The first part is exactly the same in all cases, since it depends only on the form of the call itself, so it is omitted from here on. This article always uses the form res = (*po)(a, b);.
In some cases the optimization is even better: for example, g++ can compress the integer addition into 2 instructions: lea, ret. For brevity, I limit myself to examples obtained with the Microsoft compiler, but note that the code was also tested with g++ on Linux.
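To make the call form concrete, here is a small self-contained sketch of the manual-inheritance variant together with the call `res = (*po)(a, b);`. The helper function `call` is a hypothetical name added only so that the call site can be shown in isolation.

```cpp
#include <cassert>

struct MyObj {
    using FType = int (*)(int, int);
    virtual int operator()(int a, int b) = 0;
    virtual ~MyObj() = default;
};

// Anonymous class derived from MyObj, with a single instance addObj.
struct : public MyObj {
    int operator()(int a, int b) override { return a + b; }
} addObj;

// The call form used throughout the article: a virtual call through MyObj*.
int call(MyObj* po, int a, int b) {
    int res = (*po)(a, b);
    return res;
}

void example() {
    MyObj* po = &addObj;
    assert(call(po, 2, 3) == 5);
}
```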
Functors
The logical next question is: what if you need to execute complex code implemented in third-party functions? Naturally, that code must run inside the overridden method of a class derived from MyObj. But if you manually create a separate (even anonymous) class for each case, initialize its object, and pass its address around, you can forget about clarity and scalability.
Fortunately, C++ has an excellent template mechanism for this, which implies compile-time resolution of the code and, consequently, inlining. So you can write a simple template that takes a functor as a parameter, creates a class derived from MyObj, and calls the supplied parameter inside the overridden method.
But (of course there is a “but”) what about lambdas and other dynamic objects? It is worth noting that lambdas in C++, because of how they are implemented and behave, should be treated as objects rather than functions. Unfortunately, lambda expressions in C++ do not meet the requirements for a template argument. This is expected to be fixed in the C++17 standard, but even without that, things are not so bad.
A simple and very pleasant solution was found here. In essence, it passes the dynamic object honestly as a function argument, followed by some extra ceremony, but the compiler can easily optimize this code and inline everything that is needed.
As a result, you can write a small pair: a wrapper class and a wrapper function that will give the result we need:
template<class Func>
class Wrapping : public MyObj {
    Func _f;
public:
    Wrapping( Func f ) : _f( f ) {}
    int operator()( int a, int b ) override { return _f( a, b ); }
};

template<class Func>
Wrapping<Func>* Wrap( Func f ) {
    static Wrapping<Func> W( f );
    return &W;
}
To initialize the pointer, simply call Wrap and pass the desired object as the argument. Moreover, because of the nature of a functor (and that is exactly what we are working with), the argument can be absolutely any callable object, or just a function, with a suitable number of arguments, even if they are of different types.
An example of a call might be:
po = Wrap( []( int a, int b ) {return a + b; } );
Despite the complicated look, the instruction sequence for the overridden “()” operator is very simple, in fact identical to the one obtained with manual inheritance and inlining:
push ebp
mov ebp,esp
mov eax,dword ptr [a]
add eax,dword ptr [b]
pop ebp
ret 8
All the complex conditional branches and initialization happen when Wrap is called, after which only the virtual call mechanism remains. In addition, all the work is done with a static object, so there is hope of avoiding heap accesses and long jumps.
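The following sketch puts the wrapper pair to work. It repeats the definitions so that it compiles on its own, and it also illustrates the remark about argument types: the second lambda takes `double` parameters, and the conversions to and from `int` happen implicitly inside the overridden operator. The function name `example` is just a placeholder for this illustration.

```cpp
#include <cassert>

struct MyObj {
    using FType = int (*)(int, int);
    virtual int operator()(int a, int b) = 0;
    virtual ~MyObj() = default;
};

template<class Func>
class Wrapping : public MyObj {
    Func _f;
public:
    Wrapping(Func f) : _f(f) {}
    int operator()(int a, int b) override { return _f(a, b); }
};

template<class Func>
Wrapping<Func>* Wrap(Func f) {
    static Wrapping<Func> W(f);   // one static wrapper per functor type
    return &W;
}

void example() {
    // A lambda whose parameters match the MyObj signature exactly.
    MyObj* po = Wrap([](int a, int b) { return a + b; });
    assert((*po)(2, 3) == 5);

    // A lambda with different parameter types: the ints are converted
    // to double, and the double result is converted back to int.
    MyObj* pd = Wrap([](double a, double b) { return a * b + 0.5; });
    assert((*pd)(2, 3) == 6);     // 6.5 is truncated to 6
}
```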
Interestingly, almost any instance can be inlined. For example, the code:
struct AddStruct {
    int operator()( int a, int b ) { return a + b; }
};
...
po = Wrap( AddStruct() );
will have the following machine code for the overridden operator:
push ebp
mov ebp,esp
mov eax,dword ptr [a]
add eax,dword ptr [b]
pop ebp
ret 8
That is, the same as with manual inlining. I managed to get similar machine code even for an object created through new, but we will leave that example aside.
Functions
The code above has a significant problem when it comes to ordinary functions. The wrapper readily accepts a function pointer as an argument, for example:
int sub( int a, int b ) { return a + b; };
...
po = Wrap( sub );
But the machine code of the overridden method will then contain one more call, with the corresponding jump:
push ebp
mov ebp,esp
push dword ptr [b]
mov eax,dword ptr [ecx+4]
push dword ptr [a]
call eax
add esp,8
pop ebp
ret 8
This means that, because of the different nature of functions and objects, a function cannot be inlined this way.
Functions with an identical signature
Returning to the beginning of the article, recall that for inlining you can pass the desired object (in this case, a function) through a template parameter. And for a function pointer, this is exactly what is allowed. Using the type defined in our abstract class, which describes the signature of the called method, you can easily overload the wrapper class / wrapper function pair specifically for such functions:
template<class Func, Func f>
struct FWrapping : public MyObj {
    int operator ()( int a, int b ) override { return f( a, b ); }
};

template<MyObj::FType f>
FWrapping<MyObj::FType, f>* Wrap() {
    static FWrapping<MyObj::FType, f> W;
    return &W;
}
Calling the overloaded Wrap for a function of this form:
int add( int a, int b ) { return a + b; }
...
po = Wrap<add>();
yields optimal machine code, identical to that obtained with manual inheritance:
push ebp
mov ebp,esp
mov eax,dword ptr [a]
add eax,dword ptr [b]
pop ebp
ret 8
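A self-contained sketch of this overload in use is shown below. The second function `sub` (here a real subtraction) and the `example` function are assumptions added for illustration. Since the function is a template argument, each wrapped function gets its own FWrapping instantiation and its own static object.

```cpp
#include <cassert>

struct MyObj {
    using FType = int (*)(int, int);
    virtual int operator()(int a, int b) = 0;
    virtual ~MyObj() = default;
};

// The function is a compile-time template argument, so the call
// inside operator() is a direct call the compiler is free to inline.
template<class Func, Func f>
struct FWrapping : public MyObj {
    int operator()(int a, int b) override { return f(a, b); }
};

template<MyObj::FType f>
FWrapping<MyObj::FType, f>* Wrap() {
    static FWrapping<MyObj::FType, f> W;
    return &W;
}

int add(int a, int b) { return a + b; }
int sub(int a, int b) { return a - b; }

void example() {
    MyObj* pa = Wrap<add>();
    MyObj* ps = Wrap<sub>();
    assert((*pa)(5, 3) == 8);
    assert((*ps)(5, 3) == 2);
    assert(pa != ps);   // distinct instantiations, distinct static objects
}
```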
Functions with a different signature
The last case is when the function to be inlined does not match the type declared in MyObj. For this, you can easily add another overload of the wrapper function, in which the type is passed as an additional template parameter:
template<class Func, Func f>
FWrapping<Func, f>* Wrap() {
    static FWrapping<Func, f> W;
    return &W;
}
Calling this function requires specifying the type of the passed function manually, which is not always convenient. To simplify the code, you can use the decltype( ) keyword:
po = Wrap<decltype( add )*, add>();
It is important to put "*" after decltype; otherwise the development environment may report that no Wrap implementation matches these arguments. Despite this, the project will most likely compile normally. The discrepancy is caused by the rules for adjusting types passed to a template and by the way decltype itself works. To avoid the error message, you can use a construct such as std::decay to ensure correct type substitution, conveniently wrapped in a simple macro:
#define declarate( X ) std::decay< decltype( X ) >::type
...
po = Wrap<declarate( add ), add>();
Or you can simply keep track of the types manually, if you prefer not to multiply entities.
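The difference between the three spellings can be checked at compile time. The sketch below assumes only the standard `<type_traits>` header and a function `add` like the one used earlier.

```cpp
#include <type_traits>

int add(int a, int b) { return a + b; }

// decltype(add) names the function type itself, not a pointer to it.
static_assert(std::is_same<decltype(add), int(int, int)>::value,
              "decltype of a function is a function type");

// Appending '*' spells out the pointer-to-function type explicitly.
static_assert(std::is_same<decltype(add)*, int (*)(int, int)>::value,
              "decltype(add)* is a function pointer");

// std::decay applies the same function-to-pointer adjustment.
static_assert(std::is_same<std::decay<decltype(add)>::type,
                           decltype(add)*>::value,
              "decay yields the same pointer type");
```

If any of these assumptions were wrong, the translation unit would fail to compile, so the check costs nothing at run time.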
Of course, the machine code for such an inlined function will differ, since at least a type conversion is required. For example, for a function defined as:
float fadd( float a, float b ) { return a + b; }
...
po = Wrap<declarate(fadd), fadd>();
the disassembly looks roughly like this:
push ebp
mov ebp,esp
movd xmm1,dword ptr [a]
movd xmm0,dword ptr [b]
cvtdq2ps xmm1,xmm1
cvtdq2ps xmm0,xmm0
addss xmm1,xmm0
cvttss2si eax,xmm1
pop ebp
ret 8
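The conversions visible in the disassembly (int to float on the way in, float to int with truncation on the way out) can be observed from C++ as well. In this sketch `fhalf` is a hypothetical second function added to make the truncation visible; the type is spelled with `decltype(...)*` instead of the macro.

```cpp
#include <cassert>

struct MyObj {
    using FType = int (*)(int, int);
    virtual int operator()(int a, int b) = 0;
    virtual ~MyObj() = default;
};

template<class Func, Func f>
struct FWrapping : public MyObj {
    // int arguments are converted to f's parameter types,
    // and f's result is converted back to int.
    int operator()(int a, int b) override { return f(a, b); }
};

template<class Func, Func f>
FWrapping<Func, f>* Wrap() {
    static FWrapping<Func, f> W;
    return &W;
}

float fadd(float a, float b) { return a + b; }
float fhalf(float a, float b) { return (a + b) / 2; }

void example() {
    MyObj* po = Wrap<decltype(fadd)*, fadd>();
    assert((*po)(2, 3) == 5);

    MyObj* ph = Wrap<decltype(fhalf)*, fhalf>();
    assert((*ph)(2, 3) == 2);   // 2.5f is truncated toward zero
}
```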
Functions together
With the additional Wrap overloads for inlining functions in place, to avoid code duplication and get closer to Zen, you can define one of the variants in terms of the other:
template<class Func, Func f>
FWrapping<Func, f>* Wrap() {
    static FWrapping<Func, f> W;
    return &W;
}

template<MyObj::FType f>
FWrapping<MyObj::FType, f>* Wrap() {
    return Wrap<MyObj::FType, f>();
}
Note that all three overloads of Wrap can coexist, since template parameters follow the same overloading rules as function arguments.
Together
As a result, in under 50 lines we get a mechanism that automatically converts any callable objects and functions with sufficiently close* signatures to a unified type, with the option of adding the necessary properties and with maximum inlining of the executed code.
* Sufficiently close here means the same number of arguments, with types that either match or can be implicitly converted.
#include <type_traits> // std::decay

struct MyObj {
    using FType = int( *)(int, int);
    virtual int operator() ( int a, int b ) = 0;
    virtual ~MyObj() = default;
};

template<class Func>
class Wrapping : public MyObj {
    Func _f;
public:
    Wrapping( Func f ) : _f( f ) {}
    int operator()( int a, int b ) override { return _f( a, b ); }
};

template<class Func, Func f>
struct FWrapping : public MyObj {
    int operator ()( int a, int b ) override { return f( a, b ); }
};

template<class Func>
Wrapping<Func>* Wrap( Func f ) {
    static Wrapping<Func> W( f );
    return &W;
}

template<class Func, Func f>
FWrapping<Func, f>* Wrap() {
    static FWrapping<Func, f> W;
    return &W;
}

template<MyObj::FType f>
FWrapping<MyObj::FType, f>* Wrap() {
    return Wrap<MyObj::FType, f>();
}

#define declarate( X ) std::decay< decltype( X ) >::type
A potential problem for this mechanism is the need to wrap functions with a different number of arguments or with types that cannot be implicitly converted. One solution is to call such functions (functors) inside a wrapped lambda. For example:
int volume( const double& a, const double& b, const double& c ) { return a*b*c; };
...
po = Wrap( []( int a, int b )->int { return volume( a, b, 10 ); } );
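A runnable sketch of the lambda adapter follows. Besides `volume`, it adapts a function whose parameter type cannot be produced from two ints implicitly; the `Box` struct and the `area` function are hypothetical and exist only for this illustration.

```cpp
#include <cassert>

struct MyObj {
    using FType = int (*)(int, int);
    virtual int operator()(int a, int b) = 0;
    virtual ~MyObj() = default;
};

template<class Func>
class Wrapping : public MyObj {
    Func _f;
public:
    Wrapping(Func f) : _f(f) {}
    int operator()(int a, int b) override { return _f(a, b); }
};

template<class Func>
Wrapping<Func>* Wrap(Func f) {
    static Wrapping<Func> W(f);
    return &W;
}

// Three arguments instead of two.
int volume(const double& a, const double& b, const double& c) {
    return static_cast<int>(a * b * c);
}

// A parameter type with no implicit conversion from two ints.
struct Box { int w, h; };
int area(const Box& b) { return b.w * b.h; }

void example() {
    // The lambda fixes the third argument.
    MyObj* pv = Wrap([](int a, int b) -> int { return volume(a, b, 10); });
    // The lambda builds the required argument explicitly.
    MyObj* pa = Wrap([](int a, int b) { return area(Box{a, b}); });

    assert((*pv)(2, 3) == 60);
    assert((*pa)(2, 3) == 6);
}
```

Each lambda has its own closure type, so the two calls to Wrap produce two independent static wrappers.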
Code examples are here. Building them requires C++11. To see the difference in inlining, use O2 optimization. The code is written so as to avoid excessive inlining.
______________________________
Addendum
Several important questions came up in the comments, and I will try to answer them.
1) Differences from std::function:
First of all, the essence of the task was to hand call dispatch over to the virtual call mechanism (the reason for this is beyond the scope of the article). Therefore, all variants of the Wrap function should be seen as a way to generate a derived class. Consequently, each call must be written “manually”: using these functions inside a loop will lead to incorrect program behavior. This is a disadvantage compared with std::function.
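For contrast, the sketch below shows the loop scenario with std::function, where it behaves as expected: every iteration stores its own copy of the closure, so differently captured values are preserved. The names `ops` and `example` are placeholders for this illustration.

```cpp
#include <cassert>
#include <functional>
#include <vector>

void example() {
    std::vector<std::function<int(int, int)>> ops;

    // All three closures have the same type but different captured state.
    for (int k = 1; k <= 3; ++k)
        ops.push_back([k](int a, int b) { return k * (a + b); });

    assert(ops[0](1, 1) == 2);   // k == 1
    assert(ops[2](1, 1) == 6);   // k == 3
}
```

With Wrap, the same loop would create a single static wrapper on the first iteration and return it on every later one, because the closure type is the same each time.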
The second important point is that this implementation does NOT use dynamic memory, which gives a potential performance advantage. If dynamic memory is nevertheless needed in practice, it is enough to change a couple of lines inside each function to use the new operator. In that case, however, you must take care of freeing the memory yourself (which currently happens automatically).
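One possible shape of such a heap-based variant is sketched below. The function name `WrapNew` is hypothetical, and returning std::unique_ptr is an assumption made here to keep cleanup automatic; the original code does not contain this function.

```cpp
#include <cassert>
#include <memory>

struct MyObj {
    using FType = int (*)(int, int);
    virtual int operator()(int a, int b) = 0;
    virtual ~MyObj() = default;
};

template<class Func>
class Wrapping : public MyObj {
    Func _f;
public:
    Wrapping(Func f) : _f(f) {}
    int operator()(int a, int b) override { return _f(a, b); }
};

// Hypothetical heap-based variant: every call creates a fresh wrapper,
// and std::unique_ptr releases it through the virtual destructor.
template<class Func>
std::unique_ptr<MyObj> WrapNew(Func f) {
    return std::unique_ptr<MyObj>(new Wrapping<Func>(f));
}

void example() {
    std::unique_ptr<MyObj> p = WrapNew([](int a, int b) { return a - b; });
    assert((*p)(5, 3) == 2);
}   // the wrapper is destroyed here
```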
With this approach it is still possible to inline the executed code (similar to what happens in std::function), while keeping a working mechanism for virtual calls.
2) Wrap works incorrectly when called multiple times for objects of the same type with different states
Thanks to attentive readers: I really did overlook that the code can behave incorrectly (non-obviously) when different instances of the same functor type, with different states, are passed. In this case, the static object of the Wrapping class is initialized only once, with the very first argument. All subsequent calls have no effect.
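The pitfall is easy to reproduce. In the sketch below `AddN` is a hypothetical functor with state; the second call to Wrap returns the wrapper created by the first one, and the new state is silently ignored.

```cpp
#include <cassert>

struct MyObj {
    using FType = int (*)(int, int);
    virtual int operator()(int a, int b) = 0;
    virtual ~MyObj() = default;
};

template<class Func>
class Wrapping : public MyObj {
    Func _f;
public:
    Wrapping(Func f) : _f(f) {}
    int operator()(int a, int b) override { return _f(a, b); }
};

template<class Func>
Wrapping<Func>* Wrap(Func f) {
    static Wrapping<Func> W(f);   // initialized on the first call only
    return &W;
}

// Hypothetical functor with state.
struct AddN {
    int n;
    int operator()(int a, int b) { return a + b + n; }
};

void example() {
    MyObj* p1 = Wrap(AddN{1});
    MyObj* p2 = Wrap(AddN{100});  // same type: the static object is reused
    assert(p1 == p2);
    assert((*p2)(1, 1) == 3);     // still uses n == 1, not 100
}
```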
The first thing to do is add protection against such situations. To do this, you can simply add a flag and generate an exception when trying to reinitialize, for example like this:
template<class Func>
Wrapping<Func>* Wrap( Func f ) {
    static int recallFlag{};
    if( recallFlag )
        throw "Second Wrap of the same type!\n";
    recallFlag++;
    static Wrapping<Func> W( f );
    return &W;
}
(I apologize for the nonstandard object for throw)
However, this is not a solution to the problem, but only an alarm in case of accidents.
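The same guard can be written with a standard exception type. This is only a sketch of one possible variant: the choice of std::logic_error and the `AddN` functor are assumptions made for this illustration.

```cpp
#include <cassert>
#include <stdexcept>

struct MyObj {
    using FType = int (*)(int, int);
    virtual int operator()(int a, int b) = 0;
    virtual ~MyObj() = default;
};

template<class Func>
class Wrapping : public MyObj {
    Func _f;
public:
    Wrapping(Func f) : _f(f) {}
    int operator()(int a, int b) override { return _f(a, b); }
};

template<class Func>
Wrapping<Func>* Wrap(Func f) {
    static bool used = false;     // one flag per functor type
    if (used)
        throw std::logic_error("Second Wrap of the same type");
    used = true;
    static Wrapping<Func> W(f);
    return &W;
}

// Hypothetical functor with state.
struct AddN {
    int n;
    int operator()(int a, int b) { return a + b + n; }
};

void example() {
    MyObj* p = Wrap(AddN{1});
    assert((*p)(1, 1) == 3);

    bool caught = false;
    try {
        Wrap(AddN{2});            // same type again: rejected
    } catch (const std::logic_error&) {
        caught = true;
    }
    assert(caught);
}
```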
A simple and effective solution is to add a parameter (with a default value) to the template of the Wrap function. If necessary, this parameter can be changed, and then another implementation of the function will be called, respectively, with another static Wrapping instance:
template<int i = 0, class Func> Wrapping<Func>* Wrap( Func f ) {...}
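A fuller sketch of this variant, with the elided body filled in the same way as in the original Wrap, might look as follows. `AddN` is again a hypothetical functor with state.

```cpp
#include <cassert>

struct MyObj {
    using FType = int (*)(int, int);
    virtual int operator()(int a, int b) = 0;
    virtual ~MyObj() = default;
};

template<class Func>
class Wrapping : public MyObj {
    Func _f;
public:
    Wrapping(Func f) : _f(f) {}
    int operator()(int a, int b) override { return _f(a, b); }
};

// Each distinct (i, Func) pair is a separate instantiation
// and therefore owns a separate static wrapper.
template<int i = 0, class Func>
Wrapping<Func>* Wrap(Func f) {
    static Wrapping<Func> W(f);
    return &W;
}

// Hypothetical functor with state.
struct AddN {
    int n;
    int operator()(int a, int b) { return a + b + n; }
};

void example() {
    MyObj* p0 = Wrap(AddN{1});        // i == 0 by default
    MyObj* p1 = Wrap<1>(AddN{100});   // different i, separate instance
    assert(p0 != p1);
    assert((*p0)(1, 1) == 3);
    assert((*p1)(1, 1) == 102);
}
```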
After that, each call with arguments of the same type needs a new value for this parameter. Doing this manually is somewhat inconvenient. There are several solutions:
- add a small macro that uses the predefined macro __COUNTER__ or __LINE__;
- build a template counter on top of the above-mentioned macros;
- take the esoteric path and build a purely template-based counter.
The first solution is very reliable and simple. However, please note that when using Wrap in different files, the __LINE__ macro can give the same result, and the __COUNTER__ macro is not standard, although it is implemented on most compilers. Also, conflicts can arise if other modules of the program use this macro in some way and require sole rights to it. In general, the solution looks like this:
#define OWrap( ... ) \
    Wrap<__COUNTER__>( __VA_ARGS__ )
In addition, you can define macros for a simple call to Wrap under function arguments:
#define FWrap( ... ) \
    Wrap<declarate( __VA_ARGS__ ), __VA_ARGS__>()
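The sketch below shows the counter-based macro in use. It assumes a compiler that provides the non-standard `__COUNTER__` macro (GCC, Clang, and MSVC do), and `AddN` is a hypothetical functor with state.

```cpp
#include <cassert>

struct MyObj {
    using FType = int (*)(int, int);
    virtual int operator()(int a, int b) = 0;
    virtual ~MyObj() = default;
};

template<class Func>
class Wrapping : public MyObj {
    Func _f;
public:
    Wrapping(Func f) : _f(f) {}
    int operator()(int a, int b) override { return _f(a, b); }
};

template<int i = 0, class Func>
Wrapping<Func>* Wrap(Func f) {
    static Wrapping<Func> W(f);
    return &W;
}

// Every expansion of OWrap receives a new counter value,
// hence a new Wrap instantiation and a new static wrapper.
#define OWrap(...) Wrap<__COUNTER__>(__VA_ARGS__)

// Hypothetical functor with state.
struct AddN {
    int n;
    int operator()(int a, int b) { return a + b + n; }
};

void example() {
    MyObj* p1 = OWrap(AddN{1});
    MyObj* p2 = OWrap(AddN{100});
    assert(p1 != p2);
    assert((*p1)(1, 1) == 3);
    assert((*p2)(1, 1) == 102);
}
```

Note that the counter advances once per macro expansion in the source text, not once per execution, so a single OWrap placed inside a loop still refers to one wrapper.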
The second option can be implemented using, for example, the solutions from here and from here. The result can then be substituted inside the macro in the same way.
The last and, in my opinion, most interesting option is inspired by this article. It is a rather delicate implementation, although it stays entirely within the C++11 standard. The result makes it possible to substitute the counter directly into the template without any additional macros, for example:
template<int i = next(), class Func> Wrapping<Func>* Wrap( Func f ) {...}
where next() is the implementation of the template counter.
Note that you should think three times and consult every available colleague before putting such code into production, although the result is extremely interesting and useful. I will post a detailed description and implementation of this mechanism in the next addendum or in a separate article.