📜 ⬆️ ⬇️

Urban legends about slow calls to virtual functions

Traditionally, compilers implement virtual function calls via dual indirect addressing - if a class contains at least one virtual function, then the address of the virtual function table is stored at the beginning of each object of this class. If the compiler does not know the specific type of object pointed to by the pointer, then to call a virtual function, you must first take a pointer to the object, read the address of the beginning of the table, then read the address where the function implementation is stored, then call the function.

The process of searching for a specific function by a pointer to an object is called late binding and is performed while the program is running. Late binding not only increases the call overhead, but also prevents the compiler from optimizing the code. Because of this, the virtual functions themselves are considered to be slowing down the work.

In the text above, the “if” keyword. What if the compiler knows which function to actually call?

In the Standard (hereinafter referred to as C ++ 03 Standard Reference), nothing is said about virtual method tables. Instead, 5.2.2 / 1 ([expr.call], “function call”) states that if a program contains a virtual function call, the corresponding function must be called, selected according to the rules of 10.3 / 2 ([class.virtual ], “Virtual functions”), and it says that TL; DR; A function must be selected from the derived class itself, in which the function is defined or redefined.
')
Accordingly, if the compiler can, having parsed the code, find out the exact type of the object, it does not have to use late binding - and it does not matter if the method is called from a specific object, by reference or by pointer to the object.

From meaningless reasoning, let's move on to the code that we will try on gcc.godbolt.org

We need these two classes:

class Base { public: virtual ~Base() {} virtual int Magic() { return 9000; } }; class Derived : public Base { public: virtual int Magic() { return 100500; } }; 


To begin with such code:

 int main() { Derived derived; return derived.Magic(); } 

clang 3.4.1 with -O2 responds to this like this:
 main: # @main movl $100500, %eax # imm = 0x18894 ret 

It is easy to see that the machine code corresponds to a program containing only return 100500; This is not particularly interesting - because there are no pointers and links.

Okay, slowly stirring, add pointers and links:

 int magic( Base& object ) { return object.Magic(); } int main() { Base* base = new Derived(); int result = magic( *base ); delete base; return result; } 

clang 3.4.1 with -O2 responds to this like this:
 magic(Base&): # @magic(Base&) movq (%rdi), %rax jmpq *(%rax) # TAILCALL main: # @main movl $100500, %eax # imm = 0x18894 ret 


OX YEARS COMPILER ERROR No, the compiler is fine, but the aggressiveness of optimization cannot be denied. Again return 100500;

For comparison, gcc 4.9.0 with -O2:

 main: subq $8, %rsp movl $8, %edi call operator new(unsigned long) movq vtable for Derived+16, (%rax) movq %rax, %rdi call Derived::~Derived() movl $100500, %eax addq $8, %rsp ret 


call Derived :: ~ Derived () - because of the virtual destructor, gcc in such cases puts the call :: operator delete () inside the destructor:

 Derived::~Derived(): jmp operator delete(void*) 

although it could and place in place. Like this:
  movq %rax, %rdi call operator delete(void*) 

Could, but did not. At the same time, the body of the Derived :: Magic () method is substituted into the place of the call and optimized along with the surrounding code.

A small digression ... If you like to talk at length about how well the compiler can in principle optimize the code, the example above is for you. Both the call to Derived :: Magic () , and the compiler could optimize the removal of the object with equal success, but one of them was optimized, and the second was not. Retreat is over.

For comparison, gcc 4.9.0 with -O1

 magic(Base&): subq $8, %rsp movq (%rdi), %rax call *(%rax) addq $8, %rsp ret main: pushq %rbp pushq %rbx subq $8, %rsp movl $8, %edi call operator new(unsigned long) movq %rax, %rbx movq vtable for Derived+16, (%rax) movq %rax, %rdi call magic(Base&) movl %eax, %ebp testq %rbx, %rbx je .L12 movq (%rbx), %rax movq %rbx, %rdi call *16(%rax) .L12: movl %ebp, %eax addq $8, %rsp popq %rbx popq %rbp ret 


Here, maybe because, if you ask well. In this code, “everything is in order” is a bunch of memory accesses and method invocation by a call instruction with indirect addressing ( call * 16 (% rax) ).

However, success examples with -O2 look far-fetched - all the code is in one translation unit, and this greatly simplifies optimization.

LTO is in a hurry to help (or whatever it is called optimization of several translation units in your compiler).

We divide the code into several translation units ...

 //Classes.h class Base { public: virtual int Magic(); virtual ~Base(); }; class Derived : public Base { public: virtual int Magic(); }; //Classes.cpp #include <Classes.h> #include <stdio.h> Base::~Base() { } int Base::Magic() { return 9000; } int Derived::Magic() { return 100500; } //main.cpp #include <Classes.h> int magic( Base& object ) { return object.Magic(); } int main() { Base* base = new Derived(); int result = magic( *base ); delete base; return result; } 


Hereinafter we will use MinGW with gcc 4.9.0

 g++ -flto -g -O3 main.cpp Classes.cpp objdump -d -M intel -S --no-show-raw-insn a.exe >a.txt 


 int main() { 402830: push ebp 402831: mov ebp,esp 402833: and esp,0xfffffff0 402836: sub esp,0x10 402839: call 402050 <___main> Base* base = new Derived(); 40283e: mov DWORD PTR [esp],0x4  ::operator new() 402845: call 4015d8 <__Znwj>    vtable 40284a: mov DWORD PTR [eax],0x404058 int result = magic( *base ); delete base; 402850: mov ecx,eax 402852: call 4015c0 <__ZN7DerivedD0Ev> return result; }       402857: mov eax,0x18894 40285c: leave 40285d: ret 

Here we are interested in the instruction mov eax, 0x18894 (100500 in hexadecimal notation) - again the compiler chose the necessary function, substituted its body into the place of the call and optimized the surrounding code.

Too easy, so we add a factory (Derived and Base are the same) ...
 //Factory.h #include <Classes.h> class Factory { public: static Base* CreateInstance(); }; //Factory.cpp #include <Factory.h> Base* Factory::CreateInstance() { return new Derived(); } //main.cpp #include <Factory.h> int magic( Base& object ) { return object.Magic(); } int main() { Base* base = Factory::CreateInstance(); int result = magic( *base ); delete base; return result; } 

Compile, disassemble ... Initially, the result does not look very clear - due to aggressive optimization, the machine code and source code were not matched in the most convenient way to read, the machine code below is left as is, and some source code lines are placed as close as possible to the corresponding machine code.
 int main() { 402830: push ebp 402831: mov ebp,esp 402833: push esi 402834: push ebx 402835: and esp,0xfffffff0 402838: sub esp,0x10 40283b: call 402050 <___main> return new Derived(); 402840: mov DWORD PTR [esp],0x4  ::operator new() 402847: call 4015d8 <__Znwj> 40284c: mov ebx,eax int magic( Base& object ) { return object.Magic(); 40284e: mov ecx,eax    vtable 402850: mov DWORD PTR [eax],0x404058   Derived::Magic() 402856: call 401580 <__ZN7Derived5MagicEv> int main() { delete base; 40285b: mov ecx,ebx 40285d: mov esi,eax 40285f: call 4015b0 <__ZN7DerivedD0Ev> return result; 402864: lea esp,[ebp-0x8] 402867: mov eax,esi 402869: pop ebx 40286a: pop esi 40286b: pop ebp 40286c: ret ( ) 


Here we are interested in the string
  402856: call 401580 <__ZN7Derived5MagicEv> 


This is a direct call to Derived :: Magic () :
  00401580 <__ZN7Derived5MagicEv>: int Derived::Magic() { return 100500; } 401580: mov eax,0x18894 401585: ret 


The compiler correctly determined which function to call, but did not substitute the body of the function in the place of the call.

Parameterize the factory (Base and Derived are the same) ...
 //Factory.h #include <Classes.h> enum ClassType { BaseType, DerivedType }; class Factory { public: static Base* CreateInstance(ClassType classType); }; //Factory.cpp #include <Factory.h> Base* Factory::CreateInstance(ClassType classType) { switch( classType ) { case BaseType: return new Base(); case DerivedType: return new Derived(); } } //main.cpp #include <Factory.h> int magic( Base& object ) { return object.Magic(); } int main() { Base* base = Factory::CreateInstance(DerivedType); int result = magic( *base ); delete base; return result; } 

We get ... the same code as in the previous attempt.

Now party hard will calculate the factory parameter when the program runs ...
 #include <Factory.h> #include <cstdlib> int magic( Base& object ) { return object.Magic(); } int main() { Base* base = Factory::CreateInstance(rand() ? BaseType : DerivedType); int result = magic( *base ); delete base; return result; } 

We get ... (the result again does not look very clear)
 int main() { 402830: push ebp 402831: mov ebp,esp 402833: push esi 402834: push ebx 402835: and esp,0xfffffff0 402838: sub esp,0x10 40283b: call 402050 <___main> Base* base = Factory::CreateInstance(rand() ? BaseType : DerivedType);  rand() 402840: call 4027c8 <_rand> Base* Factory::CreateInstance(ClassType classType) { switch( classType ) {        switch 402845: test eax,eax 402847: mov DWORD PTR [esp],0x4  40284e: jne 402875 <_main+0x45>  rand()   ,      402875  rand()  ,    ... case DerivedType: return new Derived();  ::operator new() 402850: call 4015d8 <__Znwj>    vtable  Derived 402855: mov DWORD PTR [eax],0x404070 40285b: mov ebx,eax int magic( Base& object ) { return object.Magic();      -   "" ,     ,    402875 (rand() != 0) 40285d: mov eax,DWORD PTR [ebx] 40285f: mov ecx,ebx   Magic() 402861: call DWORD PTR [eax] 402863: mov esi,eax int main() { delete base; 402865: mov eax,DWORD PTR [ebx] 402867: mov ecx,ebx     402869: call DWORD PTR [eax+0x8] return result; } 40286c: lea esp,[ebp-0x8] 40286f: mov eax,esi 402871: pop ebx 402872: pop esi 402873: pop ebp 402874: ret Base* Factory::CreateInstance(ClassType classType) { switch( classType ) { case BaseType: return new Base();        40284e  rand() != 0  ::operator new() 402875: call 4015d8 <__Znwj>    vtable  Base 40287a: mov DWORD PTR [eax],0x404058 402880: mov ebx,eax         402882: jmp 40285d <_main+0x2d> 


Quite a curious result. The factory method code is completely in place. Depending on the result of the rand () function call, right inside main () , branching and creation of instances of the corresponding classes are performed. The compiler could have placed further direct calls in each branch, but did not cope with the optimization and slipped into two indirect calls — one to call the Magic () method with late binding, and the second to remove the object, also with late binding.

As you can see, calls to virtual functions do not require the use of late binding, and what happens in the real world depends on the compiler, its settings, and the specific code.

Dmitry Mescheryakov,
product department for developers

Source: https://habr.com/ru/post/248429/


All Articles