
Virtual method table and safety precautions

As a small warm-up before the article, I would like the reader to consider the following question: does a photographer need to know how a camera works in order to take high-quality pictures? Should he at least know what "aperture" means? "Signal-to-noise ratio"? "Depth of field"? Practice shows that even people who know all these complicated terms can produce clumsy pictures that are hardly better than those taken with a 0.3-megapixel mobile phone camera. Conversely, really good pictures can be taken relying on experience and intuition alone, with complete ignorance of the underlying theory (although this is the exception rather than the rule). However, hardly anyone will argue that professionals who want to squeeze everything out of their equipment (and not just the number of megapixels per square millimeter of the sensor) must have this knowledge, since otherwise they cannot be called professionals. And this is true not only for digital photography, but for almost any other field.

This is also true for programming, and doubly so for programming in C++. This article describes an important concept of the language known as the virtual table pointer, which is present in almost every non-trivial class, and shows how it can be damaged by accident. Such damage can, in turn, lead to errors that are hard to debug. First I will recall what the pointer is, and then I will share my thoughts on what can break there and how.

Unfortunately, this article contains a lot of low-level reasoning, but there is no other way to illustrate the problem. I should also note that the article is written mostly with the Visual C++ compiler in mind, building a 64-bit program; the results with other compilers and on other architectures may differ.

Virtual table pointer


Theory says that the vptr (a pointer to a virtual method table, or virtual table pointer) is present in every class that has at least one virtual method. Let us look more closely at what kind of beast it is. To do this, we write a simple demo program in C++.

    #include <cstdlib>    // system
    #include <cstring>    // memset
    #include <iomanip>
    #include <iostream>
    #include <new>        // placement new
    using namespace std;

    int nop() {
        static int nop_x;
        return ++nop_x;   // side effect, so that the call is not optimized away
    }

    class A {
    public:
        unsigned long long content_A;
        A(void) : content_A(0xAAAAAAAAAAAAAAAAull)
            { cout << "++ A has been constructed" << endl; }
        ~A(void)
            { cout << "-- A has been destructed" << endl; }
        void function(void) { nop(); }
    };

    // Prints 32 bytes of memory as four rows of eight hex values.
    void PrintMemory(const unsigned char memory[],
                     const char label[] = "contents") {
        cout << "Memory " << label << ": " << endl;
        for (size_t i = 0; i < 4; i++) {
            for (size_t j = 0; j < 8; j++)
                cout << setw(2) << setfill('0') << uppercase << hex
                     << static_cast<int>(memory[i * 8 + j]) << " ";
            cout << endl;
        }
    }

    int main() {
        unsigned char memory[32];
        memset(memory, 0x11, 32 * sizeof(unsigned char));
        PrintMemory(memory, "before placement new");
        new (memory) A;
        PrintMemory(memory, "after placement new");
        reinterpret_cast<A *>(memory)->~A();
        system("pause");
        return 0;
    }

Despite the relatively large amount of code, its logic should be fairly obvious: 32 bytes are allocated on the stack and filled with the value 0x11 (we pretend that this is some garbage in memory). Then a rather trivial object of class A is constructed on top of these 32 bytes using placement new. Finally, the contents of the memory are printed, after which the program destroys the object and terminates. Below is the output of this program (Microsoft Visual Studio 2012, x64).

    Memory before placement new:
    11 11 11 11 11 11 11 11
    11 11 11 11 11 11 11 11
    11 11 11 11 11 11 11 11
    11 11 11 11 11 11 11 11
    ++ A has been constructed
    Memory after placement new:
    AA AA AA AA AA AA AA AA
    11 11 11 11 11 11 11 11
    11 11 11 11 11 11 11 11
    11 11 11 11 11 11 11 11
    -- A has been destructed
    Press any key to continue . . .

It is easy to see that the class size in memory is 8 bytes and is equal to the size of its only member unsigned long long content_A.
Let's make the program a bit more complicated by adding the virtual keyword to the declaration of void function(void):

    virtual void function(void) { nop(); }

The output of the program (from here on, the "Memory before placement new" and "Press any key..." parts of the output are omitted):

    ++ A has been constructed
    Memory after placement new:
    F8 D1 C4 3F 01 00 00 00
    AA AA AA AA AA AA AA AA
    11 11 11 11 11 11 11 11
    11 11 11 11 11 11 11 11
    -- A has been destructed

Again, it is easy to see that the class now occupies 16 bytes in memory. The first eight bytes are now taken by a pointer to the virtual method table. In this run of the program the pointer turned out to be 0x000000013FC4D1F8 (both the pointer and content_A appear byte-reversed in memory because Intel 64 uses little-endian byte order; in the case of content_A, however, you cannot tell, since all of its bytes are identical).
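To make the byte-order remark concrete, here is a small sketch that reassembles the eight dumped bytes into the pointer value and compares object sizes with and without a virtual function. The helper decode_le64 and the structs Plain and WithVirtual are my own illustrative names, not part of the demo program, and the exact sizes are implementation-defined (the comments assume a typical 64-bit ABI).

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: rebuilds a 64-bit value from 8 bytes stored
// in little-endian order (least significant byte first).
inline std::uint64_t decode_le64(const unsigned char bytes[8]) {
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < 8; ++i)
        value |= static_cast<std::uint64_t>(bytes[i]) << (8 * i);
    return value;
}

// First row of the "after placement new" dump shown above.
static const unsigned char kDumpedVptr[8] =
    {0xF8, 0xD1, 0xC4, 0x3F, 0x01, 0x00, 0x00, 0x00};

// Same data member, with and without a virtual function.
struct Plain       { unsigned long long content; void function() {} };
struct WithVirtual { unsigned long long content; virtual void function() {} };

// On a typical 64-bit ABI: sizeof(Plain) is 8 and sizeof(WithVirtual) is 16,
// the extra 8 bytes being the hidden virtual table pointer.
```

Calling decode_le64(kDumpedVptr) yields 0x000000013FC4D1F8, the value quoted in the text; the address itself will of course differ from run to run and from build to build.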

The virtual method table is a special, automatically generated in-memory structure that lists pointers to virtual methods. If function() is called somewhere in the code through a pointer to class A, then instead of calling A::function() directly, the program calls the function found at the appropriate offset in the virtual method table; this is how polymorphism is implemented. The virtual method table itself is shown below (obtained by compiling with the /FAs switch; note the somewhat strange function name in the assembly code, which is the result of name mangling):

    CONST   SEGMENT
    ??_7A@@6B@ DQ FLAT:??_R4A@@6B@    ; A::'vftable'
            DQ FLAT:?function@A@@UEAAXXZ
    CONST   ENDS
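For readers who prefer code to assembly listings, the mechanism can be imitated by hand. The sketch below is only a rough model of what a compiler generates, and every name in it (ManualVTable, ManualObject, CallFunction and so on) is an assumption of mine; real layouts are compiler-specific.

```cpp
#include <cstddef>

// Hand-written imitation of compiler-generated dispatch.
// All names here are illustrative, not taken from any real ABI.
struct ManualObject;

struct ManualVTable {
    int (*function)(ManualObject *self);   // slot 0: pointer to "function()"
};

struct ManualObject {
    const ManualVTable *vptr;   // plays the role of the hidden first field
    int content;
};

static int BaseFunction(ManualObject *self)    { return self->content; }
static int DerivedFunction(ManualObject *self) { return self->content * 2; }

// One table per "class", shared by all of its objects.
static const ManualVTable kBaseTable    = { &BaseFunction };
static const ManualVTable kDerivedTable = { &DerivedFunction };

// Roughly what a virtual call does: load the vptr, load the slot, call it.
inline int CallFunction(ManualObject *object) {
    return object->vptr->function(object);
}
```

Two objects with identical data but different vptr values behave differently under CallFunction, which is why overwriting that one hidden field is enough to break an object.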


__declspec(novtable)


Sometimes the virtual method table is, in principle, not needed. Suppose we are never going to instantiate class A, or if we do, then only on weekends and holidays, carefully making sure that no virtual function is ever called. This is quite a common situation with abstract classes: it is known that an abstract class cannot be instantiated at all. Indeed, if function(void) were declared pure virtual in class A, the virtual method table would look like this:

    CONST   SEGMENT
    ??_7A@@6B@ DQ FLAT:??_R4A@@6B@    ; A::'vftable'
            DQ FLAT:_purecall
    CONST   ENDS

Obviously, an attempt to call such a function amounts to shooting yourself in the foot.

The question is: if a class is never instantiated, why set up its virtual table pointer? To keep the compiler from generating the extra code, you can give it a hint in the form of __declspec(novtable) (caution: Microsoft-specific!). Let's rewrite our example class with a virtual function using the __declspec(novtable) attribute:

    class __declspec(novtable) A { .... }

The output of the program will be as follows:
    ++ A has been constructed
    Memory after placement new:
    11 11 11 11 11 11 11 11
    AA AA AA AA AA AA AA AA
    11 11 11 11 11 11 11 11
    11 11 11 11 11 11 11 11
    -- A has been destructed

First of all, note that the size of the object has not changed: it still takes 16 bytes. In total, introducing the __declspec(novtable) attribute made only two differences: first, the place where the address of the virtual method table used to be now holds uninitialized memory; second, the virtual method table of class A is no longer present in the assembly code at all. But the virtual table pointer is still there and still "weighs" eight bytes! This needs to be remembered because ...

Inheritance


Let's rewrite our example in such a way as to implement the simplest inheritance from an abstract class with a virtual table pointer.
    class __declspec(novtable) A   // abstract class
    {
    public:
        unsigned long long content_A;
        A(void) : content_A(0xAAAAAAAAAAAAAAAAull)
            { cout << "++ A has been constructed" << endl; }
        ~A(void)
            { cout << "-- A has been destructed" << endl; }
        virtual void function(void) = 0;
    };

    class B : public A   // derived from A
    {
    public:
        unsigned long long content_B;
        B(void) : content_B(0xBBBBBBBBBBBBBBBBull)
            { cout << "++ B has been constructed" << endl; }
        ~B(void)
            { cout << "-- B has been destructed" << endl; }
        virtual void function(void) { nop(); }
    };

We also change the main program so that an object of class B is created (and destroyed) instead of an object of class A:

    ....
    new (memory) B;
    PrintMemory(memory, "after placement new");
    reinterpret_cast<B *>(memory)->~B();
    ....

The output of the program will be as follows:
    ++ A has been constructed
    ++ B has been constructed
    Memory after placement new:
    D8 CA 2C 3F 01 00 00 00
    AA AA AA AA AA AA AA AA
    BB BB BB BB BB BB BB BB
    11 11 11 11 11 11 11 11
    -- B has been destructed
    -- A has been destructed

Let's try to figure out what happened. The constructor B::B() was called. Before executing its own body, it calls the constructor of the base class, A::A(). The first thing A::A() would normally do is initialize the virtual table pointer, but because of the __declspec(novtable) attribute the pointer was left uninitialized. The constructor then sets the content_A field to 0xAAAAAAAAAAAAAAAAull (the second field in memory) and returns control to B::B().

Since class B does not have the __declspec(novtable) attribute, its constructor sets the virtual table pointer (the first field in memory) to the virtual table of class B, then sets content_B to 0xBBBBBBBBBBBBBBBBull (the third field in memory) and returns control to the main program. The contents of the memory make it easy to see that the object of class B was constructed correctly, and the logic makes it clear that an operation unnecessary in this context was skipped. To be clear: the unnecessary operation is the initialization of the virtual table pointer in the constructor of the base class.
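The staged setup of the pointer can also be observed in portable C++, without looking at memory. The sketch below uses its own illustrative classes Base and Derived (not the article's A and B, and without novtable, with a non-pure virtual function so that the call from the base constructor is legal). The language guarantees the behavior shown; the virtual table pointer being rewritten by each constructor is merely the usual way compilers implement it.

```cpp
#include <string>
#include <vector>

// Illustrative classes: each constructor records which override
// a virtual call reaches at that moment of construction.
struct Base {
    std::vector<std::string> *log;
    explicit Base(std::vector<std::string> *out) : log(out) {
        log->push_back(name());   // only the Base part exists: Base::name()
    }
    virtual ~Base() {}
    virtual std::string name() const { return "Base"; }
};

struct Derived : Base {
    explicit Derived(std::vector<std::string> *out) : Base(out) {
        log->push_back(name());   // Derived is in effect now: Derived::name()
    }
    virtual std::string name() const { return "Derived"; }
};

// Constructs a Derived object and returns the recorded sequence.
inline std::vector<std::string> ConstructAndLog() {
    std::vector<std::string> log;
    Derived object(&log);
    Base &as_base = object;
    log.push_back(as_base.name());   // after construction: Derived::name()
    return log;
}
```

ConstructAndLog() returns the sequence "Base", "Derived", "Derived": the base constructor sees the base class's functions, and only the derived constructor switches the object over to the final set.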

It would seem that only one operation was skipped; is there any point in getting rid of it? But if a program has thousands and thousands of classes derived from the same abstract class, getting rid of one automatically generated instruction can noticeably affect performance. And it will. Don't believe it?

The memset() function


The main idea of the memset() function is to fill a memory area with some constant value (most often zero). In C it could be used to quickly initialize all the fields of a structure. How does a C++ class differ in memory from a C structure if it has no virtual table pointer? Essentially it doesn't: data is data. To initialize really simple classes (in C++11 terms, standard-layout types) it is quite acceptable to use memset(). In theory, memset() can be used to initialize any class at all, but what are the consequences? A careless memset() can render the virtual table pointer unusable in one fell swoop. But a question immediately arises: is it nevertheless allowed if the class is declared with __declspec(novtable)?

The answer: it is allowed, but only with great care.

Let's rewrite the classes as follows: we add a wipe() method that fills the entire contents of class A with 0xAA:

    class __declspec(novtable) A   // abstract class
    {
    public:
        unsigned long long content_A;
        A(void)
            { cout << "++ A has been constructed" << endl; wipe(); }
            // { cout << "++ A has been constructed" << endl; }
        ~A(void)
            { cout << "-- A has been destructed" << endl; }
        virtual void function(void) = 0;
        void wipe(void)
        {
            memset(this, 0xAA, sizeof(*this));
            cout << "++ A has been wiped" << endl;
        }
    };

    class B : public A   // derived from A
    {
    public:
        unsigned long long content_B;
        B(void) : content_B(0xBBBBBBBBBBBBBBBBull)
            { cout << "++ B has been constructed" << endl; }
            // {
            //     cout << "++ B has been constructed" << endl;
            //     A::wipe();
            // }
        ~B(void)
            { cout << "-- B has been destructed" << endl; }
        virtual void function(void) { nop(); }
    };

The output of the program in this case is exactly what one would expect:

    ++ A has been constructed
    ++ A has been wiped
    ++ B has been constructed
    Memory after placement new:
    E8 CA E8 3F 01 00 00 00
    AA AA AA AA AA AA AA AA
    BB BB BB BB BB BB BB BB
    11 11 11 11 11 11 11 11
    -- B has been destructed
    -- A has been destructed

So far everything is working well.

However, it is enough to move the call to wipe() slightly, by commenting out the constructor bodies and uncommenting the lines that follow them, and it immediately becomes clear that something has gone wrong. The first call to the virtual function function() will cause a run-time error because of the damaged virtual table pointer:

    ++ A has been constructed
    ++ B has been constructed
    ++ A has been wiped
    Memory after placement new:
    AA AA AA AA AA AA AA AA
    AA AA AA AA AA AA AA AA
    BB BB BB BB BB BB BB BB
    11 11 11 11 11 11 11 11
    -- B has been destructed
    -- A has been destructed

Why did this happen? The wipe() function was called after the constructor of class B had initialized the virtual table pointer, so the pointer was overwritten. In other words, you should not wipe a class that has a virtual table pointer, even if it is declared with __declspec(novtable). Wiping the whole object is appropriate only in the constructor of a class that will never be instantiated, and even then it should be done with great care.
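One way to make such mistakes harder is to let the compiler reject raw byte fills for unsuitable types. The following is a minimal sketch under my own assumptions: WipeBytes, PlainData and Polymorphic are hypothetical names, and requiring both std::is_trivially_copyable and std::is_standard_layout is a deliberately conservative choice of mine rather than a rule stated in this article.

```cpp
#include <cstring>
#include <type_traits>

// Hypothetical helper: refuses to compile for types where a raw byte fill
// could destroy hidden data such as a virtual table pointer.
template <typename T>
void WipeBytes(T &object, int value) {
    static_assert(std::is_trivially_copyable<T>::value &&
                  std::is_standard_layout<T>::value,
                  "WipeBytes: raw memset is only allowed for simple data types");
    std::memset(&object, value, sizeof(T));
}

// Simple data: a byte fill is fine here.
struct PlainData {
    unsigned long long first;
    unsigned long long second;
};

// Has a virtual function, hence a hidden virtual table pointer.
// WipeBytes(polymorphic_object, 0xAA) would fail to compile.
struct Polymorphic {
    unsigned long long content;
    virtual void function() {}
};

// For polymorphic types, reset the data members one by one instead.
inline void WipeDataOnly(Polymorphic &object) {
    object.content = 0xAAAAAAAAAAAAAAAAull;
}
```

With this helper, the wipe() method from the example above would not compile as written, because class A has a virtual function; the author would be pushed toward resetting the fields explicitly.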

The memcpy() function


With the memcpy() function the picture is exactly the same. Again, in theory it can be used to copy standard-layout types. In practice, however, some programmers like to use it whether it is needed or not. With types that are not standard-layout, using memcpy() is like walking a tightrope over Niagara Falls: a single mistake can be fatal, and making that mistake is ridiculously easy. For example:

    class __declspec(novtable) A
    {
        ....
        A(const A &source) { memcpy(this, &source, sizeof(*this)); }
        virtual void foo() { }
        ....
    };
    class B : public A { .... };

The copy constructor may write whatever its digital heart desires into the virtual table pointer of an abstract class: the correct value will be put there anyway by the derived classes. But in the implementation of the assignment operator, memcpy() can no longer be used:

    class __declspec(novtable) A
    {
        ....
        A &operator =(const A &source)
        {
            memcpy(this, &source, sizeof(*this));
            return *this;
        }
        virtual void foo() { }
        ....
    };
    class B : public A { .... };

Now recall how used we are to the idea that the assignment operator and the copy constructor are essentially the same thing. No, it is not all that bad: in practice the assignment operator may even work properly, not because the code is correct but because the stars happened to align. The code copies the virtual table pointer from another object, and nobody knows what that will lead to.
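For comparison, here is a sketch of an assignment operator that copies the data members one by one and never touches the hidden pointer. The classes Shape, Circle and Square and the helper AssignThroughBase are illustrative names of mine; the sketch deliberately leaves out __declspec(novtable) so that it compiles with any compiler.

```cpp
// Illustrative hierarchy: assignment copies data only.
struct Shape {
    unsigned long long content;
    explicit Shape(unsigned long long value) : content(value) {}
    virtual ~Shape() {}
    virtual int kind() const = 0;

    // Member-wise copy: the hidden virtual table pointer of *this is
    // never written, so the object keeps its own dynamic type.
    Shape &operator=(const Shape &source) {
        content = source.content;
        return *this;
    }
};

struct Circle : Shape {
    explicit Circle(unsigned long long value) : Shape(value) {}
    virtual int kind() const { return 1; }
};

struct Square : Shape {
    explicit Square(unsigned long long value) : Shape(value) {}
    virtual int kind() const { return 2; }
};

// Assigns through base-class references and reports the target's kind.
inline int AssignThroughBase(Shape &target, const Shape &source) {
    target = source;        // Shape::operator=, data members only
    return target.kind();   // still dispatches on the target's own type
}
```

Assigning a Square to a Circle through AssignThroughBase copies the content field, but the Circle still answers kind() as a Circle. With the memcpy() version, the target would have received the source object's virtual table pointer as well.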

PVS-Studio


This article grew out of a detailed investigation of the mysterious __declspec(novtable) and of when the memset() and memcpy() functions may and may not be used in high-level code. From time to time developers write to us that the PVS-Studio analyzer issues warnings about the virtual table pointer too often. Programmers believe that if __declspec(novtable) is present, then there is neither a virtual method table nor a virtual table pointer. We started looking into this issue carefully and realized that things are not that simple.

Remember: if __declspec(novtable) is used in a class declaration, this does not mean that the class contains no pointer to a virtual method table! Whether this pointer gets initialized or not is a completely different question.
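This can be checked at compile time. In the sketch below, ARTICLE_NOVTABLE is a portability macro of my own invention (it expands to the attribute only for compilers that define _MSC_VER and to nothing elsewhere), and Regular and Marked are illustrative classes, so on other compilers the comparison is trivially true.

```cpp
#include <cstddef>

// Assumption: ARTICLE_NOVTABLE is a made-up portability macro,
// not something provided by any compiler or library.
#if defined(_MSC_VER)
#  define ARTICLE_NOVTABLE __declspec(novtable)
#else
#  define ARTICLE_NOVTABLE /* attribute unavailable: expands to nothing */
#endif

struct Regular {
    unsigned long long content;
    virtual void function() {}
};

struct ARTICLE_NOVTABLE Marked {
    unsigned long long content;
    virtual void function() {}
};

// The attribute removes the table and its initialization,
// not the pointer field inside each object.
static_assert(sizeof(Marked) == sizeof(Regular),
              "novtable is not expected to change the object layout");
```

If the static_assert ever fired, the layout assumption made in this article would not hold for that compiler.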

We will make the analyzer stop complaining about memset()/memcpy(), but only when they are used in the constructors of a base class declared with __declspec(novtable).

Conclusion


Unfortunately, the article could not cover much of the material related to inheritance (for example, multiple inheritance was left out entirely). Nevertheless, I hope this information makes it clear that "things are not that simple" and that it is worth thinking three times before applying low-level functions to high-level objects. And is it worth doing at all?

Source: https://habr.com/ru/post/239915/

