
Hi, Habr!
I want to examine a problem in the avr-g++ compiler that fuels the usual refrain in discussions about AVR and Arduino: "C++ is not for microcontrollers, C++ eats memory, C++ generates bloated code, write in C, or better yet in ASM."
To begin with, let's look at what advantage C++ has over C. C++ adds many concepts, but the most significant and the most heavily used one is OOP support. What does OOP consist of?
- Encapsulation
- Inheritance
- Polymorphism
Using the first two items in C++ is "free": a pure C program has no advantage over a C++ program that uses encapsulation and inheritance. The picture changes once polymorphism enters the game. Polymorphism comes in different kinds: compile-time, link-time, run-time. Here I mean the classic run-time kind, i.e. virtual functions. As soon as you start adding virtual methods to your classes, the consumption of both Flash and SRAM grows remarkably.
Why this happens and what can be done about it, I'll explain under the cut.
Example without virtual functions
Let's look at a program with one base class and two derived classes:
    volatile unsigned char var;

    class Base {
    public:
        void foo() { var += 19; }
        void bar() { var += 29; }
        void baz() { var += 39; }
    };

    class DerivedOne : public Base {
    public:
        void foo() { var += 17; }
        void bar() { var += 27; }
        void baz() { var += 37; }
    };

    class DerivedTwo : public Base {
    public:
        void foo() { var += 18; }
        void bar() { var += 28; }
        void baz() { var += 38; }
    };

    DerivedOne dOne = DerivedOne();
    DerivedTwo dTwo = DerivedTwo();

    int main() {
        Base* b;

        if (var)
            b = &dOne;
        else
            b = &dTwo;

        asm("nop");

        b->foo();

        for (;;)
            ;

        return 0;
    }
In the `main` function, based on the value of `var`, which the compiler obviously cannot know in advance, we point the base-class pointer `b` either at the object of the first derived class or at the object of the second. Then we call the `foo` method through the base-class pointer.
This example is deliberately silly: no matter what we do in the derived classes, the `foo` implementation from the base class `Base` will be called anyway. But it is useful as a starting point.
    $ avr-g++ -O0 -c novirtual.cpp -o novirtual.o
    $ avr-gcc -O0 novirtual.o -o novirtual.elf
    $ avr-size -C --format=avr novirtual.elf

    AVR Memory Usage
    ----------------
    Device: Unknown

    Program:     104 bytes (.text + .data + .bootloader)
    Data:          3 bytes (.data + .bss + .noinit)
So the program uses 104 bytes of Flash and 3 bytes of SRAM. With optimization flags those 104 + 3 bytes shrink to 34 + 3, and with dead code elimination on top of that they go down to 16 + 0 bytes.
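For reference, numbers like these can be reproduced with something along the following lines. The exact flag set is my assumption: `-Os` optimizes for size, and `-ffunction-sections`/`-fdata-sections` together with `--gc-sections` let the linker throw away dead code:

    $ avr-g++ -Os -c novirtual.cpp -o novirtual.o
    $ avr-gcc -Os novirtual.o -o novirtual.elf
    $ avr-size -C --format=avr novirtual.elf

    $ avr-g++ -Os -ffunction-sections -fdata-sections -c novirtual.cpp -o novirtual.o
    $ avr-gcc -Os -Wl,--gc-sections novirtual.o -o novirtual.elf
    $ avr-size -C --format=avr novirtual.elf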
If we open the assembly generated by the compiler and find the place of the call, we will see this:
    ldd r24,Y+1
    ldd r25,Y+2
    rcall _ZN4Base3fooEv
The value of `this` is loaded into the `r25:r24` register pair, and a direct call to `Base::foo` follows. Simple and efficient. Of course, the optimizer would notice that this call is pointless here and would spot the inlining opportunity anyway, but let's keep the discussion at the non-optimized level.
Add virtual
Now let's add polymorphism. Let's make our methods virtual:
    volatile unsigned char var;

    class Base {
    public:
        virtual void foo() { var += 19; }
        virtual void bar() { var += 29; }
        virtual void baz() { var += 39; }
    };

    class DerivedOne : public Base {
    public:
        virtual void foo() { var += 17; }
        virtual void bar() { var += 27; }
    };

    class DerivedTwo : public Base {
    public:
        virtual void foo() { var += 18; }
        virtual void baz() { var += 38; }
    };

    // dOne, dTwo and main() stay the same as in the previous example
Checking:
    AVR Memory Usage
    ----------------
    Device: Unknown

    Program:     312 bytes (.text + .data + .bootloader)
    Data:         25 bytes (.data + .bss + .noinit)
Whoa! 25 bytes of SRAM are gone. It is easy to check that creating one more instance of a class eats 2 more bytes. These 2 bytes are a pointer to the virtual function table, which is what allows the implementation from the actual derived class to be executed when a method is called through a pointer to the base class.
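Those 2 bytes can also be seen in the object size itself. A small sketch (assuming a C++11-capable avr-g++ for `static_assert`; without `virtual` the same otherwise empty classes would have the minimal size of 1 byte):

    // On AVR a pointer is 2 bytes, and the only data inside these objects
    // is the hidden pointer to the virtual function table.
    static_assert(sizeof(Base) == 2, "Base now carries a hidden vtable pointer");
    static_assert(sizeof(DerivedOne) == 2, "so does every derived class");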
But we have only 2 global objects and one poor 1-byte variable. Who ate the rest of the memory? Here we get to the heart of the problem: the virtual tables themselves. One per class, and the size of each grows linearly with the number of virtual functions.
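The tables are easy to spot in the ELF file: their symbols are mangled with the `_ZTV` prefix, which demangles to "vtable for ...". A sketch of the check (the file name `virtual.elf` simply follows the naming of the earlier example and is an assumption):

    $ avr-objdump -t -C virtual.elf | grep 'vtable for'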
Price of polymorphism
Let's sketch out the virtual function tables. In our example there are 3 of them, one for each class:
    vtable for Base:
        foo -> Base::foo
        bar -> Base::bar
        baz -> Base::baz

    vtable for DerivedOne:
        foo -> DerivedOne::foo
        bar -> DerivedOne::bar
        baz -> Base::baz

    vtable for DerivedTwo:
        foo -> DerivedTwo::foo
        bar -> Base::bar
        baz -> DerivedTwo::baz
On an 8-bit AVR, each pointer is 2 bytes. It is enough to create such a table once per class in the hierarchy and then add one hidden field, a `__vtbl*` pointer, to every instance, pointing at the appropriate table. That way each instance "knows who it is" regardless of the type of pointer through which its methods are called. In other words, the per-object overhead of polymorphism is just +2 bytes for the `__vtbl*` plus the cost of an indirect call: the method is not called directly; first its address is fetched from the table, and only then is the call made.
    ldd r24,Y+1
    ldd r25,Y+2
    mov r30,r24
    mov r31,r25
    ld r24,Z
    ldd r25,Z+1
    mov r30,r24
    mov r31,r25
    ld r18,Z
    ldd r19,Z+1
    ldd r24,Y+1
    ldd r25,Y+2
    mov r30,r18
    mov r31,r19
    icall
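To make it clearer what the compiler does behind the scenes, here is roughly the same machinery written out by hand. This is only a simplified sketch: the real ABI layout differs, and the names `ManualObject`, `VTable`, and `derivedOneVTable` are illustrative, not what avr-g++ actually emits.

    volatile unsigned char var;

    struct ManualObject;

    // One table of function pointers per class, shared by all its instances.
    struct VTable {
        void (*foo)(ManualObject* self);
        void (*bar)(ManualObject* self);
        void (*baz)(ManualObject* self);
    };

    // Every object carries a single hidden 2-byte field pointing at its
    // class's table; the article calls it `__vtbl*`.
    struct ManualObject {
        const VTable* vtbl;
    };

    static void derivedOneFoo(ManualObject*) { var += 17; }
    static void derivedOneBar(ManualObject*) { var += 27; }
    static void derivedOneBaz(ManualObject*) { var += 37; }

    static const VTable derivedOneVTable = {
        derivedOneFoo, derivedOneBar, derivedOneBaz
    };

    int main() {
        ManualObject dOne = { &derivedOneVTable };
        ManualObject* b = &dOne;

        // b->foo() turns into: fetch the address from the table,
        // then make an indirect call (icall) through it.
        b->vtbl->foo(b);
    }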
The extra cost of the indirect call matters only when we are talking about very frequent calls in timing-critical code. But then the question arises: what is polymorphism doing in such code in the first place? Every task has its own tool, and for solving high-level OOP problems it is a good one.
Where avr-gcc is wrong
I have shown that the real SRAM penalty for the active use of virtual functions is 2 bytes per instance. Quite a fair price for such a powerful feature. But what does avr-gcc do? It puts the virtual tables themselves into SRAM! Because of this, every new class with virtual functions, every subclass of it, and even every interface (pure abstract class) increases SRAM consumption.
This is completely unjustified, since virtual tables cannot change during program execution. Their rightful place is in Flash memory, which usually "runs out" much later than SRAM. This topic has been raised a hundred times in different communities.
The irony is that these tables are already stored in Flash and, when the controller starts, are additionally copied into SRAM. In the generated ASM, to fetch the address of a method implementation one would "just" need to use `lpm` instead of `ldd`, i.e. read the address not from the SRAM copy of the table but from its original in Flash.
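For ordinary constant data, AVR programmers already do this by hand using `PROGMEM` and the `pgm_read_*` accessors from avr-libc; conceptually, the compiler would have to do the same thing automatically for the hidden tables. A small sketch of the manual technique for comparison (the table here is unrelated to the example above and purely illustrative):

    #include <avr/pgmspace.h>

    volatile unsigned char var;

    // A constant table placed in Flash only: it costs no SRAM at all.
    const unsigned char increments[3] PROGMEM = { 17, 27, 37 };

    void applyIncrement(unsigned char i) {
        // pgm_read_byte() compiles down to lpm: the value is fetched
        // straight from Flash instead of from an SRAM copy.
        var += pgm_read_byte(&increments[i]);
    }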
Why has nobody implemented this optimization yet? As always, the obstacle is not technology but people. GCC is a really big open source project with no big-money sponsor behind it. GCC is huge, with its own culture, structure, body of knowledge, and so on. Against that background, the handful of people shouting that they want C++ on some chips with some kind of Harvard architecture looks tiny. So far there has been no one who belongs to both worlds and is motivated enough to make the change.
What to do?
GCC has long had a plugin mechanism that lets you intervene at any point in the chain from AST to assembly. The virtual table optimization could be implemented as a plugin. The only problem is that to write a plugin you either need to be a GCC insider who knows all the specifics, the API, and the entry points, or a programmer who can digest manuals and the GCC source code very quickly.
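For a sense of the moving parts, the entry point of a GCC plugin looks roughly like this. This is only a minimal sketch: the hard part, actually relocating the vtable data into program memory and patching the accesses, is not shown, and the choice of the `PLUGIN_FINISH_DECL` event is merely an assumption made for illustration.

    #include "gcc-plugin.h"
    #include "plugin-version.h"

    // Required: GCC refuses to load plugins that do not declare GPL compatibility.
    int plugin_is_GPL_compatible;

    // Hypothetical callback: this is where the generated vtable variables
    // would have to be detected and redirected to program memory.
    static void on_finish_decl(void* gcc_data, void* user_data) {
        // ... inspect the declaration in gcc_data, find vtable symbols ...
    }

    int plugin_init(struct plugin_name_args* plugin_info,
                    struct plugin_gcc_version* version) {
        // Make sure the plugin was built against the same GCC version.
        if (!plugin_default_version_check(version, &gcc_version))
            return 1;

        // PLUGIN_FINISH_DECL fires after each declaration is parsed;
        // chosen here only to illustrate the registration mechanism.
        register_callback(plugin_info->base_name, PLUGIN_FINISH_DECL,
                          on_finish_decl, NULL);
        return 0;
    }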
I really hope that such a person exists. I really want this plugin to appear and become available to the community, making our lives a little nicer. Amperka is ready to support the development with rubles... 150 kilorubles for a plugin that would bring the example program down from 25 bytes of SRAM to 7 bytes.
If you know someone who has already stepped on a few rakes inside GCC, please show them this post. Thanks in advance! Write in the comments, in a private message, or to victor [at] amperka.ru.