Simplest memory profiling on STM32 and other microcontrollers

“With experience comes the standard, scientific approach to calculating the correct stack size: take a random number and hope for the best”
- Jack Ganssle, "The Art of Designing Embedded Systems"

Hi, Habr!

Strangely enough, the overwhelming majority of beginner tutorials I have seen, on STM32 in particular and on microcontrollers in general, say nothing at all about memory allocation, stack placement and, most importantly, preventing memory overflow, where one area overwrites another and everything collapses, usually with spectacular effects.

This is partly because training projects are simple and run on development boards with relatively fat microcontrollers, where running out of memory while blinking an LED is quite hard. Recently, however, even beginners increasingly mention controllers like STM32F030F4P6: easy to solder, dirt cheap, but with only a few kilobytes of memory.

Such controllers let you build quite serious things (for example, we have a perfectly usable measuring device built on STM32F042K6T6 with 6 KB of RAM, of which a little more than 100 bytes remain free), but when dealing with memory they demand a certain care.

That care is what I want to talk about. The article is short and professionals will not learn anything new, but for beginners this knowledge is highly recommended.

In a typical project on a Cortex-M microcontroller, RAM is conventionally divided into four sections:

data - variables initialized to specific values
bss - variables initialized to zero
heap - a dynamic area from which memory is allocated explicitly with malloc
stack - a dynamic area from which the compiler allocates memory implicitly
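To make the four sections concrete, here is a small sketch of which C declarations typically end up where. It assumes GCC with a conventional Cortex-M linker script; the exact placement is decided by the compiler and the .ld file, and all names here are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

uint32_t calibration = 0x12345678u; /* nonzero initializer: .data */
uint32_t tick_counter;              /* no initializer: .bss, zeroed at startup */
static uint8_t rx_buffer[64];       /* static array without initializer: .bss */

/* Locals such as acc and i live on the stack, reserved implicitly. */
uint32_t sum_below(uint32_t n)
{
    uint32_t acc = 0;
    for (uint32_t i = 0; i < n; i++) {
        acc += i;
    }
    return acc;
}

/* Heap memory is requested explicitly at run time. */
uint8_t *take_from_heap(size_t n)
{
    return malloc(n);
}

/* Accessor so code outside this file can inspect the static buffer. */
uint8_t rx_first_byte(void)
{
    return rx_buffer[0];
}
```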

Occasionally there is also a noinit region (uninitialized variables, convenient because they keep their values across a reset) and, even more rarely, other areas reserved for specific tasks.

Their placement in physical memory is rather specific: on ARM-core microcontrollers the stack grows downward, from higher addresses to lower ones. For that reason it is placed separately from the other memory blocks, at the end of RAM.

By default, the stack starts at the top of RAM and grows down from there. This leads to one extremely unpleasant property: the stack can reach bss and overwrite its top, and you will not find out about it in any obvious way.

Static and dynamic memory areas

All memory falls into two categories: statically allocated memory, whose total amount is evident from the program text and does not depend on how the program executes, and dynamically allocated memory, whose required amount depends on the course of execution.

The latter includes the heap (from which we take chunks with malloc and return them with free) and the stack, which grows and shrinks “by itself”.

Generally speaking, using malloc on microcontrollers is strongly discouraged unless you know exactly what you are doing. The main problem it introduces is memory fragmentation: if you allocate 10 pieces of 10 bytes and then free every second one, you do not get a free 50-byte block. You get 5 separate free pieces of 10 bytes each, so a request for 20 bytes can still fail.
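The fragmentation example can be modeled in a few lines. This is a toy sketch, not a real allocator: it tracks a hypothetical 100-byte heap byte by byte and ignores the bookkeeping headers that real malloc implementations add.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HEAP_BYTES 100   /* hypothetical heap size for the model */

/* Total number of free bytes, wherever they are. */
size_t free_total(const bool used[HEAP_BYTES])
{
    size_t n = 0;
    for (size_t i = 0; i < HEAP_BYTES; i++) {
        if (!used[i]) n++;
    }
    return n;
}

/* Largest contiguous free run: the biggest block malloc could still return. */
size_t free_largest(const bool used[HEAP_BYTES])
{
    size_t best = 0, run = 0;
    for (size_t i = 0; i < HEAP_BYTES; i++) {
        run = used[i] ? 0 : run + 1;
        if (run > best) best = run;
    }
    return best;
}

/* Allocate ten 10-byte pieces back to back, then free every second one. */
void fragment(bool used[HEAP_BYTES])
{
    for (size_t i = 0; i < HEAP_BYTES; i++) {
        used[i] = true;
    }
    for (size_t piece = 1; piece < 10; piece += 2) {
        for (size_t b = 0; b < 10; b++) {
            used[piece * 10 + b] = false;
        }
    }
}
```

After fragment(), free_total() reports 50 bytes, but free_largest() reports only 10, which is the most a single allocation could get.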

In addition, at compile time the compiler cannot determine how much memory your malloc calls will require (especially given fragmentation, which depends not only on the sizes of the requested pieces but also on the order in which they are allocated and freed), and therefore it cannot warn you if memory runs out.

There are ways around this problem: special malloc implementations that work within a statically allocated area rather than the entire RAM, careful use of malloc that accounts for possible fragmentation in the program logic, and so on. In general, though, it is better not to touch malloc.
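One common form of such a special allocator is a fixed-size block pool over a static array. The sketch below is hypothetical (pool_init, pool_alloc and pool_free are illustrative names, and the sizes are arbitrary). Since the pool is an ordinary static array, its size is known at build time, and because all blocks are the same size, freeing them in any order cannot fragment it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define POOL_BLOCKS     8
#define POOL_BLOCK_SIZE 16

typedef union block {
    union block *next;                 /* link while the block is free */
    uint8_t payload[POOL_BLOCK_SIZE];  /* user data while allocated */
} block_t;

static block_t pool[POOL_BLOCKS];      /* statically allocated, lands in .bss */
static block_t *free_list;

/* Chain every block into the free list. */
void pool_init(void)
{
    free_list = NULL;
    for (size_t i = 0; i < POOL_BLOCKS; i++) {
        pool[i].next = free_list;
        free_list = &pool[i];
    }
}

/* Take one block, or NULL when the pool is exhausted. */
void *pool_alloc(void)
{
    block_t *b = free_list;
    if (b != NULL) {
        free_list = b->next;
    }
    return b;
}

/* Return a block; p must have come from pool_alloc(). */
void pool_free(void *p)
{
    block_t *b = p;
    b->next = free_list;
    free_list = b;
}
```

The trade-off is that every request gets the same block size, so the pool suits objects of one known size rather than arbitrary allocations.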

All memory areas, with their boundaries and addresses, are described in the linker script, a file with the .ld extension that the linker follows when building the project.

Statically allocated memory

So, statically allocated memory consists of two areas, bss and data, which differ only in how they are initialized. During system initialization, the data block is copied from flash, where its initial values are stored, and the bss block is simply filled with zeros (the C standard requires static variables without an explicit initializer to start at zero, so a conforming startup routine does this).

Both steps, copying from flash and filling with zeros, are done explicitly in program code, but not in your main(): they live in a separate startup file that runs first, is written once and is simply carried over from project to project.
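What the startup file does can be sketched like this. To keep it testable on a PC, the region boundaries are passed as parameters; in a real startup file they are symbols from the linker script, and their names vary between scripts (the only one this article relies on is _ebss). init_ram is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Copy .data initial values from flash, then zero .bss. */
void init_ram(const uint32_t *load,            /* initial values stored in flash */
              uint32_t *data, uint32_t *edata, /* .data region in RAM */
              uint32_t *bss, uint32_t *ebss)   /* .bss region in RAM */
{
    while (data < edata) {    /* .data: copy initial values from flash */
        *data++ = *load++;
    }
    while (bss < ebss) {      /* .bss: fill with zeros */
        *bss++ = 0;
    }
}
```

A real reset handler would do the same two loops and then call main().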

However, that is not what interests us now. What matters is how to tell whether our data fits into the RAM of our controller at all.

This is very easy to find out with the arm-none-eabi-size utility, passing it a single argument: the compiled ELF file of our program (a call to it is often added at the end of the Makefile, since that is convenient).

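In its output, text is the amount of code and constant data stored in flash, while data and bss are our statically allocated areas in RAM (the initial values of data are also kept in flash, so the flash footprint is actually text + data). We do not care about the dec and hex columns: they are just the sum of the first three in decimal and hexadecimal, with no practical meaning here.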

In total, static data in RAM takes data + bss bytes, in this case 5324 bytes. The controller has 6144 bytes of RAM and we do not use malloc, so 820 bytes remain.

That should be enough for the stack.

But is it? If not, the stack will grow into our own data: first it will overwrite the data, then the data will overwrite the stack, and then everything will collapse. Moreover, between the first and second stages the program may keep running without realizing that the data it processes is garbage. In the worst case, this is data that you wrote while everything was still fine with the stack and now only read, for example the calibration parameters of a sensor. Then you have no obvious way to tell that they are corrupted, and the program will keep running as if nothing had happened, producing garbage at its output.
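The static-RAM arithmetic from this section can be automated, for example in a small build-time helper. The sketch below is hypothetical: it parses one line of arm-none-eabi-size output in its default Berkeley format (text, data, bss, dec, hex, filename) and returns the RAM left over for the stack. The numbers in the test line are invented, chosen only so that data + bss equals the 5324 bytes above.

```c
#include <assert.h>
#include <stdio.h>

/* Returns RAM minus (data + bss), or -1 if the line cannot be parsed
   (for example, the header line "text data bss dec hex filename"). */
long ram_headroom(const char *size_line, unsigned long ram_bytes)
{
    unsigned long data, bss;
    /* %*lu skips the text column: it lives in flash, not RAM */
    if (sscanf(size_line, "%*lu %lu %lu", &data, &bss) != 2) {
        return -1;
    }
    return (long)ram_bytes - (long)(data + bss);
}
```

With a hypothetical line "12345 1024 4300 17669 4505 firmware.elf" and 6144 bytes of RAM, it returns 820, the same reserve as computed by hand.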

Dynamically allocated memory

This is where things get interesting. To cut a long story short, it is almost impossible to determine the required stack size in advance.

Purely theoretically, you can ask the compiler for the stack usage of each individual function (GCC, for example, writes it to .su files with -fstack-usage), then build the call tree of your program and, for each branch, add up the stack usage of all functions along it. For any reasonably complex program, this alone will take a considerable amount of time.

Then you remember that an interrupt can occur at any moment, and its handler also needs stack.

Then, that two or three interrupts can nest, and each of their handlers needs stack too...
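The theoretical calculation, interrupts included, can be sketched as a walk over the call tree. Everything here is a simplified assumption for illustration: the frame sizes would come from the compiler (GCC's .su files, for instance) and are entered by hand, recursion and calls through function pointers are not handled, each preemption priority level contributes at most one active handler, and the 32-byte exception frame assumes a Cortex-M without an FPU, ignoring alignment padding.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CALLEES 4

/* Cortex-M without FPU: exception entry pushes r0-r3, r12, lr, pc, xPSR. */
#define EXC_FRAME_BYTES 32u

typedef struct {
    const char *name;
    unsigned frame_bytes;          /* per-function usage, e.g. from .su files */
    int callees[MAX_CALLEES];      /* indices into the table, -1 terminated */
} func_t;

/* Deepest stack use of any call chain starting at function f. */
unsigned worst_case_stack(const func_t *table, int f)
{
    unsigned deepest_callee = 0;
    for (size_t i = 0; i < MAX_CALLEES && table[f].callees[i] >= 0; i++) {
        unsigned d = worst_case_stack(table, table[f].callees[i]);
        if (d > deepest_callee) {
            deepest_callee = d;
        }
    }
    return table[f].frame_bytes + deepest_callee;
}

/* Main code at its deepest point plus one nested handler per priority level.
   Handlers run on the main stack; isr_roots[k] is the deepest handler at level k. */
unsigned worst_case_with_irqs(const func_t *table, int main_fn,
                              const int *isr_roots, size_t levels)
{
    unsigned total = worst_case_stack(table, main_fn);
    for (size_t k = 0; k < levels; k++) {
        total += EXC_FRAME_BYTES + worst_case_stack(table, isr_roots[k]);
    }
    return total;
}
```

For a hypothetical program where main (24 bytes) calls read_sensor (40 bytes, which calls i2c_transfer, 32 bytes) and update_display (64 bytes), worst_case_stack() gives 96 bytes; two nested handlers of 48 and 24 bytes plus their exception frames bring the estimate to 232 bytes. Even this toy shows how quickly the bookkeeping grows.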

You get the idea. Calculating the stack for a specific program is an exciting and generally useful exercise, but most of the time you will not do it.

Therefore, in practice, a technique called memory painting is used, which gives us at least a rough idea of whether everything is going well.

The convenient thing about this method is that it does not depend on the debugging tools you use, and if the system has at least some means of outputting information, it can work without a debugger at all.

The idea is to fill the entire region from the end of bss to the top of the stack with one fixed value, at a very early stage of execution when the stack is guaranteed to be small.

Later, by finding the lowest address where this value has been overwritten, we can see how far the stack has grown. Since overwritten paint never comes back by itself, the check can be done occasionally: it shows the maximum stack size reached so far.

Let's choose the color of the paint. The specific value does not matter much; the one below I simply typed with two fingers of my left hand. The main thing is not to use 0x00 or 0xFF bytes, which occur in real data all the time:

#define STACK_CANARY_WORD (0xCACACACAUL)

At the very beginning of the program, right in the startup file, let's fill in all the free memory with this paint:

    extern unsigned _ebss;   /* from the linker script */
    volatile unsigned *top, *start;
    /* current SP: painting stops here */
    __asm__ volatile ("mov %[top], sp" : [top] "=r" (top) : : );
    start = &_ebss;
    while (start < top) {
        *(start++) = STACK_CANARY_WORD;
    }

What did we do here? The inline assembler put the current stack pointer into top, so that we do not accidentally overwrite the live stack; start holds the address of the end of the bss block (I found the name of this symbol in the linker script, *.ld; in this case it comes from the libopencm3 library). Then we simply fill everything from the end of bss up to the current top of the stack with the same value.

After that we can do this at any time:

    extern unsigned _ebss, _stack;   /* linker script symbols */

    unsigned check_stack_size(void)
    {
        /* top of data section */
        unsigned *addr = &_ebss;
        /* look for the canary word till the end of RAM */
        while ((addr < &_stack) && (*addr == STACK_CANARY_WORD)) {
            addr++;
        }
        return ((unsigned)&_stack - (unsigned)addr);
    }

Here _ebss is already familiar to us, and _stack comes from the same linker script, where it denotes the top of the stack, which in this case is simply the end of RAM.

This function returns the maximum stack depth recorded so far, in bytes.
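Since the real code needs a target and a linker script, here is a host-side model of the same technique that can be tried on a PC. It is a sketch: an ordinary array stands in for the RAM between _ebss and the stack top, and paint_region and painted_high_water are hypothetical names mirroring the painting loop and check_stack_size().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_CANARY_WORD (0xCACACACAUL)

/* Fill [lo, hi) with the paint; hi plays the role of the stack top. */
void paint_region(uint32_t *lo, uint32_t *hi)
{
    while (lo < hi) {
        *lo++ = (uint32_t)STACK_CANARY_WORD;
    }
}

/* Bytes from the lowest overwritten word up to hi: the high-water mark.
   A stack word that happens to equal the paint would make this slightly
   low, which is why a distinctive value is chosen. */
size_t painted_high_water(const uint32_t *lo, const uint32_t *hi)
{
    const uint32_t *p = lo;
    while (p < hi && *p == (uint32_t)STACK_CANARY_WORD) {
        p++;
    }
    return (size_t)(hi - p) * sizeof *p;
}
```

Writing anything into the top ten words of a painted 64-word array makes painted_high_water() report 40 bytes, and that result stays even if those words are later left alone, just like the real stack mark.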

The rest is simple: somewhere in the body of the program, periodically call check_stack_size() and send the result to the console, a display or wherever is convenient, then run the device for a period that we consider long enough.

Periodically we look at the stack size.

In this case, after all sorts of chaotic actions with the device, I managed to push it to 712 bytes. That is, of the 6 KB of RAM available initially, we still have a reserve of a whole 108 bytes.
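The periodic reporting could look like the sketch below, assuming a hypothetical stack_report() helper called from the main loop with the result of check_stack_size() and the RAM left after data + bss. It keeps the largest value seen and formats a line only when a new maximum appears, so the log stays short; the output channel (UART, display) is left to you.

```c
#include <assert.h>
#include <stdio.h>

/* Returns 1 and formats buf when `used` is a new maximum, else returns 0. */
int stack_report(unsigned used, unsigned available, unsigned *max_seen,
                 char *buf, size_t len)
{
    if (used <= *max_seen) {
        return 0;                  /* nothing new to report */
    }
    *max_seen = used;
    snprintf(buf, len, "stack: %u of %u bytes used, %d free",
             used, available, (int)available - (int)used);
    return 1;
}
```

With the numbers above, a reading of 712 out of 820 produces "stack: 712 of 820 bytes used, 108 free", and smaller later readings print nothing.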

Word of caution

The experimental method for determining stack size is simple and effective, but not 100% reliable. There can always be a very rare combination of circumstances, occurring say once a year, that leads to an unplanned increase in stack usage. In general, though, with well-written firmware it is unlikely that anything will exceed the observed maximum by more than 10-20%. Our 108 bytes of reserve are about 15% of the measured 712, so we are safe with a fairly high degree of confidence.
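The margin reasoning can be written down as a check. This sketch assumes hypothetical helpers margin_percent() and margin_ok(); the growth percentage you demand is your own judgment call, not something the method provides.

```c
#include <assert.h>

/* Reserve left after the observed maximum, as a percentage of that
   maximum, rounded down. */
unsigned margin_percent(unsigned available, unsigned max_used)
{
    assert(max_used > 0);              /* percentage of zero is undefined */
    if (max_used >= available) {
        return 0;                      /* no reserve left at all */
    }
    return (available - max_used) * 100u / max_used;
}

/* Does the reserve cover growth_percent growth over the observed maximum? */
int margin_ok(unsigned available, unsigned max_used, unsigned growth_percent)
{
    return (unsigned long)max_used * (100u + growth_percent)
           <= (unsigned long)available * 100u;
}
```

With the numbers from this article, margin_percent(820, 712) gives 15, margin_ok(820, 712, 10) passes and margin_ok(820, 712, 20) fails: the reserve covers about 15% growth, not the full 20%.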

In most cases, this kind of quasi-profiling, which is easy to perform on virtually any system regardless of the development tools used, lets you judge with good confidence how efficiently memory is used and catch stack problems early, especially on low-end controllers with just a few kilobytes of RAM.

P.S. In RTOS-based multitasking systems there are usually many stacks: besides the main stack (MSP) growing down from the top of RAM, each task has its own stack, accessed through the process stack pointer (PSP). Their sizes are set explicitly by the programmer, which does not prevent a task from running past their boundaries, so the same monitoring methods apply.
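Per-task stacks can be checked with the same painting idea. The sketch below is hypothetical (task_stack_t and both functions are illustrative names, not an RTOS API) and assumes each task stack is a plain array that grows down from its end, so untouched paint accumulates at the bottom. Real RTOSes often provide this themselves; FreeRTOS, for example, has uxTaskGetStackHighWaterMark().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_CANARY_WORD (0xCACACACAUL)

typedef struct {
    const char *name;
    uint32_t *base;      /* lowest address of the stack array */
    size_t words;        /* size of the array in 32-bit words */
} task_stack_t;

/* Paint the whole task stack before the task starts running. */
void paint_task_stack(const task_stack_t *t)
{
    for (size_t i = 0; i < t->words; i++) {
        t->base[i] = (uint32_t)STACK_CANARY_WORD;
    }
}

/* Untouched bytes at the bottom of the stack: the task's remaining reserve. */
size_t task_stack_unused(const task_stack_t *t)
{
    size_t i = 0;
    while (i < t->words && t->base[i] == (uint32_t)STACK_CANARY_WORD) {
        i++;
    }
    return i * sizeof(uint32_t);
}
```

Looping over a table of such descriptors and reporting any task whose reserve drops below a chosen threshold gives the same kind of quasi-profiling per task.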

Source: https://habr.com/ru/post/443030/
