Memory profiling on STM32 and other microcontrollers: static stack size analysis

Hi, Habr!

In the last article , I also mentioned it myself, and asked in the comments - ok, well, we chose the size of the stack using a scientific method, didn’t seem to drop anything, but can we somehow more reliably assess what it is and who in general devoured so much?

The answer is short: yes, but no.
')
No, using static analysis methods it is impossible to accurately measure the size of the stack required by a program — but, nevertheless, these methods can be useful.

The answer is a little longer - under the cut.

As is widely known to a narrow circle of people, space in the stack is allocated, in fact, for local variables that are used by the currently executing function — with the exception of variables with the static modifier that are stored in statically allocated memory in the bss area, because they must save its values between function calls.

When executing a function, the compiler adds space in the stack for the variables it needs, and when it completes, it frees this space back. It would seem that everything is simple, but - and this is very fat but - we have a few problems:

functions call other functions inside that need a stack too
Sometimes functions call other functions not by their direct mention, but by a pointer to a function.
in principle, it is possible - although it should be avoided by all means - a recursive function call, when A calls for B, B calls for C, and C inside calls for A again
at any time an interrupt can occur, the handler of which is the same function that wants its own piece of stack
if you have an interrupt hierarchy, another interrupt may occur inside the interrupt!

Definitely, recursive function calls should be crossed out of this list, because their presence is an excuse not to count the stack size, and go express your opinion to the author of the code. Everything else, alas, cannot be crossed out in general (although there may be nuances in private ones: for example, you can have all interruptions have the same priority by design, for example, as in RIOT OS, and there will be no nested interrupts).

Now imagine an oil painting:

function A, which has eaten 100 bytes per stack, calls function B, which needs 50 bytes
at the time of execution of B, A itself is obviously not completed yet, therefore its 100 bytes are not released, therefore we already have 150 bytes in the stack
function B calls function C, and it does so according to a pointer, which, depending on the program logic, can indicate half a dozen different functions that consume from 5 to 50 bytes of the stack
during C runtime, an interrupt happens with a heavy handler that works for a relatively long time and consumes 20 bytes of the stack
during interrupt processing, another interrupt occurs with a higher priority, the handler of which wants 10 bytes of the stack

In this beautiful construction, with a particularly successful coincidence of all circumstances, you will have at least five simultaneously active functions - A, B, C and two interrupt handlers. Moreover, one of them does not have a constant stack consumption, because in different passes it can be just a different function, and to understand the possibility or impossibility of overlapping interruptions, you must at least know if you have any interruptions with different priorities , and as a maximum - to understand whether they can overlap each other.

Obviously, for any automatic static code analyzer, this task is extremely close to overwhelming, and it can be performed only in the rough approximation of the estimate from above:

sum up the stacks of all interrupt handlers
sum up the stacks of functions running in the same code branch
try to find all the pointers to functions and their calls, and take for the stack size the maximum stack size among the functions to which these pointers point

In most cases, it will be, on the one hand, a greatly overestimation, and on the other, a chance to skip some particularly tricky function call through pointers.

Therefore, in the general case, we can say simply: this problem is not automatically solved . A manual solution, by a person who knows the logic of this program, requires a lot of numbers to be shoved.

Nevertheless, a static estimate of the size of the stack can be very useful when optimizing software - at least with a banal purpose to understand who eats so much and not much.

There are two extremely useful tools for this in the GNU / gcc toolchain:

flag -fstack-usage
cflow utility

If you add -fstack-usage to the gcc flags (for example, in the Makefile on the line with CFLAGS), then for each compiled file% filename% .c the compiler will create a file% filename% .su, inside which will be simple and understandable text.

Take, for example, target.su for this giant footcloth :

target.c:159:13:save_settings 8 static target.c:172:13:disable_power 8 static target.c:291:13:adc_measure_vdda 32 static target.c:255:13:adc_measure_current 24 static target.c:76:6:cpu_setup 0 static target.c:81:6:clock_setup 8 static target.c:404:6:dma1_channel1_isr 24 static target.c:434:6:adc_comp_isr 40 static target.c:767:6:systick_activity 56 static target.c:1045:6:user_activity 104 static target.c:1215:6:gpio_setup 24 static target.c:1323:6:target_console_init 8 static target.c:1332:6:led_bit 8 static target.c:1362:6:led_num 8 static

Here we see the actual consumption of the stack for each function appearing in it, from which we can draw some conclusions for ourselves - well, for example, what should we try to optimize first of all if we run into the lack of RAM.

At the same time, attention, this file does not actually provide accurate information about the actual stack consumption for functions from which other functions are called !

To understand total consumption, we need to build a tree of calls and sum up the stacks of all the functions included in each branch. You can do this, for example, with the GNU cflow utility, by hooking it into one or more files.

Exhaust, we get here an order of magnitude more spreading, give only part of it for the same target.c:

 olegart@oleg-npc /mnt/c/Users/oleg/Documents/Git/dap42 (umdk-emb) $ cflow src/stm32f042/umdk-emb/target.c adc_comp_isr() <void adc_comp_isr (void) at src/stm32f042/umdk-emb/target.c:434>: TIM_CR1() ADC_DR() ADC_ISR() DMA_CCR() GPIO_BSRR() GPIO_BRR() ADC_TR1() ADC_TR1_HT_VAL() ADC_TR1_LT_VAL() TIM_CNT() DMA_CNDTR() DIV_ROUND_CLOSEST() NVIC_ICPR() clock_setup() <void clock_setup (void) at src/stm32f042/umdk-emb/target.c:81>: rcc_clock_setup_in_hsi48_out_48mhz() crs_autotrim_usb_enable() rcc_set_usbclk_source() dma1_channel1_isr() <void dma1_channel1_isr (void) at src/stm32f042/umdk-emb/target.c:404>: DIV_ROUND_CLOSEST() gpio_setup() <void gpio_setup (void) at src/stm32f042/umdk-emb/target.c:1215>: rcc_periph_clock_enable() button_setup() <void button_setup (void) at src/stm32f042/umdk-emb/target.c:1208>: gpio_mode_setup() gpio_set_output_options() gpio_mode_setup() gpio_set() gpio_clear() rcc_peripheral_enable_clock() tim2_setup() <void tim2_setup (void) at src/stm32f042/umdk-emb/target.c:1194>: rcc_periph_clock_enable() rcc_periph_reset_pulse() timer_set_mode() timer_set_period() timer_set_prescaler() timer_set_clock_division() timer_set_master_mode() adc_setup_common() <void adc_setup_common (void) at src/stm32f042/umdk-emb/target.c:198>: rcc_periph_clock_enable() gpio_mode_setup() adc_set_clk_source() adc_calibrate() adc_set_operation_mode() adc_disable_discontinuous_mode() adc_enable_external_trigger_regular() ADC_CFGR1_EXTSEL_VAL() adc_set_right_aligned() adc_disable_temperature_sensor() adc_disable_dma() adc_set_resolution() adc_disable_eoc_interrupt() nvic_set_priority() nvic_enable_irq() dma_channel_reset() dma_set_priority() dma_set_memory_size() dma_set_peripheral_size() dma_enable_memory_increment_mode() dma_disable_peripheral_increment_mode() dma_enable_transfer_complete_interrupt() dma_enable_half_transfer_interrupt() dma_set_read_from_peripheral() dma_set_peripheral_address() dma_set_memory_address() dma_enable_circular_mode() ADC_CFGR1() memcpy() console_reconfigure() tic33m_init() strlen() tic33m_display_string()

And it's not even half a tree.

To understand the actual consumption of the stack, we need to take the consumption for each of the functions mentioned in it and sum up these values for each branch.

And at the same time, we still do not take into account function calls by pointers and interrupts, incl. nested (and specifically in this code, they can be nested).

As you might guess, doing this with every change of code is, to put it mildly, difficult - that’s why no one usually does this.

Nevertheless, it is necessary to understand the principles of filling the stack - this can give rise to certain restrictions on the project code, increasing its reliability from the point of view of preventing stack overflow (for example, prohibiting nested interrupts or function calls on pointers), and specifically -fstack-usage can greatly help with code optimization on systems with low RAM.

Source: https://habr.com/ru/post/443544/

All Articles

Memory profiling on STM32 and other microcontrollers: static stack size analysis

More articles: