The most common pitfalls when using printf in microcontroller programs

From time to time my projects require printf together with a serial port (a UART, or an abstraction over USB that emulates a serial port). As usual, so much time passes between uses that I manage to completely forget all the nuances that have to be taken into account for it to work properly in a large project.

In this article I have collected my own list of pitfalls that arise when using printf in microcontroller programs, ordered from the most obvious to the completely non-obvious.

Brief introduction

In principle, to use printf in a microcontroller program it is enough to:

include the stdio.h header file in the project code;
override the system function _write so that it outputs to the serial port;
define the system call stubs that the linker requires (_fork, _wait, etc.);
call printf in the project.

In practice, it is not that simple.

You need to define all the stubs, not just the ones you use

A pile of undefined references when building the project is surprising at first, but after a little reading it becomes clear what they are and why they are needed. In all my projects I include this submodule. That way, in the main project I override only the functions I need (only _write in this case), and the rest remain unchanged.

It is important to note that all stubs must be C functions, not C++ functions (or they must be wrapped in extern "C"). Otherwise linking will fail (remember the name mangling that G++ applies).
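
As a rough illustration (this is not the content of the submodule mentioned above), a few such stubs might look like the sketch below. The extern "C" guard lets the same file be built as either C or C++. The particular set of stubs and the errno values are assumptions modelled on the usual newlib conventions.

```c
#include <errno.h>

#ifdef __cplusplus
extern "C" {
#endif

/* Creating processes is not possible on bare metal. */
int _fork(void)
{
    errno = ENOSYS;
    return -1;
}

/* There are no child processes to wait for. */
int _wait(int *status)
{
    (void)status;
    errno = ECHILD;
    return -1;
}

/* There are no processes to send a signal to. */
int _kill(int pid, int sig)
{
    (void)pid;
    (void)sig;
    errno = EINVAL;
    return -1;
}

/* Pretend to be the only process in the system. */
int _getpid(void)
{
    return 1;
}

#ifdef __cplusplus
}
#endif
```

Because the definitions have C linkage, the linker sees exactly the names the library asks for (_fork, _wait and so on) instead of mangled C++ names.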

_write receives 1 character at a time

Although the prototype of _write has an argument that carries the length of the output message, its value is 1 (in fact, we ourselves will make sure it is always 1, but more on that later).

int _write (int file, char *data, int len) { ... }

On the Internet you can often find the following implementation of this function:

A common implementation of the _write function

    int uart_putc( const char ch)
    {
        while (USART_GetFlagStatus(USART2, USART_FLAG_TC) == RESET) {}
        USART_SendData(USART2, (uint8_t) ch);
        return 0;
    }

    int _write_r (struct _reent *r, int file, char * ptr, int len)
    {
        r = r;
        file = file;
        ptr = ptr;
    #if 0
        int index;
        /* For example, output string by UART */
        for(index=0; index<len; index++)
        {
            if (ptr[index] == '\n')
            {
                uart_putc('\r');
            }
            uart_putc(ptr[index]);
        }
    #endif
        return len;
    }

This implementation has the following disadvantages:

poor performance;
lack of thread safety;
inability to use the serial port for other purposes.

Poor performance

Performance is poor because the bytes are sent by the CPU: it has to poll the status register instead of using, say, DMA. To solve this problem, you can accumulate the output in a buffer and send it when an end-of-line character arrives (or when the buffer fills up). This approach costs buffer memory, but significantly improves performance when output is frequent.

An example implementation of _write with a buffer

    #include "uart.h"

    #include <errno.h>
    #include <stdint.h>
    #include <sys/unistd.h>

    extern mc::uart uart_1;

    extern "C" {

    // Buffer that accumulates data before it is sent over the uart.
    static const uint32_t buf_size = 254;
    static uint8_t tx_buf[buf_size] = {0};
    static uint32_t buf_p = 0;

    // Appends a character and sends the buffer once it is full.
    static inline int _add_char (char data) {
        tx_buf[buf_p++] = data;

        if (buf_p >= buf_size) {
            // Reset the index first so that a failed send
            // cannot lead to a write past the end of the buffer.
            uint32_t len = buf_p;
            buf_p = 0;

            if (uart_1.tx(tx_buf, len, 100) != mc_interfaces::res::ok) {
                errno = EIO;
                return -1;
            }
        }

        return 0;
    }

    // PuTTY expects \r\n as the line ending, so \n is expanded
    // to \r\n and the accumulated line is sent.
    static inline int _add_endl () {
        if (_add_char('\r') != 0) {
            return -1;
        }

        if (_add_char('\n') != 0) {
            return -1;
        }

        uint32_t len = buf_p;
        buf_p = 0;

        if (uart_1.tx(tx_buf, len, 100) != mc_interfaces::res::ok) {
            errno = EIO;
            return -1;
        }

        return 0;
    }

    int _write (int file, char *data, int len) {
        len = len;    // Unused: the data arrives one character at a time.

        if ((file != STDOUT_FILENO) && (file != STDERR_FILENO)) {
            errno = EBADF;
            return -1;
        }

        // Accumulate characters until the buffer is full
        // or \n is received.
        if (*data != '\n') {
            if (_add_char(*data) != 0) {
                return -1;
            }
        } else {
            if (_add_endl() != 0) {
                return -1;
            }
        }

        return 1;
    }

    }

Here the uart_1 object of the uart class does the actual sending using DMA. The object uses FreeRTOS primitives (taking and releasing a mutex) to block any other access to it while data from the buffer is being sent. Thus nobody can use the uart object from another thread during a transmission.

A few links:

_write function code as part of a real project here
uart class interface here
implementation of the interface of the uart class under stm32f4 here and here
creating an instance of the uart class in the project here

Lack of thread safety

This implementation is still not thread-safe: nothing prevents another FreeRTOS thread from starting to print another line with printf and thereby overwriting the buffer that is currently being sent (the mutex inside uart protects the object from concurrent use by different threads, but not the data being transmitted). If there is a risk that printf will be called from another thread, you need a wrapper layer that locks access to printf as a whole. In my particular case only one thread uses printf, so the extra complication would only reduce performance (constantly taking and releasing a mutex inside the layer).
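
A wrapper layer of this kind could be sketched roughly as follows. All names here are hypothetical. The two hooks only count calls so that the sketch stays self-contained; on FreeRTOS they would be expected to take and give a mutex (for example with xSemaphoreTake and xSemaphoreGive).

```c
#include <stdarg.h>
#include <stdio.h>

/* Placeholder hooks (assumption): replace the bodies with real
 * mutex operations of the RTOS in use. */
static int print_lock_calls;
static int print_unlock_calls;

static void print_lock(void)
{
    print_lock_calls++;     /* take the mutex here */
}

static void print_unlock(void)
{
    print_unlock_calls++;   /* release the mutex here */
}

/* Formats and prints one message while holding the lock, so output
 * from different threads cannot interleave inside a single call. */
int locked_printf(const char *fmt, ...)
{
    va_list args;
    int written;

    print_lock();

    va_start(args, fmt);
    written = vprintf(fmt, args);
    va_end(args);

    print_unlock();

    return written;
}
```

Every thread then calls locked_printf instead of printf. The price is the one described above: a mutex is taken and released on every call.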

Inability to use the serial port for other purposes

Since we send only after a whole line has been received (or the buffer has filled up), instead of calling the uart object directly you can hand the data to some higher-level interface that forwards it as packets (for example, with guaranteed delivery using a transaction-based protocol similar to Modbus). This lets you use a single UART both for debug output and, for example, for user interaction with a control console (if the device has one). On the receiving side it is enough to write an unpacker.
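
To make the idea more concrete, here is a minimal sketch of such packing. The frame layout (channel byte, length byte, payload, additive checksum) and all names are invented for illustration only. It covers just the packing and unpacking part. Acknowledgements and retransmission, which guaranteed delivery would need, are left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { CH_DEBUG = 0, CH_CONSOLE = 1 };  /* hypothetical channel ids */
enum { FRAME_OVERHEAD = 3 };            /* channel + length + checksum */

static uint8_t frame_checksum(const uint8_t *p, size_t n)
{
    uint8_t sum = 0;
    while (n--) {
        sum = (uint8_t)(sum + *p++);
    }
    return sum;
}

/* Builds a frame in out. Returns its size, or 0 if it does not fit. */
size_t frame_pack(uint8_t channel, const uint8_t *data, size_t len,
                  uint8_t *out, size_t out_size)
{
    if (len > 255 || len + FRAME_OVERHEAD > out_size) {
        return 0;
    }
    out[0] = channel;
    out[1] = (uint8_t)len;
    memcpy(&out[2], data, len);
    out[2 + len] = frame_checksum(out, 2 + len);
    return len + FRAME_OVERHEAD;
}

/* Checks a frame. Returns the payload length, or -1 if it is malformed. */
int frame_unpack(const uint8_t *frame, size_t frame_len,
                 uint8_t *channel, const uint8_t **payload)
{
    size_t len;

    if (frame_len < FRAME_OVERHEAD) {
        return -1;
    }
    len = frame[1];
    if (frame_len != len + FRAME_OVERHEAD) {
        return -1;
    }
    if (frame_checksum(frame, 2 + len) != frame[2 + len]) {
        return -1;
    }
    *channel = frame[0];
    *payload = &frame[2];
    return (int)len;
}
```

With a layout like this, _write would pass each finished line to frame_pack with the debug channel id, and the receiver would use the channel byte to separate debug output from console traffic.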

Float output does not work by default

If you use newlib-nano, then by default printf (and all of its relatives such as sprintf, snprintf and so on) does not support output of float values. This is easily solved by adding the following linker flags to the project.

 SET(LD_FLAGS -Wl,-u,vfprintf; -Wl,-u,_printf_float; -Wl,-u,_scanf_float; "_")

The full list of flags can be found here.

The program hangs somewhere deep inside printf

This is another omission in the linker flags. For the firmware to be linked against the right variant of the library, you must explicitly specify the processor parameters.

    SET(HARDWARE_FLAGS
        -mthumb;
        -mcpu=cortex-m4;
        -mfloat-abi=hard;
        -mfpu=fpv4-sp-d16;)

    SET(LD_FLAGS
        ${HARDWARE_FLAGS}
        "_")

The full list of flags can also be found here.

printf causes the microcontroller to hard fault

There can be at least two reasons:

stack problems;
problems with _sbrk.

Stack problems

This problem really shows up when you use FreeRTOS or any other OS. The cause is buffering. As mentioned earlier, _write receives 1 byte at a time. For that to be true, buffering must be disabled in the code before the first use of printf.

    setvbuf(stdin, NULL, _IONBF, 0);
    setvbuf(stdout, NULL, _IONBF, 0);
    setvbuf(stderr, NULL, _IONBF, 0);

According to the description of the function, one of the following values can be set instead:

    #define _IOFBF 0 /* setvbuf should set fully buffered */
    #define _IOLBF 1 /* setvbuf should set line buffered */
    #define _IONBF 2 /* setvbuf should set unbuffered */

However, the buffered modes can lead to an overflow of the task stack (or of the interrupt stack, if you happen to be a very bad person who calls printf from interrupts).

Technically, you can size the stack of each thread very carefully, but this approach requires careful planning, and the mistakes it brings are hard to catch. A much simpler solution is to receive one byte at a time, store it in your own buffer, and then output the already formatted data in whatever form is required.

Problems with _sbrk

For me personally this problem was the least obvious one. So, what do we know about _sbrk?

it is yet another stub that has to be implemented to support a significant part of the standard library;
it is needed to allocate memory on the heap;
it is used by library functions such as malloc and free.

Personally, in 95% of my projects I use FreeRTOS with new / delete / malloc overridden to use the FreeRTOS heap. So when I allocate memory, I am sure that it comes from the FreeRTOS heap, which occupies a predetermined amount of memory in the bss area. You can look at the layer here. So, technically, there should be no problem: the function simply should never be called. But let's think: if it is called, where will it try to take memory from?

Recall the RAM layout of a “classic” microcontroller project:

.data;
.bss;
empty space;
initial stack

The .data section holds the initial values of global objects (variables, structures and other global data of the project). The .bss section holds global data with a zero initial value and, note, the FreeRTOS heap. The heap is just an array in memory that the functions from the heap_x.c file then work with. Next comes the empty space, after which (or rather, from the end of RAM) comes the stack. Since my project uses FreeRTOS, this stack is used only until the scheduler starts. So its usage is in most cases limited to about a kilobyte (in fact, usually to 100 bytes).

But where does _sbrk allocate memory? Let's look at which linker script variables it uses.

    void *__attribute__ ((weak)) _sbrk (int incr)
    {
        extern char __heap_start;
        extern char __heap_end;
        ...

Now let's find them in the linker script (my script differs slightly from the one provided by ST, but this part is about the same there):

    __stack = ORIGIN(SRAM) + LENGTH(SRAM);
    __main_stack_size = 1024;
    __main_stack_limit = __stack - __main_stack_size;

    ... flash, ...

    .bss (NOLOAD) : ALIGN(4)
    {
        ...
        . = ALIGN(4);
        __bss_end = .;
    } >SRAM

    __heap_start = __bss_end;
    __heap_end = __main_stack_limit;

That is, it uses the memory between bss and the stack (the stack is 1 KB, growing down from 0x20020000 with 128 KB of RAM).

Understood. But we have overridden malloc, free and the rest, so _sbrk does not have to be used, right? As it turned out, it does. Moreover, it is not printf that uses it but the function that sets the buffering mode, setvbuf (or rather _malloc_r, which is not declared as a weak function in the library, unlike malloc, which can easily be replaced).

Since I was sure that sbrk was not used, I had placed the FreeRTOS heap (the bss section) right next to the stack (because I knew that the stack used 10 times less than was reserved for it).

There are three ways to solve the problem:

leave a gap between bss and the stack;
override _malloc_r so that _sbrk is not called (cut this one function out of the library);
rewrite sbrk on top of malloc and free.

I settled on the first option, because I did not manage to safely replace the standard _malloc_r (which lives in libg_nano.a (lib_a-nano-mallocr.o)): the function is not declared as __attribute__ ((weak)), and I could not find a way to exclude a single function from the library at link time. And I really did not want to rewrite sbrk for the sake of a single call.

The final solution was to allocate separate regions in RAM for the initial stack and for _sbrk. This guarantees at link time that the regions do not overlap. Inside sbrk there is also a check for going beyond its region. I had to make a small change there so that, when an overrun is detected, the thread hangs in a while loop (sbrk is used only during initial initialization, so such an overrun must be caught while debugging the device).
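
The bounds check can be sketched as follows. This is not the code from my project: a static array stands in for the region that the linker script describes with __heap_start and __heap_end, so that the sketch does not depend on a linker script, and the name sbrk_sketch is hypothetical. On overrun the sketch returns an error in the usual sbrk manner instead of hanging in a loop.

```c
#include <errno.h>
#include <stddef.h>

/* Stand-in (assumption) for the dedicated 1 KB region between
 * __heap_start and __heap_end. */
static char heap_region[1024];
static size_t heap_used;

/* Same contract as _sbrk: returns the previous break,
 * or (void *)-1 with errno set to ENOMEM. */
void *sbrk_sketch(int incr)
{
    long next = (long)heap_used + incr;
    void *prev;

    if (next < 0 || next > (long)sizeof(heap_region)) {
        /* The variant described in the text hangs in an endless
         * loop here, so that the overrun is noticed under a debugger. */
        errno = ENOMEM;
        return (void *)-1;
    }

    prev = heap_region + heap_used;
    heap_used = (size_t)next;
    return prev;
}
```

Because the region has a fixed size of its own, an unexpected allocation can no longer silently run into the stack or the FreeRTOS heap.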

Modified mem.ld

    MEMORY
    {
        FLASH (RX)      : ORIGIN = 0x08000000, LENGTH = 1M
        CCM_SRAM (RW)   : ORIGIN = 0x10000000, LENGTH = 64K
        SRAM (RW)       : ORIGIN = 0x20000000, LENGTH = 126K
        SBRK_HEAP (RW)  : ORIGIN = 0x2001F800, LENGTH = 1K
        MAIN_STACK (RW) : ORIGIN = 0x2001FC00, LENGTH = 1K
    }

Changes in section.ld

    __stack = ORIGIN(MAIN_STACK) + LENGTH(MAIN_STACK);
    __heap_start = ORIGIN(SBRK_HEAP);
    __heap_end = ORIGIN(SBRK_HEAP) + LENGTH(SBRK_HEAP);

You can look at mem.ld and section.ld in my sandbox project in this commit.
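
The arithmetic behind the new regions is easy to double-check. The sketch below copies the origins and lengths from the MEMORY block and verifies at compile time that the three RAM regions follow each other without gaps or overlaps and end at 0x20020000, the top of the 128 KB of RAM. The macro names are made up for this check and do not come from the project.

```c
#define KIB 1024u

/* Values copied from the MEMORY block of mem.ld. */
#define SRAM_ORIGIN        0x20000000u
#define SRAM_LENGTH        (126u * KIB)
#define SBRK_HEAP_ORIGIN   0x2001F800u
#define SBRK_HEAP_LENGTH   (1u * KIB)
#define MAIN_STACK_ORIGIN  0x2001FC00u
#define MAIN_STACK_LENGTH  (1u * KIB)

/* Top of the 128 KB of RAM that starts at 0x20000000. */
#define RAM_END            0x20020000u

_Static_assert(SRAM_ORIGIN + SRAM_LENGTH == SBRK_HEAP_ORIGIN,
               "SBRK_HEAP must start right after SRAM");
_Static_assert(SBRK_HEAP_ORIGIN + SBRK_HEAP_LENGTH == MAIN_STACK_ORIGIN,
               "MAIN_STACK must start right after SBRK_HEAP");
_Static_assert(MAIN_STACK_ORIGIN + MAIN_STACK_LENGTH == RAM_END,
               "MAIN_STACK must end at the top of RAM");
```

In other words, 126 KB + 1 KB + 1 KB gives exactly the 128 KB of RAM, and __stack still evaluates to 0x20020000 as before.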

UPD 07/12/2019: corrected the list of flags that make printf work with float values. Fixed the link to a working CMakeLists with corrected compile and link flags (there was a nuance: the flags must be listed one by one and separated by ";", while it does not matter whether they are all on one line or on separate lines).

Source: https://habr.com/ru/post/459420/
