
How to make a context switch on STM32

Good day!

Threads... Context switching... The basic entities of an OS. And of course, when developing libraries and applications, we always rely on the thread implementation being flawless. So it was unexpected to find a gross error in thread switching for STM32 in Embox RTOS, when the network stack, the file system and many third-party libraries had already been working for a long time. We had even managed to brag about our achievements on Habr.

I would like to talk about how we did thread switching for the Cortex-M, and tested it on STM32. In addition, I will try to talk about how this is done in other operating systems - NuttX and FreeRTOS.

Well, first a few words about how the problem was discovered. At the time I was building yet another gadget: a robot with various sensors. At some point I wanted to drive two stepper motors, each controlled from a separate thread (the threads are completely identical). The result: until one motor finished its rotation, the second did not even start.

I sat down to debug. It turned out that all interrupts were simply disabled inside threads! How could anything work then, you may ask? Simple: there are many places that call sleep(), mutex_lock() and other “waits”, and thanks to them the threads switched naturally. The problem was obviously related to context switching on the STM32F4, where I discovered it.

Let's take a closer look at the problem. Thread contexts are switched, among other things, by timer, that is, from an interrupt. Schematically, interrupt handling in Embox can be represented as follows:

    void irq_handler(pt_regs_t *regs) {
        ...
        int irq = get_irq_number(regs);
        {
            ipl_enable();
            irq_dispatch(irq);
            ipl_disable();
        }
        irqctrl_eoi(irq);
        ...
        critical_dispatch_pending();
    }

The point is that the interrupt handler is called first (irq_dispatch), then interrupt processing “ends”, and then, inside critical_dispatch_pending, the context is switched to another thread if the scheduler requires it. Here it is very important that the processor state in that thread be the same as it was before the thread was interrupted, including whether interrupts are enabled or disabled. Interrupt enabling is governed by a bit in xPSR, which the processor itself pushes onto the stack on interrupt entry and pops from the stack on interrupt exit.

The problem is that with preemptive multitasking we may enter an interrupt on one thread's stack and want to exit on another thread's stack, which of course may hold no saved xPSR. Moreover, like most OSs, we have synchronization primitives such as pthread_mutex_lock(), which can cause a context switch outside of any interrupt. We even began to doubt whether preemptive multitasking can be organized on Cortex-M at all, since this architecture is heavily optimized for small tasks. But wait! How do other operating systems manage, then?

Interrupt handling on Cortex-M


Let's first understand how the interrupt handling on the Cortex-M is arranged.


The picture shows the stack frames in two modes: with and without floating point. When an interrupt occurs, the processor saves the corresponding registers on the stack and places one of the values from the table below into the LR register. For example, if the interrupt is nested, LR will contain 0xFFFFFFF1.



Next, the OS interrupt handler is called, at the end of which the “bx lr” is usually executed (recall that the LR contains 0xFFFFFFXX). After this, the automatically saved registers are restored, and the execution of the program continues.
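The special LR value mentioned above (EXC_RETURN) can be decoded bit by bit. The sketch below assumes the ARMv7-M encoding; the helper names are hypothetical and do not come from any of the operating systems discussed here.

```c
#include <stdbool.h>
#include <stdint.h>

/* EXC_RETURN values for the basic (no FPU) stack frame. */
#define EXC_RETURN_HANDLER_MSP 0xFFFFFFF1u /* return to a preempted handler */
#define EXC_RETURN_THREAD_MSP  0xFFFFFFF9u /* return to thread mode, main stack */
#define EXC_RETURN_THREAD_PSP  0xFFFFFFFDu /* return to thread mode, process stack */

/* Hypothetical helpers, for illustration only. */

/* Bit 3 clear: the interrupted code was itself a handler (nested interrupt). */
static inline bool exc_return_is_nested(uint32_t lr)
{
    return (lr & 0x8u) == 0;
}

/* Bit 2 set: the frame was pushed on the process stack (PSP). */
static inline bool exc_return_uses_psp(uint32_t lr)
{
    return (lr & 0x4u) != 0;
}

/* Bit 4 clear: the frame is the extended one that also holds FPU registers. */
static inline bool exc_return_has_fpu_frame(uint32_t lr)
{
    return (lr & 0x10u) == 0;
}
```

For instance, exc_return_is_nested(0xFFFFFFF1) is true, which matches the nested-interrupt case mentioned above. The bit-4 check is the same test that FreeRTOS performs with “tst r14, #0x10”.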

Now let's look at how context switching takes place in different operating systems.

FreeRTOS


Let's start with FreeRTOS. To do so, look at portable/GCC/ARM_CM4F/port.c. Below is the code of the xPortSysTickHandler function:

xPortSysTickHandler
    void xPortSysTickHandler( void )
    {
        /* The SysTick runs at the lowest interrupt priority, so when this interrupt
        executes all interrupts must be unmasked. There is therefore no need to
        save and then restore the interrupt mask value as its value is already
        known. */
        portDISABLE_INTERRUPTS();
        {
            /* Increment the RTOS tick. */
            if( xTaskIncrementTick() != pdFALSE )
            {
                /* A context switch is required. Context switching is performed in
                the PendSV interrupt. Pend the PendSV interrupt. */
                portNVIC_INT_CTRL_REG = portNVIC_PENDSVSET_BIT;
            }
        }
        portENABLE_INTERRUPTS();
    }


This is the hardware timer (SysTick) handler. We can see that if a context switch is needed, a PendSV interrupt is pended. As the documentation says: “PendSV is an interrupt-driven request for system-level service. In an OS environment, use PendSV for context switching when no other exception is active.” The context switch itself happens inside the xPortPendSVHandler interrupt handler:
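To make the “pend PendSV” step concrete, here is a minimal host-side sketch. It assumes that PendSV is requested by setting bit 28 (PENDSVSET) of the Interrupt Control and State Register; the variable fake_icsr and the function model_tick are hypothetical stand-ins so that the code runs off-target.

```c
#include <stdint.h>

/* PENDSVSET bit of the Interrupt Control and State Register (ICSR). */
#define ICSR_PENDSVSET (1u << 28)

/* Assumption: on real hardware this would be the memory-mapped ICSR;
 * a plain variable stands in for it here. */
static volatile uint32_t fake_icsr;

/* Hypothetical tick hook: the timer interrupt only requests the switch,
 * the switch itself is left to the PendSV handler. */
static inline void model_tick(int switch_required)
{
    if (switch_required) {
        fake_icsr = fake_icsr | ICSR_PENDSVSET;
    }
}
```

The point of the design is that the timer handler never touches thread contexts itself; it only leaves a request that is served once no other exception is active.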

xPortPendSVHandler
    void xPortPendSVHandler( void )
    {
        /* This is a naked function. */

        __asm volatile
        (
        "   mrs r0, psp                         \n"
        "   isb                                 \n"
        "                                       \n"
        "   ldr r3, pxCurrentTCBConst           \n" /* Get the location of the current TCB. */
        "   ldr r2, [r3]                        \n"
        "                                       \n"
        "   tst r14, #0x10                      \n" /* Is the task using the FPU context? If so, push high vfp registers. */
        "   it eq                               \n"
        "   vstmdbeq r0!, {s16-s31}             \n"
        "                                       \n"
        "   stmdb r0!, {r4-r11, r14}            \n" /* Save the core registers. */
        "                                       \n"
        "   str r0, [r2]                        \n" /* Save the new top of stack into the first member of the TCB. */
        "                                       \n"
        "   stmdb sp!, {r3}                     \n"
        "   mov r0, %0                          \n"
        "   msr basepri, r0                     \n"
        "   dsb                                 \n"
        "   isb                                 \n"
        "   bl vTaskSwitchContext               \n"
        "   mov r0, #0                          \n"
        "   msr basepri, r0                     \n"
        "   ldmia sp!, {r3}                     \n"
        "                                       \n"
        "   ldr r1, [r3]                        \n" /* The first item in pxCurrentTCB is the task top of stack. */
        "   ldr r0, [r1]                        \n"
        "                                       \n"
        "   ldmia r0!, {r4-r11, r14}            \n" /* Pop the core registers. */
        "                                       \n"
        "   tst r14, #0x10                      \n" /* Is the task using the FPU context? If so, pop the high vfp registers too. */
        "   it eq                               \n"
        "   vldmiaeq r0!, {s16-s31}             \n"
        "                                       \n"
        "   msr psp, r0                         \n"
        "   isb                                 \n"
        "                                       \n"
        #ifdef WORKAROUND_PMU_CM001 /* XMC4000 specific errata workaround. */
            #if WORKAROUND_PMU_CM001 == 1
        "       push { r14 }                    \n"
        "       pop { pc }                      \n"
            #endif
        #endif
        "                                       \n"
        "   bx r14                              \n"
        "                                       \n"
        "   .align 4                            \n"
        "pxCurrentTCBConst: .word pxCurrentTCB  \n"
        ::"i"(configMAX_SYSCALL_INTERRUPT_PRIORITY)
        );
    }


But now imagine that we are switching to a new thread that is to execute some function fn. If we simply put the address of fn into PC, we will land in the right place but in the wrong context: we have not left the interrupt! FreeRTOS offers the following solution. Initialize the newly created thread as if it were about to exit an interrupt - “/* Simulate the stack frame as it would be created by a context switch interrupt. */”. In this case we first “exit” the xPortPendSVHandler handler, which puts us in the right context, and then, following the prepared stack, we end up in fn. Below is the code that prepares a thread this way:

pxPortInitialiseStack
    StackType_t *pxPortInitialiseStack( StackType_t *pxTopOfStack, TaskFunction_t pxCode, void *pvParameters )
    {
        /* Simulate the stack frame as it would be created by a context switch
        interrupt. */

        /* Offset added to account for the way the MCU uses the stack on entry/exit
        of interrupts, and to ensure alignment. */
        pxTopOfStack--;

        *pxTopOfStack = portINITIAL_XPSR;   /* xPSR */
        pxTopOfStack--;
        *pxTopOfStack = ( ( StackType_t ) pxCode ) & portSTART_ADDRESS_MASK;    /* PC */
        pxTopOfStack--;
        *pxTopOfStack = ( StackType_t ) portTASK_RETURN_ADDRESS;    /* LR */

        /* Save code space by skipping register initialisation. */
        pxTopOfStack -= 5;  /* R12, R3, R2 and R1. */
        *pxTopOfStack = ( StackType_t ) pvParameters;   /* R0 */

        /* A save method is being used that requires each task to maintain its
        own exec return value. */
        pxTopOfStack--;
        *pxTopOfStack = portINITIAL_EXEC_RETURN;

        pxTopOfStack -= 8;  /* R11, R10, R9, R8, R7, R6, R5 and R4. */

        return pxTopOfStack;
    }
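To see why a prepared stack is enough to “return” into a thread that has never run, the restore path can be modeled on a host machine. This is a simplified sketch, not FreeRTOS code: the names cpu_model, fake_interrupted_frame and model_exception_return are hypothetical, addresses are plain 32-bit numbers, and the extra alignment word used by the real port is omitted.

```c
#include <stdint.h>

/* Words in one saved context: r4-r11, EXC_RETURN, then the 8-word hardware frame. */
enum { CTX_WORDS = 17 };

/* Hypothetical model of the registers that matter for this sketch. */
struct cpu_model {
    uint32_t r[13];       /* r0-r12 */
    uint32_t lr, pc, xpsr;
    uint32_t exc_return;  /* value that ends up in r14 inside the handler */
};

/* Prepare a new thread's stack as if it had been interrupted right before fn. */
static inline uint32_t *fake_interrupted_frame(uint32_t *top, uint32_t fn, uint32_t arg)
{
    uint32_t *sp = top;
    *--sp = 0x01000000u;  /* xPSR: only the Thumb bit set */
    *--sp = fn & ~1u;     /* PC: thread entry point */
    *--sp = 0;            /* LR: nothing to return to in this sketch */
    *--sp = 0;            /* R12 */
    *--sp = 0;            /* R3 */
    *--sp = 0;            /* R2 */
    *--sp = 0;            /* R1 */
    *--sp = arg;          /* R0: first argument of fn */
    *--sp = 0xFFFFFFFDu;  /* EXC_RETURN: thread mode, PSP, basic frame */
    for (int i = 0; i < 8; i++) {
        *--sp = 0;        /* R11 down to R4 */
    }
    return sp;
}

/* Model of the restore path: software pops r4-r11 and r14, then the
 * hardware pops the remaining eight words on exception return. */
static inline uint32_t *model_exception_return(uint32_t *sp, struct cpu_model *cpu)
{
    for (int i = 4; i <= 11; i++) {
        cpu->r[i] = *sp++;
    }
    cpu->exc_return = *sp++;
    for (int i = 0; i <= 3; i++) {
        cpu->r[i] = *sp++;
    }
    cpu->r[12] = *sp++;
    cpu->lr = *sp++;
    cpu->pc = *sp++;
    cpu->xpsr = *sp++;
    return sp;
}
```

Feeding the pointer returned by fake_interrupted_frame into model_exception_return leaves the model with PC pointing at fn and R0 holding the argument, which is exactly the state a freshly started thread needs.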


So, this was one of the ways suggested by FreeRTOS.

NuttX


Now let's look at another method, the one used in NuttX. This is another fairly well-known OS for various small hardware.

The main part of interrupt processing happens inside the up_doirq function; it is essentially a second-level interrupt handler, called from assembly code. The decision whether to switch to another thread is made here, and the function returns the context to resume with: that of the new thread if a switch is needed.

up_doirq
    uint32_t *up_doirq(int irq, uint32_t *regs)
    {
      board_autoled_on(LED_INIRQ);
    #ifdef CONFIG_SUPPRESS_INTERRUPTS
      PANIC();
    #else
      uint32_t *savestate;

      /* Nested interrupts are not supported in this implementation. If you want
       * to implement nested interrupts, you would have to (1) change the way that
       * CURRENT_REGS is handled and (2) the design associated with
       * CONFIG_ARCH_INTERRUPTSTACK. The savestate variable will not work for
       * that purpose as implemented here because only the outermost nested
       * interrupt can result in a context switch.
       */

      /* Current regs non-zero indicates that we are processing an interrupt;
       * CURRENT_REGS is also used to manage interrupt level context switches.
       */

      savestate    = (uint32_t *)CURRENT_REGS;
      CURRENT_REGS = regs;

      /* Acknowledge the interrupt */

      up_ack_irq(irq);

      /* Deliver the IRQ */

      irq_dispatch(irq, regs);

      /* If a context switch occurred while processing the interrupt then
       * CURRENT_REGS may have change value. If we return any value different
       * from the input regs, then the lower level will know that a context
       * switch occurred during interrupt processing.
       */

      regs = (uint32_t *)CURRENT_REGS;

      /* Restore the previous value of CURRENT_REGS. NULL would indicate that
       * we are no longer in an interrupt handler. It will be non-NULL if we
       * are returning from a nested interrupt.
       */

      CURRENT_REGS = savestate;
    #endif
      board_autoled_off(LED_INIRQ);
      return regs;
    }


After returning from the function we are back in the first-level handler. If a switch to a new thread is needed, the registers that were automatically saved on interrupt entry are modified so that, when interrupt processing completes, execution lands in the required thread. Below is a code fragment.

        bl      up_doirq                /* R0=IRQ, R1=register save (msp) */
        mov     r1, r4                  /* Recover R1=main stack pointer */

        /* On return from up_doirq, R0 will hold a pointer to register context
         * array to use for the interrupt return. If that return value is the same
         * as current stack pointer, then things are relatively easy.
         */

        cmp     r0, r1                  /* Context switch? */
        beq     l2                      /* Branch if no context switch */

        // …

        /* We are returning with a pending context switch. This case is different
         * because in this case, the register save structure does not lie in the
         * stack but, rather, within a TCB structure. We'll have to copy some
         * values to the stack.
         */

        add     r1, r0, #SW_XCPT_SIZE   /* R1=Address of HW save area in reg array */
        ldmia   r1, {r4-r11}            /* Fetch eight registers in HW save area */
        ldr     r1, [r0, #(4*REG_SP)]   /* R1=Value of SP before interrupt */
        stmdb   r1!, {r4-r11}           /* Store eight registers in HW save area */
    #ifdef CONFIG_BUILD_PROTECTED
        ldmia   r0, {r2-r11,r14}        /* Recover R4-R11, r14 + 2 temp values */
    #else
        ldmia   r0, {r2-r11}            /* Recover R4-R11 + 2 temp values */
    #endif
        …

That is, in NuttX (unlike FreeRTOS) the register values automatically saved on the stack are themselves modified. This is perhaps the main difference. In addition, you can see that NuttX does fine without PendSV (although ARM recommends it :)). Finally, the context switch itself is deferred: it goes through the interrupt stack rather than following the principle “save the old register values and immediately load the new ones”.
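The “return a different register pointer” convention can be modeled in a few lines of host-side C. This is only a sketch of the idea, not NuttX code: current_regs stands in for CURRENT_REGS, and model_doirq and context_switch_pending are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for NuttX's CURRENT_REGS: non-NULL only while an interrupt is handled. */
static uint32_t *current_regs;

/* Hypothetical second-level handler. switch_to models the effect of an
 * interrupt handler that made another thread runnable; NULL means no switch. */
static inline uint32_t *model_doirq(uint32_t *regs, uint32_t *switch_to)
{
    uint32_t *saved = current_regs;
    current_regs = regs;

    if (switch_to != NULL) {
        current_regs = switch_to;  /* the scheduler redirects the return */
    }

    regs = current_regs;           /* context to resume with */
    current_regs = saved;          /* back to "not in an interrupt" */
    return regs;
}

/* What the first-level assembly decides with "cmp r0, r1". */
static inline int context_switch_pending(const uint32_t *returned,
                                         const uint32_t *entry_regs)
{
    return returned != entry_regs;
}
```

If the returned pointer equals the one passed in, the first-level handler simply returns; otherwise it copies the saved registers of the new thread into place before the exception return, as in the assembly fragment above.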

Embox


Finally, how it is done in Embox. The main idea is to add an extra function (let's call it __irq_trampoline) so that the context switch is performed in “ordinary mode” rather than in interrupt-handling mode, after which we truly exit the interrupt handler. In other words, we tried to fully preserve the logic described at the beginning of the article:

    void irq_handler(pt_regs_t *regs) {
        ...
        int irq = get_irq_number(regs);
        {
            ipl_enable();
            irq_dispatch(irq);
            ipl_disable();
        }
        irqctrl_eoi(irq);
        ...
    }

First, here is a diagram of the whole scheme; then I will explain it piece by piece.



How is it done? The idea is as follows. The interrupt handler is first executed in the usual way, as on other platforms. But when we exit the handler, we actually modify the stack and exit to a completely different place: __pending_handle! It looks as if the interrupt had actually occurred at the entry of the __pending_handle function. Below is the code that modifies the stack so that we land in __pending_handle. I tried to comment the most important places.

    /* Registers stacked by the processor on exception entry */
    struct cpu_saved_ctx {
        uint32_t r[5];   /* r0-r3, r12 */
        uint32_t lr;
        uint32_t pc;
        uint32_t psr;
    };

    void interrupt_handle(struct context *regs) {
        uint32_t source;
        struct irq_saved_state state;
        struct cpu_saved_ctx *ctx;
        ...
        state.sp = regs->sp;
        state.lr = regs->lr;
        assert(!interrupted_from_fpu_mode(state.lr));

        ctx = (struct cpu_saved_ctx*) state.sp;
        memcpy(&state.ctx, ctx, sizeof *ctx);

        /* It does not matter what value of psr is, just set up some correct value.
         * This value is only used to go further, after return from interrupt_handle.
         * 0x01000000 is a default value of psr and (ctx->psr & 0xFF) is irq number if any. */
        ctx->psr = 0x01000000 | (ctx->psr & 0xFF);
        ctx->r[0] = (uint32_t) &state; // we want pass the state to __pending_handle()
        ctx->r[1] = (uint32_t) regs; // we want pass the registers to __pending_handle()
        ctx->lr = (uint32_t) __pending_handle;
        ctx->pc = ctx->lr;

        /* Now return from interrupt context into __pending_handle */
        __irq_trampoline(state.sp, state.lr);
    }

Here is the code of __irq_trampoline as well. The comments in the function mention a subtlety with SP, but to avoid overloading the article I will skip it. The main thing is the “bx r1” at the end of the function. Recall that register r1 holds the second argument of __irq_trampoline. Looking at the code above, we see the call “__irq_trampoline(state.sp, state.lr)”, which means that r1 contains state.lr, which equals 0xFFFFFFXX (see the first section).

__irq_trampoline
    .global __irq_trampoline
    __irq_trampoline:
        cpsid  i
        # r0 contains SP stored on interrupt handler entry. So we keep some data
        # behind SP for a while, but interrupts are disabled by 'cpsid i'
        mov    sp,  r0
        # Return from interrupt handling to usual mode
        bx     r1


In short, after leaving __irq_trampoline we unwind the stack, exit the interrupt, and land in __pending_handle. In this function we perform all the remaining operations (such as the context switch). On leaving this function, we need to put the originally saved register values back on the stack, then enter an interrupt again and exit it, this time at the original place! This is done as follows. We first prepare the stack, then trigger the PendSV interrupt, and find ourselves in the __pendsv_handle handler. From there we honestly exit the handler in the usual way, but now on the original, old stack. The code of __pending_handle and __pendsv_handle is shown below:

__pending_handle and __pendsv_handle
    .global __pending_handle
    __pending_handle:
        # Push initial saved context (state.ctx) on top of the stack
        add    r0, #32
        ldmdb  r0, {r4 - r11}
        push   {r4 - r11}
        ...
        cpsie  i
        bl     critical_dispatch_pending
        cpsid  i
        # Generate PendSV interrupt
        bl     nvic_set_pendsv
        cpsie  i
        # DO NOT RETURN
    1:  b      1b

    .global __pendsv_handle
    __pendsv_handle:
        # 32 == sizeof (struct cpu_saved_ctx)
        add    sp, #32
        # Return to the place we were interrupted at,
        # i.e. before interrupt_handle_enter
        bx     r14
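The two-step trick (redirect the exception return, later put the original frame back) can be checked on a host with a simplified model. The struct layout follows the article's cpu_saved_ctx; the functions redirect_return and restore_frame are hypothetical, and the real code restores the frame by pushing it onto the stack and discarding the PendSV frame rather than by copying in place.

```c
#include <stdint.h>
#include <string.h>

/* Same layout as the article's struct cpu_saved_ctx: r0-r3, r12, lr, pc, psr. */
struct cpu_saved_ctx {
    uint32_t r[5];
    uint32_t lr;
    uint32_t pc;
    uint32_t psr;
};

/* Matches "add sp, #32" in __pendsv_handle. */
_Static_assert(sizeof(struct cpu_saved_ctx) == 32, "hardware frame is 8 words");

/* Step 1 (as in interrupt_handle): back up the frame and redirect the
 * exception return to a detour address. */
static inline void redirect_return(struct cpu_saved_ctx *frame,
                                   struct cpu_saved_ctx *backup,
                                   uint32_t detour_addr)
{
    memcpy(backup, frame, sizeof *frame);
    frame->psr = 0x01000000u | (frame->psr & 0xFFu);
    frame->lr = detour_addr;
    frame->pc = detour_addr;
}

/* Step 2 (simplified): make the original frame current again, so the final
 * exception return resumes the interrupted code. */
static inline void restore_frame(struct cpu_saved_ctx *frame,
                                 const struct cpu_saved_ctx *backup)
{
    memcpy(frame, backup, sizeof *frame);
}
```

After redirect_return the frame “returns” into the detour (__pending_handle in Embox), while the backup keeps the interrupted program counter and PSR intact for the second, real exception return.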


In conclusion, a few words about the context_switch implementations we have reviewed. Each of them works and has its own advantages and disadvantages. The FreeRTOS variant does not suit us very well, because that OS is aimed primarily at microcontrollers, which means its context_switch is tightly tailored to a specific chip. We, on the other hand, try to bring the principles of a “big” OS even to microcontrollers, with all that this entails... NuttX takes roughly the same approach, and perhaps we will manage either to implement something similar or to improve our own scheme using the idea of modifying the stack. For now, though, our version copes with its tasks, as you can verify by taking the code from the repository.

Source: https://habr.com/ru/post/330236/

