
How ARM Boots

My previous article was purely theoretical; this one is practical. The practice will be fairly hardcore (I myself only tackled this topic after a year of working with ARM): processor and memory initialization. In other words, what has to be done with the processor before execution reaches the main() function. The first part of the article covers build and debugging tools, the second covers exception vectors, and the third covers stack and memory initialization.

But first, one clarification. For some reason, many people believe that an ARM is necessarily a monster with external memory and a pile of support circuitry, running at no less than 600 MHz, and so on. This is only partly true (for ARM9 and later families). The chip I usually work with (AT91SAM7X512) is not much more complicated than many AVRs. All it needs to run is a crystal and power (it can work without the crystal, but then things get rather sad). That's it. It does, of course, offer far more capabilities than an AVR, but more about that later. Today's article is not tied to any specific hardware.



Compilers, linkers, debuggers



A question that worries a great many people. There are commercial toolchains (IAR, Keil MDK, CrossWorks) and free ones (gcc-arm). I will use gcc-arm in the examples. For Windows there are prebuilt packages: WinARM (which seems to be dead) and YAGARTO. In principle, you can build your own. There is also such a fun thing as coLinux, but that's another story. Under Linux, the cross-compiler is usually built with the distribution's standard tools. In short, read the docs :)

There is also such a useful thing as the standard library: the one that implements functions like printf, mktime, malloc and everything else C programmers are used to. glibc won't do, because it is too big. The free newlib is usually used instead. It is included in WinARM / YAGARTO, but Linux users will have to build it by hand. Again, read the documentation :)

Debuggers are a bit more complicated. Emulators can be used, but they are rather buggy when it comes to peripherals. I have no experience here. You can print debug messages to a serial (COM) port. I have done this all my life, and it is enough in 99% of cases.

But the coolest thing is JTAG: a device that connects to the processor and lets you debug code right on the chip (set breakpoints, trace, view and change memory, and so on). True, on the one hand it costs money, and on the other the board has to route pins for it.



Exception handlers



All right, let's assume the compiler is installed and configured. Now let's run something. We'll start from the very beginning: what happens when the processor is reset (for example, after power is applied and the voltage has settled). It's simple: the processor starts executing the program from address 0x0. It would seem you could just place the initialization code at this address and be done. But it's not quite that simple: the lowest addresses hold the exception vectors.

For example, when an interrupt occurs, the processor starts handling it from address 0x18, and the "undefined instruction" exception is handled from address 0x04. In total, the first 32 bytes (0x00 to 0x1F) are reserved for the exception vector table: eight 4-byte slots, one of which (0x14) is reserved and unused. Reset is an exception too.

[Figure: ARM exception vectors]

The figure shows it more clearly. As it shows, 4 bytes are allocated for each handler: exactly one processor instruction. (In ARM mode, that is. All handlers are entered in this instruction mode.)
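
To make the layout concrete, here is a small C sketch of the same table. It assumes the vectors sit at address 0x0, as described above; the enum names and the vector_address() helper are invented for this illustration and are not part of any toolchain.

```c
#include <assert.h>
#include <stdint.h>

/* Classic ARM exception vectors, in table order (one 4-byte slot each). */
enum arm_vector {
    VEC_RESET = 0,
    VEC_UNDEF,
    VEC_SWI,
    VEC_PREFETCH_ABORT,
    VEC_DATA_ABORT,
    VEC_RESERVED,       /* unused slot */
    VEC_IRQ,
    VEC_FIQ,
    VEC_COUNT
};

/* Address the core jumps to for a given exception (table assumed at 0x0). */
uint32_t vector_address(enum arm_vector v)
{
    assert(v < VEC_COUNT);
    return (uint32_t)v * 4u;    /* one ARM instruction per slot */
}
```

For instance, vector_address(VEC_IRQ) gives 0x18, the IRQ entry mentioned earlier.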

Accordingly, the first thing to do is write the exception handlers and place them correctly. Something like this will do:

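    ldr pc, ResetHandlerAddr        @ 0x00: reset
    ldr pc, UndefHandlerAddr        @ 0x04: undefined instruction
    ldr pc, SWIHandlerAddr          @ 0x08: software interrupt
    ldr pc, PrefetchAbtHandlerAddr  @ 0x0C: prefetch abort
    ldr pc, DataAbtHandlerAddr      @ 0x10: data abort
    nop                             @ 0x14: reserved
    ldr pc, IRQHandlerAddr          @ 0x18: IRQ
    ldr pc, FIQHandlerAddr          @ 0x1C: FIQ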



What does this code do? Each instruction loads the address of the real handler into the pc register: in effect, an unconditional jump. Right after it in the code come the variables that hold those addresses:



ResetHandlerAddr:       .word ResetHandler
UndefHandlerAddr:       .word UndefHandler
SWIHandlerAddr:         .word SWIHandler
PrefetchAbtHandlerAddr: .word PrefetchAbtHandler
DataAbtHandlerAddr:     .word DataAbtHandler
IRQHandlerAddr:         .word IRQHandler
FIQHandlerAddr:         .word FIQHandler



Several tricks could be applied here to speed up interrupt handling. For example, as you can see, the FIQ vector is the last one in the table, so the handler code for this interrupt could start right there, with no jump.

It would also be possible to use the registers of the Advanced Interrupt Controller (AIC) to jump straight to the handler of the interrupt that occurred. But let's not complicate our lives just yet. For now, only Reset handling matters.

So let's write the handlers themselves as simply as possible. They will hang the processor (endlessly executing an unconditional branch to themselves). We don't know how to handle exceptions yet anyway, so a hung processor is quite acceptable.

UndefHandler:       B UndefHandler
SWIHandler:         B SWIHandler
PrefetchAbtHandler: B PrefetchAbtHandler
DataAbtHandler:     B DataAbtHandler
IRQHandler:         B IRQHandler
FIQHandler:         B FIQHandler



B is the unconditional branch instruction (Branch).

The next thing to do is set up the stack pointer sp for each processor mode, so that if an exception occurs, its handler already has its own stack. But first, we define the sizes of all the stacks.

    .EQU IRQ_STACK_SIZE, 0x100
    .EQU FIQ_STACK_SIZE, 0x100
    .EQU ABT_STACK_SIZE, 0x100
    .EQU UND_STACK_SIZE, 0x100
    .EQU SVC_STACK_SIZE, 0x100



To keep things short, we allocate 256 bytes of stack for each mode. In fact, for most of these modes that is a lot, although it all depends on the handlers. As you can see, sizes are given for five of the six stacks. The remaining memory will be shared between the heap and the sixth stack, the one for User mode (System mode, which this code does not use, shares the User mode stack pointer).

Now we define constants that make switching between modes easier. The current mode is stored in the low five bits of the CPSR register, which also serves as the status register.

    .EQU ARM_MODE_FIQ, 0x11
    .EQU ARM_MODE_IRQ, 0x12
    .EQU ARM_MODE_SVC, 0x13
    .EQU ARM_MODE_ABT, 0x17
    .EQU ARM_MODE_UND, 0x1B
    .EQU ARM_MODE_USR, 0x10

    .EQU I_BIT, 0x80
    .EQU F_BIT, 0x40
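
As a rough illustration of how these constants combine, the C sketch below builds and decodes the low byte of CPSR. It assumes the usual layout (mode in bits 4:0, F in bit 6, I in bit 7, where a set bit disables that interrupt); ARM_MODE_MASK, cpsr_control() and cpsr_mode() are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stdint.h>

#define ARM_MODE_MASK 0x1Fu     /* CPSR[4:0] holds the mode (assumed name) */
#define ARM_MODE_USR  0x10u
#define ARM_MODE_FIQ  0x11u
#define ARM_MODE_IRQ  0x12u
#define ARM_MODE_SVC  0x13u
#define ARM_MODE_ABT  0x17u
#define ARM_MODE_UND  0x1Bu
#define I_BIT         0x80u     /* set = IRQ disabled */
#define F_BIT         0x40u     /* set = FIQ disabled */

/* Value to write to CPSR_c for a mode, optionally masking IRQ and FIQ. */
uint32_t cpsr_control(uint32_t mode, int mask_irq, int mask_fiq)
{
    assert((mode & ~ARM_MODE_MASK) == 0u);
    return mode | (mask_irq ? I_BIT : 0u) | (mask_fiq ? F_BIT : 0u);
}

/* Extract the mode field from a CPSR value. */
uint32_t cpsr_mode(uint32_t cpsr)
{
    return cpsr & ARM_MODE_MASK;
}
```

So cpsr_control(ARM_MODE_IRQ, 1, 1) yields 0xD2, which is the same value the assembler computes for ARM_MODE_IRQ | I_BIT | F_BIT.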



I_BIT and F_BIT are the bits that disable normal (IRQ) and fast (FIQ) interrupts, respectively. Now we are ready to initialize the stacks. It is done simply: we load a pointer to the top of the stack area into register r0, then switch to the desired mode, write the value of r0 into sp, decrease r0 by the size of that stack, and repeat.

.RAM_TOP:
    .word __TOP_STACK

ResetHandler:
    ldr r0, .RAM_TOP                @ r0 = top of the stack area

    msr CPSR_c, #ARM_MODE_FIQ | I_BIT | F_BIT
    mov sp, r0
    sub r0, r0, #FIQ_STACK_SIZE

    msr CPSR_c, #ARM_MODE_IRQ | I_BIT | F_BIT
    mov sp, r0
    sub r0, r0, #IRQ_STACK_SIZE

    msr CPSR_c, #ARM_MODE_SVC | I_BIT | F_BIT
    mov sp, r0
    sub r0, r0, #SVC_STACK_SIZE

    msr CPSR_c, #ARM_MODE_ABT | I_BIT | F_BIT
    mov sp, r0
    sub r0, r0, #ABT_STACK_SIZE

    msr CPSR_c, #ARM_MODE_UND | I_BIT | F_BIT
    mov sp, r0
    sub r0, r0, #UND_STACK_SIZE

    msr CPSR_c, #ARM_MODE_USR       @ User mode, interrupts enabled
    mov sp, r0                      @ User stack gets whatever is left
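
The arithmetic of that sequence can be checked on a PC with a short C sketch. This is only a model of the address calculation, not startup code; layout_stacks(), the stack_id enum and EXC_STACK_SIZE are names assumed for the example, and the top of RAM is passed in rather than taken from the linker.

```c
#include <assert.h>
#include <stdint.h>

/* Every exception mode gets the same stack size, as in the article. */
#define EXC_STACK_SIZE 0x100u

/* Order in which the startup code visits the modes. */
enum stack_id { STK_FIQ, STK_IRQ, STK_SVC, STK_ABT, STK_UND, STK_USR, STK_COUNT };

/* Compute the initial sp of each mode, carving stacks downward from ram_top. */
void layout_stacks(uint32_t ram_top, uint32_t tops[STK_COUNT])
{
    uint32_t r0 = ram_top;              /* plays the role of register r0 */

    assert(ram_top % 4u == 0u);         /* assumption: word-aligned top of RAM */
    for (int i = 0; i < STK_COUNT; i++) {
        tops[i] = r0;                   /* mov sp, r0 */
        r0 -= EXC_STACK_SIZE;           /* sub r0, r0, #..._STACK_SIZE
                                           (unused after the User entry) */
    }
}
```

With a made-up top of RAM of 0x00210000, the FIQ stack starts at 0x00210000, the IRQ stack at 0x0020FF00, and the User stack at 0x0020FB00, growing down toward the heap.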



Memory initialization



We are now in unprivileged mode with interrupts enabled and a stack set up. By the way, there is no way to leave this mode other than by raising an exception. But more on that in the next article.

Only a little remains before jumping to main(). We just need to copy some data into RAM and zero the memory in the .bss section, which is where uninitialized global variables live. The point is that the C standard promises that such variables are zero at program start, while the hardware guarantees nothing of the kind. So we clear the section by hand:


    MOV   R0, #0
    LDR   R1, =__bss_start__
    LDR   R2, =__bss_end__
LoopZI:
    CMP   R1, R2
    STRLO R0, [R1], #4
    BLO   LoopZI


The constants __bss_start__ and __bss_end__ are kindly provided by the linker.

By the way, this is an example of conditional instructions (the ones with the LO suffix, "unsigned lower"). They execute only while R1 < R2, so the loop ends once R1 reaches R2.
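
For readers more at home in C, the zeroing loop corresponds roughly to the sketch below. The function name zero_words() is made up for this example, and the pointers are passed in as parameters instead of coming from the linker symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Zero every 32-bit word in [start, end), like the LoopZI loop. */
void zero_words(uint32_t *start, const uint32_t *end)
{
    assert(start <= end);
    while (start < end) {       /* CMP R1, R2 and the LO condition */
        *start++ = 0u;          /* STRLO R0, [R1], #4 */
    }
}
```

On the target, the call would look something like zero_words(&__bss_start__, &__bss_end__), assuming those symbols are declared as extern uint32_t objects.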

You also need to copy pre-initialized variables (the ones declared like int x=42) from ROM to RAM.

    LDR   R1, =_etext
    LDR   R2, =_data
    LDR   R3, =_edata
LoopRel:
    CMP   R2, R3
    LDRLO R0, [R1], #4
    STRLO R0, [R2], #4
    BLO   LoopRel
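
The relocation loop has an equally short C counterpart. Again this is only a sketch: copy_words() is a hypothetical name, and the source and destination are ordinary pointers here rather than the linker's _etext, _data and _edata symbols.

```c
#include <assert.h>
#include <stdint.h>

/* Copy words from src into [dst, dst_end), like the LoopRel loop. */
void copy_words(const uint32_t *src, uint32_t *dst, const uint32_t *dst_end)
{
    assert(dst <= dst_end);
    while (dst < dst_end) {     /* CMP R2, R3 and the LO condition */
        *dst++ = *src++;        /* LDRLO R0, [R1], #4 then STRLO R0, [R2], #4 */
    }
}
```

Note that the loop is bounded by the destination range only, so the initial values in ROM must be laid out in the same order and size as the .data section in RAM.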


If we are writing in C++, we also need to call the constructors of global objects:

    LDR   r0, =__ctors_start__
    LDR   r1, =__ctors_end__
ctor_loop:
    CMP   r0, r1
    BEQ   ctor_end
    LDR   r2, [r0], #4
    STMFD sp!, {r0-r1}
    MOV   lr, pc
    BX    r2
    LDMFD sp!, {r0-r1}
    B     ctor_loop
ctor_end:
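
In C terms, the constructor loop simply walks an array of function pointers and calls each one. The sketch below models that on a PC; run_ctors(), ctor_fn and the two demo constructors are all invented for illustration, standing in for the table that the linker places between __ctors_start__ and __ctors_end__.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*ctor_fn)(void);

/* Call every constructor in [begin, end), in table order. */
void run_ctors(const ctor_fn *begin, const ctor_fn *end)
{
    assert(begin <= end);
    for (const ctor_fn *p = begin; p != end; ++p) {
        (*p)();                 /* MOV lr, pc then BX r2 */
    }
}

/* Hypothetical constructors, used only to demonstrate the loop. */
int demo_total;
void demo_ctor_a(void) { demo_total += 1; }
void demo_ctor_b(void) { demo_total += 2; }
```

Calling run_ctors() on a table holding demo_ctor_a and demo_ctor_b runs both in order and leaves demo_total at 3.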




Well, that is basically everything. Call main():

    ldr r0, =main
    bx  r0




Congratulations, we are now in the void main(void) function and can start initializing the peripherals. So far we have only set up the software environment, so the processor is running at the lowest possible frequency and all peripherals are disabled. Not much room to run wild :)

But peripheral initialization depends on the specific hardware, and the purpose of this article is to explain how to start up an abstract ARM.

A few more caveats: this code cannot be assembled and run as is, because the sections it lives in are not declared here. I also have not provided the linker scripts (the scripts that describe how code and data sections are placed in memory and in the firmware image).

But the Internet is full of ready-made examples for bringing up one chip or another, with scripts, makefiles and all the rest. Look on the manufacturers' websites :)



The next article will most likely return to theory, this time describing processor modes and exceptions.

Source: https://habr.com/ru/post/87343/


