
How to tame the processor core *

This article describes the stages of booting the cores of QorIQ processors and the role the u-boot bootloader plays in it, as well as running a standalone program on a separate processor core without an OS. It may be of interest to system programmers who want to get to grips with the variety of processor architectures. Keep in mind that some of the definitions and techniques also apply to other processors and systems.

* Using the example of Freescale QorIQ processors with e500mc cores and the PowerPC Book E ISA.


To the point!


Life begins with the start of core* number 0, which executes the bootloader code from a fixed address. In our case the bootloader is u-boot, located in flash memory and accessible at the physical address 0x3ffff000. Immediately after start, this address is mapped to the virtual address 0xfffff000, for which there is an entry in the address mapping table (see the e500 documentation).

* Hereinafter, "core" means a processor core, unless otherwise specified.

The first instruction the processor executes is the one located at 0xfffffffc. As you have probably guessed, it must be a jump to the u-boot entry point within this page. Something like this:

    /* reset vector: branch to the u-boot entry point */
        .section .resetvec,"ax"
        b _start_e500


For those from the Intel world
b ≈ jmp

For those from the Java world
This is an unconditional jump to the _start_e500 label.


Next, u-boot's tasks include enabling and configuring the caches, the access control mechanisms and the address mapping tables, and of course it has to map the rest of itself into the processor core's address space. All in all, u-boot is great: it takes on all the dirty work, unlike some (let's not point fingers, it's not their fault).

But what about the other cores? Their turn comes once the main one is done. If you simply start them, they will repeat the boot sequence of core 0. To prevent this, u-boot changes the translation address of the boot area so that it points to a secondary loader for the other cores (see the boot space translation register in CCSR).

By the way, u-boot can be configured not to initialize the other processor cores, but then the dirty work of initializing them becomes our problem. It can be solved by copying the code from u-boot; the core itself is started by setting the corresponding bit in a CCSR register.

The secondary loader's tasks also include setting up the cache and adding an entry to the address mapping table, after which the cores spin in this merry-go-round:

    /* spin waiting for addr */
    2:
        lwz     r4,ENTRY_ADDR_LOWER(r10)   /* load the lower word of the entry address */
        andi.   r11,r4,1                   /* if the low bit = 1, ... */
        bne     2b                         /* ... go back to label 2 */


They spin until an entry-point address with the low bit cleared is written to the location the loop polls. The place where this address has to be written is called the spin-table, and it sits at a fixed address (0xfffff000 + ENTRY_ADDR_LOWER). Besides the start address, the table can hold values for registers r6 and r3, which are loaded before the jump to the entry point.

The entry point must lie within the 64 MB page that u-boot has already mapped; this is due to u-boot's internal quirks.
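The spin-table protocol and the 64 MB limit can be sketched in C on the releasing side. This is only an illustration under assumptions: the `struct spin_entry` layout (eight 32-bit words with the address in the second one), the `window_base` parameter and all function names are hypothetical and should be checked against the ENTRY_* constants in your u-boot tree. On real hardware the final store would also need a memory barrier and a cache flush.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout of one spin-table entry (eight 32-bit words). */
struct spin_entry {
    uint32_t addr_upper;
    uint32_t addr_lower;  /* low bit set: the core keeps spinning */
    uint32_t r3_upper;
    uint32_t r3_lower;    /* value loaded into r3 before the jump */
    uint32_t reserved;
    uint32_t pir;
    uint32_t r6_upper;
    uint32_t r6_lower;    /* value loaded into r6 before the jump */
};

#define BOOT_WINDOW_SIZE (64u << 20)  /* the 64 MB page mapped by u-boot */

/* Mirrors the lwz / andi. / bne loop: the core leaves it once bit 0 is clear. */
static int spin_entry_released(const volatile struct spin_entry *e)
{
    return (e->addr_lower & 1u) == 0;
}

/* True if the entry point lies inside the mapped window (base is an assumption). */
static int entry_in_window(uint32_t entry, uint32_t window_base)
{
    return entry >= window_base && entry - window_base < BOOT_WINDOW_SIZE;
}

/* Fill in the register values first and the address last; -1 on a bad address. */
static int spin_entry_release(volatile struct spin_entry *e, uint32_t window_base,
                              uint32_t entry, uint32_t r3, uint32_t r6)
{
    assert(e != 0);
    if ((entry & 1u) != 0 || !entry_in_window(entry, window_base))
        return -1;
    e->r3_upper = 0;
    e->r3_lower = r3;
    e->r6_upper = 0;
    e->r6_lower = r6;
    e->addr_upper = 0;
    e->addr_lower = entry;  /* this store is what lets the core go */
    return 0;
}
```

Writing the address last matters: the waiting core only looks at the address word, so everything else has to be in place before it changes.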

For those who learned from modern computer science textbooks
In PowerPC and some other architectures, a page is a flexible concept. In particular, on e500-family processors its size can range from 4 KB to 4 GB (see the documentation for details and limitations).


Creating an application for the core


Let's develop a program for our core which will, by tradition, print "hello world". We assume that we have already built a cross compiler and a lightweight libc, and that we can boot an OS without multi-core support on one of the cores (for example, a suitably built Linux or LynxOS-178). So let's move on to the hardest part, the programming:

    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        printf("hello world\n");
        return 0;
    }


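Done. But where and how will printf print? For that you have to write a few stubs for libc; examples can be found in the u-boot source. I use a simplified version:

    #include <sys/stat.h>

    static void __putc(unsigned char c);   /* defined below */

    /* Send every byte to the serial port, whatever the descriptor. */
    int write(int fildes, char *buf, int nbyte)
    {
        int wbyte = 0;

        while (nbyte > 0) {
            __putc(*buf);
            buf++;
            nbyte--;
            wbyte++;
        }
        return wbyte;
    }

    /* Tell libc that every descriptor is a character device. */
    int fstat(int fildes, struct stat *buf)
    {
        buf->st_mode = S_IFCHR;
        return 0;
    }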


The __putc function has to output a single character to the serial port:

    /* Pointers to the mapped UART data and status registers. */
    extern volatile unsigned char *uart_data;
    extern volatile unsigned char *uart_status;

    static void __putc(unsigned char c)
    {
        unsigned char v;

        /* Wait until bit 5 of the status register shows the transmitter is ready. */
        do {
            v = *uart_status;
        } while (!(v & (1 << 5)));

        *uart_data = c;
    }


There is no need to write a full-fledged driver for this: just rely on the default settings and write the character to the address given in the documentation. That address is physical, so it has to be mapped first. I will not give the mapping function here because it is too platform-specific, but I am happy to share it on request.

We also trim the real entry point of the executable, the _start function. The only initialization worth keeping is zeroing the bss segment:

    /* bss bounds; assumed here to be symbols provided by the linker script. */
    extern char __sbss, __ebss;
    int main(int argc, char *argv[]);

    int _start(int argc, char **argv)
    {
        unsigned char *cp = (unsigned char *)&__sbss;

        while (cp < (unsigned char *)&__ebss) {
            *cp++ = 0;
        }
        return main(argc, argv);
    }


So now we can do without an OS, and printf knows where to send its output. Compile:
 $ powerpc-eabi-gcc -o hello hello.c start.c 

Will it work? No! The processor does not understand the ELF executable format. The extra headers and other attributes of the executable have to be cut off. Strip them:
 $ powerpc-eabi-objcopy -O binary hello hello.bin 

Will it work? No! Previously the entry point was recorded in the ELF file, and now it is who knows where. Its exact location can be found with the powerpc-eabi-objdump utility. Of course, you could give u-boot that location as the entry point, but it is better to instruct the linker to place the entry point at the beginning of the file:

    OUTPUT_ARCH(powerpc:common)
    ENTRY(_start)
    STARTUP(start.o)
    ...

The rest of the file depends on the version of the build tools; you can crib it from the scripts shipped with the compiler.
Now, following the script, the linker places the whole start.o file at the beginning of the executable. The order of the functions should match the source text, but it is safer to leave just one function in this file, _start. In general, STARTUP is a quick and narrow solution. If we want more later, we will have to fiddle with binding functions to sections and placing them inside the script.

We build again, thinking how convenient this would be with a makefile:
 $ powerpc-eabi-gcc -T hello.ld -o hello hello.c start.c 

Now, if everything has been built correctly, we have a program ready to run on a separate core without an OS. But it is better to double-check with objdump. It should look something like this:

    Disassembly of section .text:

    00004000 <_start>:
        4000:   3d 00 00 01     lis     r8,1
        4004:   3c e0 00 32     lis     r7,50
        4008:   39 08 70 00     addi    r8,r8,28672
        400c:   38 e7 80 78     addi    r7,r7,-32648
        4010:   39 48 ff ff     addi    r10,r8,-1
        4014:   39 27 ff ff     addi    r9,r7,-1
        ...


By the way, did you notice the section addresses on the left? Unless told otherwise, our program is position-dependent code and cannot be launched from an arbitrary address. That should not worry us for now, and the offset can be adjusted in the linker script. A correctly built program should run even directly from u-boot.

Starting the core


From the OS's point of view, the additional core is no different from any other peripheral device that uses DMA, and for it to work we need to allocate memory. The memory will hold our program and, later on, the results it exchanges with the OS, exception handlers and so on. It is allocated with the usual OS facilities (kmalloc on Linux, alloc_cmem on LynxOS, etc.), but the physical address of the start of this memory must be aligned to the page size. The overall memory mapping scheme looks something like this:

[Figure: memory mapping scheme]


If we do not want to shoot the OS in the foot, it is sensible to make the size of the memory allocated by the OS match the size of the memory mapped for the other core. Otherwise our core can write into memory that the OS uses for its own purposes. There is no hypervisor here to stop it.
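The two constraints above (a page-aligned physical start and an allocation that matches the mapped size) can be checked with a few helpers. This is a minimal sketch: the function names are hypothetical, and the assumption that valid page sizes are powers of four between 4 KB and 4 GB should be verified against the documentation for your core.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_MIN (UINT64_C(4) << 10)  /* 4 KB */
#define PAGE_MAX (UINT64_C(4) << 30)  /* 4 GB */

/* Assumption: allowed page sizes are 4 KB * 4^n, up to 4 GB. */
static int page_size_is_valid(uint64_t size)
{
    uint64_t s;

    for (s = PAGE_MIN; s <= PAGE_MAX; s <<= 2)
        if (s == size)
            return 1;
    return 0;
}

/* Smallest valid page size that covers `bytes`, or 0 if nothing fits. */
static uint64_t page_size_for(uint64_t bytes)
{
    uint64_t s;

    for (s = PAGE_MIN; s <= PAGE_MAX; s <<= 2)
        if (s >= bytes)
            return s;
    return 0;
}

/* True if the physical start address can be the base of such a page. */
static int phys_is_page_aligned(uint64_t phys, uint64_t page_size)
{
    assert(page_size_is_valid(page_size));
    return (phys & (page_size - 1)) == 0;
}
```

In practice one would allocate `page_size_for(image_size)` bytes, so that the OS allocation and the mapping for the other core cover exactly the same range, and reject the buffer if `phys_is_page_aligned` fails.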

So, we have allocated memory into which the program can be written. The program is written taking into account the offset specified by the linker, and after writing it do not forget to flush the cache for this region. Now the physical address of the program's entry point can be written to the spin-table, where the already configured core is spinning.
In fact, after the address is written, the u-boot code jumps to a virtual address, but since virtual and physical addresses are equal at this stage, it works. It may look like magic, but we understand how it works.
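The order of operations (copy, flush, then publish the address) can be sketched as follows. Everything here is illustrative: `load_and_release`, the `flush_fn` hook and `count_flush` are hypothetical names, the image is assumed to sit at a fixed offset inside the allocated region, and the real cache flush and memory barrier depend on the OS and are not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical hook: write back the data cache for [p, p + len). */
typedef void (*flush_fn)(const void *p, size_t len);

/* Host-side stand-in for a real flush; it only counts the bytes. */
static size_t flushed_bytes;
static void count_flush(const void *p, size_t len)
{
    (void)p;
    flushed_bytes += len;
}

/*
 * region      - OS-visible pointer to the memory allocated for the core
 * region_phys - physical address of the start of that memory
 * offset      - where the linker script expects the image to start
 * spin_addr   - the address word of the spin-table polled by the core
 * Returns 0 on success, -1 if the image does not fit into the region.
 */
static int load_and_release(uint8_t *region, size_t region_size,
                            uint32_t region_phys, uint32_t offset,
                            const uint8_t *image, size_t image_len,
                            volatile uint32_t *spin_addr, flush_fn flush)
{
    if (offset > region_size || image_len > region_size - offset)
        return -1;

    memcpy(region + offset, image, image_len);   /* 1. write the program */
    flush(region + offset, image_len);           /* 2. push it out of the cache */

    assert(((region_phys + offset) & 1u) == 0);  /* low bit must stay clear */
    *spin_addr = region_phys + offset;           /* 3. let the core go */
    return 0;
}
```

Publishing the address before the flush would let the other core start fetching instructions that are still sitting in the first core's cache.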

Forgot about the limit on where the entry point can be?
You cannot be sure which address range the system will have free memory in, so the allocated memory may end up in a range that u-boot has not mapped. That leads to a page fault exception, for which we have no handler configured, and ultimately the processor core halts.
Several solutions are possible; the one I liked most is to put my own core initializer, located within the start addresses, into the OS, which:

After control is transferred, "hello world" appears on the console.
Later on, using the control registers, the processor cores can be stopped, restarted, have their frequency changed, be sent interrupts (an interesting but very specific topic), and much more.

Conclusion


Of course, many modern operating systems can isolate processor cores for individual applications, and the need to write your own multi-core support code is questionable. However, there are hard real-time tasks for which the 2 ms latency that multi-core support can introduce in a standard configuration is critical. They call for non-standard approaches to system configuration. But that is material for another article.

Source: https://habr.com/ru/post/237471/

