ARM assembler (continued)

Good day, Habr readers. Inspired by the article on ARM assembler, I decided to continue it for those who are interested and for beginners like me. As the title suggests, it is advisable to read that article before this one. So, let's continue.

My case will differ from the previous one as follows:

on my Ubuntu 12.04 machine
I took the ARM toolchain from here (choose ARM Processors - Download the GNU/Linux Release). At the time of writing, more recent versions were available, but I used arm-2012.09 (the arm-none-linux-gnueabi toolchain)
installed it like this:
$ mkdir ~/toolchains
$ cd ~/toolchains
$ tar -jxf ~/arm-2012.09-64-arm-none-linux-gnueabi-i686-pc-linux-gnu.tar.bz2
added the toolchain to PATH to simplify further steps
$ PATH=$HOME/toolchains/arm-2012.09/bin:$PATH
installed qemu on Ubuntu
$ sudo apt-get install qemu
$ sudo apt-get install qemu-system

In principle, nothing critical changes compared to the setup in the "parent" article.
The flash memory in which the program from the previous article was stored is a kind of EEPROM (electrically erasable programmable ROM). It is useful as "secondary" storage, used much like a hard disk, but it is inconvenient for storing variables. Variables should be stored in RAM so that they can be easily modified.
The board emulated by QEMU has 64 MB of RAM starting at address 0xA0000000, where variables can be stored. The memory map of the emulated board is shown in the figure.

To place variables starting at this address, special measures are needed. To understand what has to be done, you first need to understand the role of the linker.

Linker

When a program consisting of several source files is assembled, each file is converted into an object file. The linker merges these object files into the final executable file.

During linking, the linker performs the following operations:

Symbol resolution
Relocation

Symbol resolution

In the course of converting a source file to object code, the assembler replaces all references to labels with the corresponding addresses. In a multi-file program, if a module references external labels defined in another file, the assembler marks them as "unresolved". When these object files are passed to the linker, it determines the addresses of such references from the other object files and patches the code with the correct values.
Consider an example that calculates the sum of the elements of an array, deliberately split into two files so that the symbol resolution performed by the linker can be clearly seen. To demonstrate the presence of unresolved references, we will assemble both files and check their symbol tables.
The file sum-sub.s contains the subroutine sum, and the file main.s calls the subroutine with the required arguments. The source files are listed below.
main.s
        .text
        b start                 @ skip over the data
arr:    .byte 10, 20, 25        @ read-only array of bytes
eoa:                            @ address of the end of the array + 1
        .align
start:
        ldr r0, =arr            @ r0 = &arr
        ldr r1, =eoa            @ r1 = &eoa
        bl sum                  @ call the sum subroutine
stop:   b stop

sum-sub.s
        @ Arguments
        @ r0: start address of the array
        @ r1: end address of the array
        @
        @ Result
        @ r3: sum of the array elements

        .global sum
sum:    mov r3, #0              @ r3 = 0
loop:   ldrb r2, [r0], #1       @ r2 = *r0++
        add r3, r2, r3          @ r3 += r2
        cmp r0, r1              @ if (r0 != r1)
        bne loop                @     goto loop
        mov pc, lr              @ pc = lr (return)

The .global directive makes the sum symbol visible to other files. Assemble the files and view their symbol tables with the nm command.

$ arm-none-linux-gnueabi-nm main.o
00000004 t arr
00000007 t eoa
00000008 t start
00000014 t stop
U sum

$ arm-none-linux-gnueabi-nm sum-sub.o
00000004 t loop
00000000 T sum

The single letter in the second column identifies the symbol type. The type "t" means that the symbol is defined in the .text section. The type "U" means that the symbol is undefined. An uppercase letter indicates a global symbol (one declared with .global). As expected, the sum symbol is defined in sum-sub.o and is undefined in main.o; the linker will later resolve this reference and produce an executable file.
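The resolution step described above can be sketched as a small Python model. This is only an illustration under stated assumptions: the table layout and the function name resolve are hypothetical, and the addresses are copied from the nm listings above. It is not how ld is actually implemented.

```python
# Hypothetical model of symbol resolution (not ld's real data structures).
# Each object file maps a symbol name to (address or None, is_global).
# Addresses and visibility are copied from the nm listings above.
OBJECTS = {
    "main.o": {
        "arr": (0x04, False),
        "eoa": (0x07, False),
        "start": (0x08, False),
        "stop": (0x14, False),
        "sum": (None, True),   # 'U': referenced here, defined elsewhere
    },
    "sum-sub.o": {
        "loop": (0x04, False),
        "sum": (0x00, True),   # 'T': global definition in .text
    },
}


def resolve(objects):
    """Return {symbol: defining file} for every undefined reference."""
    # Only global definitions are visible to other files.
    exported = {}
    for fname, symbols in objects.items():
        for name, (addr, is_global) in symbols.items():
            if addr is not None and is_global:
                exported[name] = fname

    resolved = {}
    for fname, symbols in objects.items():
        for name, (addr, _is_global) in symbols.items():
            if addr is None:
                if name not in exported:
                    raise LookupError(f"undefined reference to '{name}' in {fname}")
                resolved[name] = exported[name]
    return resolved


if __name__ == "__main__":
    print(resolve(OBJECTS))
```

In this model a local symbol such as loop could never satisfy a reference from main.o, which is the reason sum has to be declared with .global.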

Relocation

Relocation is the process of changing the address already assigned to a label and patching all references to it so that they reflect the newly assigned address. Relocation is performed for two main reasons:

Merge sections
Placing sections in executable file

To understand relocation, it is important to understand what sections are.
At run time, code and data can be treated differently: code can be placed in ROM (read-only memory), while data may need to be both read and written. It is most convenient when code and data are not interleaved, which is why programs are divided into sections. Most programs have at least two sections: .text for code and .data for data. The assembler directives .text and .data switch between the two sections.
When the assembler encounters a section directive, it places the code or data that follows into the corresponding section. Thus, code and data that belong to the same section end up in adjacent locations. The process is illustrated in the following figure.

Merge sections

In multi-file programs, sections with the same name (for example, .text) may appear in different files. The linker is responsible for merging sections from the input files into sections of the output file. By default, sections with the same name from each file are placed one after another, and references to labels are adjusted to the new addresses.
The result of merging sections can be observed by comparing the symbol tables of the object files with that of the executable file. Below, the merge is shown for the array-sum program:

$ arm-none-linux-gnueabi-ld -Ttext=0x0 -o sum.elf main.o sum-sub.o
$ arm-none-linux-gnueabi-nm sum.elf
00000004 t arr
...
00000007 t eoa
00000024 t loop
00000008 t start
U _start
00000014 t stop
00000020 T sum

The loop symbol has the address 0x4 in the sum-sub.o file and 0x24 in sum.elf, since the .text section of the sum-sub.o file has moved and is located immediately after the .text section of the main.o file.
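The address arithmetic behind this merge can be reproduced with a short Python sketch. It is a simplified model, not the linker's actual algorithm. The function name merge_text and the tuple layout are made up for illustration, and the 0x20 size of the .text section of main.o is an assumption inferred from the address of sum in sum.elf.

```python
# Illustrative model of merging same-named sections; not how ld is implemented.
# Each entry: (file name, .text size in bytes, {symbol: offset in that file's .text}).
# Assumption: main.o's .text is 0x20 bytes (inferred from `sum` landing at 0x20).
# sum-sub.o's .text is six 4-byte ARM instructions, i.e. 0x18 bytes.
INPUTS = [
    ("main.o", 0x20, {"arr": 0x04, "eoa": 0x07, "start": 0x08, "stop": 0x14}),
    ("sum-sub.o", 0x18, {"sum": 0x00, "loop": 0x04}),
]


def merge_text(inputs):
    """Concatenate the sections in order; return (merged symbols, total size)."""
    merged = {}
    offset = 0  # where the current file's section starts in the output
    for _file, size, symbols in inputs:
        for name, addr in symbols.items():
            merged[name] = offset + addr
        offset += size
    return merged, offset


if __name__ == "__main__":
    table, total = merge_text(INPUTS)
    for name, addr in sorted(table.items(), key=lambda item: item[1]):
        print(f"{addr:08x} {name}")
```

In this model the symbols of the first file keep their addresses, while every symbol of sum-sub.o is shifted by the size of the section placed before it.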

Placing sections in executable file

When a program is assembled, each section is assumed to start at address 0, and labels are assigned values relative to the start of the section. When the executable file is created, a section is placed at some address X, and all references to labels defined in that section are increased by X.
Each section is placed in a specific memory area, and the linker corrects all references to labels in the section.
The result of placing sections can be observed from the symbol tables of the object and executable files. For better understanding, let's place the .text section at 0x100. As a result, every address in the .text section will be 0x100 higher in the executable file. The process of section merging and section placement is shown in the diagram.
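As a rough illustration of the "add X to every label" rule, here is a minimal Python sketch. The function name place is hypothetical, the starting addresses are the ones nm reported for sum.elf linked at 0x0, and the shifted values are what this model predicts rather than captured tool output.

```python
# Symbol addresses as listed by nm for sum.elf when .text starts at 0x0.
SYMBOLS_AT_ZERO = {
    "arr": 0x04,
    "eoa": 0x07,
    "start": 0x08,
    "stop": 0x14,
    "sum": 0x20,
    "loop": 0x24,
}


def place(symbols, base):
    """Model of section placement: shift every label by the section base address."""
    return {name: base + offset for name, offset in symbols.items()}


if __name__ == "__main__":
    # Assumed scenario: the same .text section placed at 0x100.
    for name, addr in sorted(place(SYMBOLS_AT_ZERO, 0x100).items(), key=lambda i: i[1]):
        print(f"{addr:08x} {name}")
```

The distances between labels do not change, only their absolute addresses, which is why PC-relative branches inside a section need no patching while absolute references (such as the literals loaded by ldr r0, =arr) do.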

Linker Script Files

As mentioned earlier, section merging and placement are performed by the linker. How sections are merged and where in memory they are placed can be controlled through a linker script file. Below is an example of a very simple script, with the key points marked by numbered callouts.
SECTIONS { ❶
. = 0x00000000; ❷
.text : { ❸
abc.o (.text);
def.o (.text);
} ❹
}
❶ SECTIONS is the most important linker command; it determines how sections are merged and where they are placed.
❷ Within the block following the SECTIONS command, the dot (.) represents the location counter. By default the location counter is initialized to 0x0; assigning it a different value changes where the following sections are placed. In this case, setting it to zero is redundant.
❸-❹ This part of the script specifies that the .text sections from the input files abc.o and def.o should go into the .text section of the output file.
Linker scripts can be simplified and generalized by using the wildcard "*" instead of listing file names:
SECTIONS {
. = 0x00000000;
.text : { * (.text); }
}
If the program contains both .text and .data sections, the merging and placement of the .data section can be done as shown below:
SECTIONS {
. = 0x00000000;
.text : { * (.text); }

. = 0x00000400;
.data : { * (.data); }
}
Here the .text section is located at 0x0, and the .data section is located at 0x400. If no values are assigned to the location counter, then the sections will be placed in the adjacent memory areas.
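The behaviour of the location counter can be imitated with a small Python model. This is a sketch under simplifying assumptions: the script is represented as a hypothetical list of operations, the section sizes are made-up values for illustration, and alignment (which a real linker may apply between sections) is ignored.

```python
def layout(script, sizes):
    """Walk a simplified linker script and return {section: start address}.

    script: list of ("set", address) or ("section", name) steps (hypothetical format)
    sizes:  {section name: size in bytes}
    """
    dot = 0  # the location counter starts at 0x0 by default
    placed = {}
    for op, arg in script:
        if op == "set":          # models ". = <address>;"
            dot = arg
        elif op == "section":    # models "<name> : { * (<name>); }"
            placed[arg] = dot
            dot += sizes[arg]    # the counter advances past the section
        else:
            raise ValueError(f"unknown step: {op}")
    return placed


# Assumed sizes, chosen only for this example.
SIZES = {".text": 0x24, ".data": 0x3}

WITH_ASSIGNMENT = [("set", 0x0), ("section", ".text"), ("set", 0x400), ("section", ".data")]
WITHOUT_ASSIGNMENT = [("set", 0x0), ("section", ".text"), ("section", ".data")]

if __name__ == "__main__":
    print(layout(WITH_ASSIGNMENT, SIZES))
    print(layout(WITHOUT_ASSIGNMENT, SIZES))
```

With the second assignment removed, .data in this model starts exactly where .text ends, which corresponds to the "adjacent memory areas" case mentioned above.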

Example Linker Script

To demonstrate the use of linker scripts, let's apply the last script above to control the placement of a program's .text and .data sections. For this, we use a modified version of the array-sum program.
.data
arr: .byte 10, 20, 25
eoa:

.text
start:
ldr r0, =eoa @ r0 = &eoa
ldr r1, =arr @ r1 = &arr
mov r3, #0 @ r3 = 0
loop:
ldrb r2, [r1], #1 @ r2 = *r1++
add r3, r2, r3 @ r3 += r2
cmp r1, r0 @ if (r1 != r0)
bne loop @ goto loop
stop: b stop

As you can see, the array is now located in the .data section. The instruction that branched over the data is no longer needed, since the linker script places the sections correctly.
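For readers who find the register-level loop hard to follow, here is a rough Python equivalent. It is only a model: memory is a hypothetical dictionary of byte addresses, the function name sum_array is made up, and the address 0x400 assumes the .data placement used by the linker script in this article.

```python
# Hypothetical memory image: the three array bytes placed at 0x400 (.data).
MEMORY = {0x400: 10, 0x401: 20, 0x402: 25}
ARR = 0x400  # address of the first element
EOA = 0x403  # address just past the last element


def sum_array(arr, eoa, memory):
    """Mirror of the assembly loop, one statement per instruction."""
    r0 = eoa                        # ldr r0, =eoa
    r1 = arr                        # ldr r1, =arr
    r3 = 0                          # mov r3, #0
    while True:
        r2 = memory[r1]             # ldrb r2, [r1], #1  (load the byte ...)
        r1 += 1                     #                    (... then post-increment r1)
        r3 = (r3 + r2) & 0xFFFFFFFF # add r3, r2, r3     (32-bit register)
        if r1 == r0:                # cmp r1, r0 / bne loop
            break
    return r3


if __name__ == "__main__":
    print(sum_array(ARR, EOA, MEMORY))  # prints 55
```

Note that the loop body runs before the end-of-array check, as in the assembly, so this form assumes the array has at least one element.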
When the program is built, the script is passed as input to the linker:

$ arm-none-linux-gnueabi-as -o sum-data.o sum-data.s
$ arm-none-linux-gnueabi-ld -T sum-data.lds -o sum-data.elf sum-data.o

The "-T sum-data.lds" parameter specifies the file sum-data.lds as a linker script. The placement of sections in the memory, as usual, can be traced by the symbol table:

$ arm-none-linux-gnueabi-nm -n sum-data.elf

00000000 t start
0000000c t loop
0000001c t stop
00000400 d arr
00000403 d eoa

As you can see, the .text section is located at address 0x0, and the .data section is located at 0x400.

Since this is my first post, I don't want to overload it and make it huge, so I will stop here. If there is interest and there are requests, I will continue with a new article covering topics such as:

more detailed consideration of assembler directives (of course, useful)
working with RAM
interrupt handling
running code written in a higher level language on an ARM processor

Source: https://habr.com/ru/post/188712/
