
Create an ELF file with debug information (DWARF) manually (for ARM microcontrollers)

Introduction


Recently, I became interested in microcontrollers: first AVR, then ARM. There are two main options for programming microcontrollers: assembler and C. However, I am a fan of the Forth programming language and started porting it to these microcontrollers. There are ready-made solutions, of course, but none of them offered what I wanted: debugging with gdb. So I set out to fill this gap (so far only for ARM). I had an stm32vldiscovery board with a 32-bit ARM Cortex-M3 processor, 128 kB of flash and 8 kB of RAM, so I started with it.
I wrote the Forth cross-translator in Forth, of course, so I will not show its code in this article, since the language is considered exotic. I will confine myself to fairly detailed recommendations. There is almost no documentation or examples on the subject online; some parameters I found by trial and error, others by analyzing the output files of the gcc compiler. In addition, I used only the necessary minimum of debugging information, without touching, for example, relocations and many other things. The topic is very extensive and, I confess, I have covered only about 30 percent of it, which turned out to be sufficient for me.

Anyone interested in this project can download the code here.

ELF Overview


Standard development tools compile your program into an ELF (Executable and Linkable Format) file, optionally including debug information. The format specification can be read here. In addition, each architecture has its own specifics, for example ARM. Let us briefly look at this format.
The ELF executable file consists of the following parts:
1. ELF Header

Contains general information about the file and its main characteristics.
2. Program Header Table

This table maps the sections of the file to memory segments and tells the loader which area of memory each section should be written to.
3. Sections

Sections contain all the information in the file (program, data, debug information, etc.)
Each section has a type, a name, and other parameters. The ".text" section usually stores code, ".symtab" is the program symbol table (names of files, procedures, and variables), ".strtab" is the string table, sections with the ".debug_" prefix hold debug information, etc. In addition, the file must have an empty section with index 0.

4. Section Header Table

This is a table containing an array of section headers.
The format is discussed in more detail in the “ELF creation” section.

DWARF Overview


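DWARF is a standardized debugging information format. The standard can be downloaded from the official site. There is also an excellent short overview of the format: Introduction to the DWARF Debugging Format (Michael J. Eager).
Why do you need debug information? It lets a debugger map machine instructions back to lines of source code, find variables and their types, and reconstruct the chain of function calls.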

This information is stored in a tree structure. Each node in the tree has a parent, may have children, and is called a DIE (Debugging Information Entry). Each node has its own tag (type) and a list of attributes (properties) describing the node. Attributes can contain anything, such as data or links to other nodes. In addition, there is information stored outside the tree.
Nodes are divided into two main types: nodes describing data, and nodes describing code.
Nodes describing the data:

  1. Data types:
    • Basic data types (node type DW_TAG_base_type), such as the int type in C.
    • Composite data types (pointers, etc.)
    • Arrays
    • Structures, classes, unions, interfaces

  2. Data objects:
    • constants
    • function parameters
    • variables
    • etc.


Each data object has a DW_AT_location attribute that indicates how the address where the data is located is calculated. For example, a variable may have a fixed address, be in a register or on a stack, be a member of a class or an object. This address can be calculated in a rather complicated way, therefore the standard provides for so-called Location Expressions, which may contain a sequence of statements of a special internal stack machine.
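To make the idea of a stack machine concrete, here is a minimal sketch of an evaluator for a tiny subset of location expressions. The opcode values are the ones defined by the DWARF standard; the function name eval_location, the fixed stack depth, and the 32-bit little-endian address size are assumptions made for this illustration, not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opcode values as defined by the DWARF standard. */
enum { DW_OP_addr = 0x03, DW_OP_plus = 0x22, DW_OP_lit0 = 0x30, DW_OP_lit31 = 0x4f };

/* Evaluates a tiny subset of location expressions for a 32-bit
   little-endian target and returns the value left on top of the stack. */
uint32_t eval_location(const uint8_t *expr, size_t len) {
    uint32_t stack[16];
    size_t sp = 0, i = 0;
    while (i < len) {
        uint8_t op = expr[i++];
        if (op == DW_OP_addr) {                 /* push a 4-byte address */
            assert(i + 4 <= len && sp < 16);
            stack[sp++] = (uint32_t)expr[i]
                        | (uint32_t)expr[i + 1] << 8
                        | (uint32_t)expr[i + 2] << 16
                        | (uint32_t)expr[i + 3] << 24;
            i += 4;
        } else if (op >= DW_OP_lit0 && op <= DW_OP_lit31) {
            assert(sp < 16);                    /* push the literal 0..31 */
            stack[sp++] = (uint32_t)(op - DW_OP_lit0);
        } else if (op == DW_OP_plus) {          /* pop two, push their sum */
            assert(sp >= 2);
            stack[sp - 2] += stack[sp - 1];
            sp--;
        } else {
            assert(0 && "opcode outside this sketch's subset");
        }
    }
    assert(sp >= 1);
    return stack[sp - 1];
}
```

For instance, the bytes 03 00 00 00 20 38 22 push the address 0x20000000, push the literal 8, and add them, which is how a field at offset 8 inside an object at a fixed address could be described.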

Nodes describing the code:

  1. Procedures (functions) are nodes with the DW_TAG_subprogram tag. Descendant nodes can contain descriptions of variables - function parameters and function local variables.
  2. Compilation Unit. Contains information about the program and is the parent of all other nodes.

The information described above is in the ".debug_info" and ".debug_abbrev" sections.

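Other information, stored outside the tree, includes line number information (the ".debug_line" section) and call frame information (the ".debug_frame" section).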



ELF creation


We will create files in the ELF format using the libelf library from the elfutils package. There is a good article on using libelf online, LibELF by Example (unfortunately, it covers file creation only briefly), as well as documentation.
Creating a file consists of several steps:
  1. Libelf initialization
  2. Creating a file header (ELF Header)
  3. Creating a Program Header (Program Header Table)
  4. Creating sections
  5. Write file

Consider the stages in more detail.

Libelf initialization

First, call elf_version(EV_CURRENT) and check the result. If it equals EV_NONE, an error has occurred and no further actions can be performed. Then create the output file on disk, obtain its file descriptor, and pass it to the elf_begin function:
 Elf * elf_begin( int fd, Elf_Cmd cmd, Elf *elf);


The function returns a pointer to the ELF descriptor that is used in all other libelf functions, or 0 on error.

Creating a file header

A new file header is created by the elf32_newehdr function:
 Elf32_Ehdr * elf32_newehdr( Elf *elf); 


Returns 0 on error or a pointer to the structure - the header of the ELF file:
 #define EI_NIDENT 16
 typedef struct {
     unsigned char e_ident[EI_NIDENT];
     Elf32_Half    e_type;
     Elf32_Half    e_machine;
     Elf32_Word    e_version;
     Elf32_Addr    e_entry;
     Elf32_Off     e_phoff;
     Elf32_Off     e_shoff;
     Elf32_Word    e_flags;
     Elf32_Half    e_ehsize;
     Elf32_Half    e_phentsize;
     Elf32_Half    e_phnum;
     Elf32_Half    e_shentsize;
     Elf32_Half    e_shnum;
     Elf32_Half    e_shstrndx;
 } Elf32_Ehdr;




Some of its fields are filled in with standard values, and some we must fill in ourselves.
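As a rough illustration, the sketch below fills in the fields that typically have to be set by hand for a little-endian ARM executable. Which fields to set is my assumption based on the ELF specification, not a list taken from the original project; the function name fill_arm_ehdr is hypothetical, and the types and constants come from a Linux-style <elf.h>. In a real program the pointer would be the one returned by elf32_newehdr.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Fills the header fields describing the target. `entry` is the
   entry-point address chosen by the caller. */
void fill_arm_ehdr(Elf32_Ehdr *ehdr, Elf32_Addr entry) {
    assert(ehdr != NULL);
    ehdr->e_ident[EI_DATA] = ELFDATA2LSB;  /* little-endian data */
    ehdr->e_type = ET_EXEC;                /* executable file */
    ehdr->e_machine = EM_ARM;              /* target architecture */
    ehdr->e_entry = entry;                 /* program entry point */
}
```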


Creating a program header

As already mentioned, the Program Header Table maps the sections of the file to memory segments and tells the loader where to write each section. The header is created using the elf32_newphdr function:
 Elf32_Phdr * elf32_newphdr( Elf *elf, size_t count); 


Returns 0 on error or a pointer to the program header.
Each element in the header table is described by the following structure:
 typedef struct {
     Elf32_Word p_type;
     Elf32_Off  p_offset;
     Elf32_Addr p_vaddr;
     Elf32_Addr p_paddr;
     Elf32_Word p_filesz;
     Elf32_Word p_memsz;
     Elf32_Word p_flags;
     Elf32_Word p_align;
 } Elf32_Phdr;
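A minimal sketch of one such element, describing a single loadable code segment, might look as follows. The function name fill_load_phdr, the read and execute flags, and the alignment of 4 are assumptions chosen for illustration; the types and constants are from a Linux-style <elf.h>.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Describes one loadable segment: `size` bytes found at file offset
   `offset` are to be placed at address `addr`. */
void fill_load_phdr(Elf32_Phdr *ph, Elf32_Off offset, Elf32_Addr addr, Elf32_Word size) {
    assert(ph != NULL);
    ph->p_type = PT_LOAD;        /* segment is loaded into memory */
    ph->p_offset = offset;       /* where its bytes start in the file */
    ph->p_vaddr = addr;          /* virtual address */
    ph->p_paddr = addr;          /* physical address, the same here */
    ph->p_filesz = size;         /* bytes taken from the file */
    ph->p_memsz = size;          /* bytes occupied in memory */
    ph->p_flags = PF_R | PF_X;   /* readable and executable (assumed) */
    ph->p_align = 4;             /* assumed alignment */
}
```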


Creating sections

After creating the headers, you can start creating sections. An empty section is created using the elf_newscn function:
 Elf_Scn * elf_newscn( Elf *elf); 


The function returns a pointer to the section or 0 on error.
After creating the section, you need to fill in the section header and create a section data descriptor.
We can get a pointer to the section header using the elf32_getshdr function:
 Elf32_Shdr * elf32_getshdr( Elf_Scn *scn); 


The section header looks like this:
 typedef struct {
     Elf32_Word sh_name;
     Elf32_Word sh_type;
     Elf32_Word sh_flags;
     Elf32_Addr sh_addr;
     Elf32_Off  sh_offset;
     Elf32_Word sh_size;
     Elf32_Word sh_link;
     Elf32_Word sh_info;
     Elf32_Word sh_addralign;
     Elf32_Word sh_entsize;
 } Elf32_Shdr;


After filling in the header, you need to create a section data descriptor with the elf_newdata function:
 Elf_Data * elf_newdata( Elf_Scn *scn); 


The function returns 0 on error, or a pointer to the Elf_Data structure, which will need to be filled:
 typedef struct {
     void*    d_buf;
     Elf_Type d_type;
     size_t   d_size;
     off_t    d_off;
     size_t   d_align;
     unsigned d_version;
 } Elf_Data;



Special sections

For our purposes, we will need to create the minimum required set of sections: .text, .symtab, .strtab, and .shstrtab.

All sections are created as described in the previous section, but each special section has its own characteristics.


Section .text

This section contains the executable code, so set sh_type to SHT_PROGBITS, sh_flags to SHF_EXECINSTR + SHF_ALLOC, and sh_addr to the address where the code will be loaded.
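In C, these settings could be sketched as below. The function name fill_text_shdr and the alignment value are assumptions; in a real program the header pointer would come from elf32_getshdr, and name_index would be the offset of the string ".text" in .shstrtab.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Fills the header of a .text section that will be loaded at `load_addr`. */
void fill_text_shdr(Elf32_Shdr *sh, Elf32_Word name_index, Elf32_Addr load_addr) {
    assert(sh != NULL);
    sh->sh_name = name_index;
    sh->sh_type = SHT_PROGBITS;                /* program-defined contents */
    sh->sh_flags = SHF_ALLOC | SHF_EXECINSTR;  /* occupies memory, executable */
    sh->sh_addr = load_addr;
    sh->sh_addralign = 4;                      /* assumed alignment */
}
```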
Section .symtab

The section contains the description of all symbols (functions) of the program and the files in which they were described. It consists of such elements with a length of 16 bytes:
 typedef struct {
     Elf32_Word    st_name;
     Elf32_Addr    st_value;
     Elf32_Word    st_size;
     unsigned char st_info;
     unsigned char st_other;
     Elf32_Half    st_shndx;
 } Elf32_Sym;


The data for the section can be collected when traversing the source text into an array, a pointer to which is then written to the section data descriptor (d_buf).
This section is created in the usual way, only sh_type needs to be set to SHT_SYMTAB, and the index of the .strtab section is written to the sh_link field, so these sections will become linked.
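As an illustration, one element of this array for a procedure could be built like this. The helper name make_func_sym and the choice of a global binding are assumptions; ELF32_ST_INFO, STB_GLOBAL, and STT_FUNC are the standard definitions from a Linux-style <elf.h>.

```c
#include <assert.h>
#include <elf.h>

/* Builds the symbol for one procedure. `name_index` is the offset of the
   name in .strtab and `text_index` is the section index of .text. */
Elf32_Sym make_func_sym(Elf32_Word name_index, Elf32_Addr addr,
                        Elf32_Word size, Elf32_Half text_index) {
    Elf32_Sym sym;
    assert(sizeof(Elf32_Sym) == 16);   /* element size stated in the text */
    sym.st_name = name_index;
    sym.st_value = addr;               /* address of the procedure */
    sym.st_size = size;                /* length of its code in bytes */
    sym.st_info = ELF32_ST_INFO(STB_GLOBAL, STT_FUNC);
    sym.st_other = STV_DEFAULT;
    sym.st_shndx = text_index;         /* section that contains the code */
    return sym;
}
```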

Section .strtab

This section holds the names of all symbols from the .symtab section. It is created like a regular section, but sh_type must be set to SHT_STRTAB and sh_flags to SHF_STRINGS, which makes the section a string table.
The data for the section can be collected when traversing the source text into an array, a pointer to which is then written to the section data descriptor (d_buf).

Section .shstrtab

This section is a string table that holds the names of all sections of the file, including its own. It is created in the same way as the .strtab section. After creating it, write its index to the e_shstrndx field of the file header.


String tables

A string table contains consecutive strings, each terminated by a zero byte; the first byte of the table must also be 0. The index of a string is simply its offset in bytes from the beginning of the table, so the first string, 'name', has index 1 and the next string, 'var', has index 6.
  Index  0   1  2  3  4  5   6  7  8  9
  Byte   \0  n  a  m  e  \0  v  a  r  \0
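The indexing rule can be sketched in a few lines of C. The StrTab type, its fixed 256-byte capacity, and the function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size string table; the first byte is always 0. */
typedef struct {
    char   data[256];
    size_t used;
} StrTab;

void strtab_init(StrTab *t) {
    memset(t->data, 0, sizeof t->data);
    t->used = 1;                  /* index 0 is the mandatory empty string */
}

/* Appends `name` with its terminating zero and returns the string's index. */
size_t strtab_add(StrTab *t, const char *name) {
    size_t len = strlen(name) + 1;
    size_t index = t->used;
    assert(t->used + len <= sizeof t->data);
    memcpy(t->data + index, name, len);
    t->used += len;
    return index;
}
```

The returned index is the value to store in st_name (for .strtab) or sh_name (for .shstrtab), and the buffer with its used length is what goes into d_buf and d_size of the section data descriptor.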


Write file

So, the headers and sections are formed; now they must be written to the file, and work with libelf must be completed. The file is written by the elf_update function:
 off_t elf_update( Elf *elf, Elf_Cmd cmd); 


The function returns -1 on error. The error text can be obtained by calling elf_errmsg(-1), which returns a pointer to the error string.
We finish working with the library by calling elf_end and passing it our descriptor. All that remains is to close the previously opened file.
However, our generated file does not contain debugging information, which we will add in the next section.

Create DWARF


We will create debug information using the libdwarf library, which comes with a PDF document describing it (libdwarf2p.1.pdf, A Producer Library Interface to DWARF).
Creating debug information consists of the following steps:
  1. Initializing the libdwarf producer
  2. Creating nodes (DIE - Debugging Information Entry)
  3. Creating node attributes
  4. Creating a Compilation Unit
  5. Creating Common Information Entry
  6. Creating data types
  7. Creating procedures (functions)
  8. Creating variables and constants
  9. Creating sections with debug information
  10. Finishing work with the library

Consider the stages in more detail.

Initializing the libdwarf producer

We will create debugging information during compilation, at the same time as the symbols in the .symtab section, so the library must be initialized after libelf is initialized and the ELF header and program header are created, but before sections are created.
For initialization, we will use the dwarf_producer_init_c function. The library has several other initialization functions (dwarf_producer_init, dwarf_producer_init_b), which differ in details described in the documentation. In principle, you can use any of them.

 Dwarf_P_Debug dwarf_producer_init_c( Dwarf_Unsigned flags, Dwarf_Callback_Func_c func, Dwarf_Handler errhand, Dwarf_Ptr errarg, void * user_data, Dwarf_Error *error) 


The function returns a Dwarf_P_Debug descriptor used in all subsequent functions, or -1 on error; in that case error will contain an error code (the text of the message can be obtained by passing this code to the dwarf_errmsg function).


Creating Nodes (DIE - Debugging Information Entry)

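As described above, debugging information forms a tree structure. To create a node of this tree, you specify its tag and its position in the tree (parent, child, and sibling nodes), and then add its attributes.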

The node is created using the dwarf_new_die function:
 Dwarf_P_Die dwarf_new_die( Dwarf_P_Debug dbg, Dwarf_Tag new_tag, Dwarf_P_Die parent, Dwarf_P_Die child, Dwarf_P_Die left_sibling, Dwarf_P_Die right_sibling, Dwarf_Error *error) 


The function returns DW_DLV_BADADDR on error or the Dwarf_P_Die handle of the node on success.

Creating node attributes

There is a whole family of dwarf_add_AT_xxxx functions for creating node attributes. It is sometimes hard to tell which function creates the attribute you need, so I even dug into the library source code several times. Some of the functions are described here, others below in the relevant sections. All of them take the ownerdie parameter, the handle of the node to which the attribute is added, and return an error code in the error parameter.
The dwarf_add_AT_name function adds a “name” attribute (DW_AT_name) to the node. Most nodes should have a name (for example, procedures, variables, constants); some may have none (for example, the Compilation Unit).
 Dwarf_P_Attribute dwarf_add_AT_name( Dwarf_P_Die ownerdie, char *name, Dwarf_Error *error) 


Returns DW_DLV_BADADDR on error or attribute handle on success.
The dwarf_add_AT_signed_const and dwarf_add_AT_unsigned_const functions add the specified attribute with a signed (or unsigned) value to the node. Signed and unsigned attributes are used for constant values, sizes, line numbers, etc. The functions have the following form:
 Dwarf_P_Attribute dwarf_add_AT_(un)signed_const( Dwarf_P_Debug dbg, Dwarf_P_Die ownerdie, Dwarf_Half attr, Dwarf_Signed value, Dwarf_Error *error) 


Both return DW_DLV_BADADDR on error or the attribute handle on success.

Creating a Compilation Unit

Every tree must have a root; in our case it is the compilation unit, which contains information about the program (for example, the name of the main file, the programming language used, the name of the compiler, the case sensitivity of symbols (variables, functions), the main function of the program, the starting address, etc.). In principle, no attributes are required. As an example, let us add information about the main file and the compiler.

Main file information

The name of the main file is stored in the “name” attribute (DW_AT_name); use the dwarf_add_AT_name function, as shown in the “Creating node attributes” section.

Compiler info

Use the dwarf_add_AT_producer function:
 Dwarf_P_Attribute dwarf_add_AT_producer( Dwarf_P_Die ownerdie, char *producer_string, Dwarf_Error *error) 


Returns DW_DLV_BADADDR on error or attribute handle on success.

Creating Common Information Entry

Usually, when a function (subroutine) is called, its parameters and the return address are pushed onto the stack (although each compiler can do this in its own way); together these are called the call frame. The debugger needs to know the frame format in order to determine the return address of a function and to build a backtrace: the chain of function calls that led to the current function, along with the parameters of those functions. The frame also usually holds the processor registers that were saved on the stack. The code that reserves stack space and saves processor registers is called the function prologue; the code that restores the registers and the stack is called the epilogue.
This information is highly dependent on the compiler. For example, the prologue and epilogue need not be at the very beginning and end of a function; sometimes a frame is used and sometimes not; processor registers can be saved in other registers, and so on.
So the debugger needs to know how the processor registers change their values and where they are saved on entry to a procedure. This is called Call Frame Information. For each code address in the program, it gives the frame's address in memory (the Canonical Frame Address, CFA) and information about the processor registers; for example, that a register has not changed, is saved at some offset from the CFA, or is held in another register.

Since this information must be given for every address in the code, it is very voluminous and is stored in compressed form in the .debug_frame section. Because it changes little from address to address, only the changes are encoded, as DW_CFA_xxxx instructions. Each instruction describes one change; for example, DW_CFA_advance_loc moves to a later code address, DW_CFA_def_cfa_offset sets a new CFA offset, and DW_CFA_offset records that a register is saved at an offset from the CFA.

The .debug_frame section contains two types of records: Common Information Entry (CIE) and Frame Description Entry (FDE). A CIE holds information common to many FDEs; roughly speaking, it describes a certain type of procedure. An FDE describes one specific procedure. On entering a procedure, the debugger first executes the instructions from the CIE and then those from the FDE.
My compiler creates procedures in which the CFA is in the sp (r13) register. We create one CIE for all procedures, using the dwarf_add_frame_cie function:
 Dwarf_Unsigned dwarf_add_frame_cie( Dwarf_P_Debug dbg, char *augmenter, Dwarf_Small code_align, Dwarf_Small data_align, Dwarf_Small ret_addr_reg, Dwarf_Ptr init_bytes, Dwarf_Unsigned init_bytes_len, Dwarf_Error *error); 


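The function returns DW_DLV_NOCOUNT on error, or a CIE index that is later used when creating the FDE for each procedure (see the “Creating an FDE procedure” section).

Creating data types

Procedures and variables refer to data types, so the nodes describing those types must be created first. All data types are ultimately built from base types (elementary types such as int, double, etc.).
A base type is a node with the DW_TAG_base_type tag. It has a name (DW_AT_name), an encoding (DW_AT_encoding), and a size in bytes (DW_AT_byte_size).
For example, to describe a 32-bit signed integer type “int”, create a node with the DW_TAG_base_type tag and set its attributes: DW_AT_name to “int”, DW_AT_encoding to DW_ATE_signed, and DW_AT_byte_size to 4.
Derived types are built on top of base types. Such a node must contain the DW_AT_type attribute, a reference to the type it is derived from. For example, a pointer to int is a node with the DW_TAG_pointer_type tag whose DW_AT_type attribute refers to the previously created “int” type.
An attribute that refers to another node is created with the dwarf_add_AT_reference function: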
 Dwarf_P_Attribute dwarf_add_AT_reference( Dwarf_P_Debug dbg, Dwarf_P_Die ownerdie, Dwarf_Half attr, Dwarf_P_Die otherdie, Dwarf_Error *error) 
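To show how a base type and a type derived from it relate to each other, here is a library-independent sketch that models the two nodes as plain C structures. The TypeDie structure and both helper functions are hypothetical; only the numeric tag and encoding codes are taken from the DWARF standard. With libdwarf the same shape would be built from dwarf_new_die, the dwarf_add_AT_xxxx functions, and dwarf_add_AT_reference.

```c
#include <assert.h>
#include <stddef.h>

/* Tag and encoding codes from the DWARF standard. */
enum { DW_TAG_pointer_type = 0x0f, DW_TAG_base_type = 0x24 };
enum { DW_ATE_signed = 0x05 };

/* Hypothetical in-memory node holding only the attributes used here. */
typedef struct TypeDie {
    int tag;
    const char *name;             /* DW_AT_name, NULL if absent */
    int encoding;                 /* DW_AT_encoding */
    unsigned byte_size;           /* DW_AT_byte_size */
    const struct TypeDie *type;   /* DW_AT_type reference, NULL if absent */
} TypeDie;

TypeDie make_base_type(const char *name, int encoding, unsigned byte_size) {
    TypeDie die = { DW_TAG_base_type, name, encoding, byte_size, NULL };
    return die;
}

/* A pointer type has no name of its own; it refers to its target type. */
TypeDie make_pointer_to(const TypeDie *target, unsigned byte_size) {
    TypeDie die = { DW_TAG_pointer_type, NULL, 0, byte_size, NULL };
    assert(target != NULL);
    die.type = target;
    return die;
}
```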




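Creating procedures

To create procedures, I need to explain one more kind of debugging information: Line Number Information. It maps each machine instruction to a specific line of source code and makes it possible to step through the program line by line. This information is stored in the .debug_line section. If space were no concern, it would be stored as a matrix with one row per instruction and columns such as the source file, the line number, the column number, and flags marking whether the instruction begins a statement or a basic block.

Such a matrix would be very large, so it is compressed. First, duplicate rows are removed; second, only the changes from row to row are stored rather than the rows themselves. These changes take the form of commands for a finite state machine, and the information as a whole is a program “executed” by that machine. Its commands include, for example, DW_LNS_advance_pc (advance the program counter), DW_LNS_set_file (set the current source file), and DW_LNS_const_add_pc (add a constant to the program counter).
Generating this information at such a low level is difficult, so libdwarf provides several functions that make the task easier.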
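The first compression step, dropping rows that repeat the previous source line, can be sketched as follows. The LineRow structure is a simplified, hypothetical row with only two of the matrix columns, and drop_repeated_lines is not a libdwarf function; the library performs the real encoding itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One row of the uncompressed matrix: instruction address and source line. */
typedef struct {
    uint32_t address;
    uint32_t line;
} LineRow;

/* Copies rows to `out`, skipping every row whose source line equals that
   of the previously kept row. Returns the number of rows kept. */
size_t drop_repeated_lines(const LineRow *rows, size_t n, LineRow *out) {
    size_t kept = 0;
    assert(rows != NULL && out != NULL);
    for (size_t i = 0; i < n; i++) {
        if (kept == 0 || out[kept - 1].line != rows[i].line) {
            out[kept++] = rows[i];
        }
    }
    return kept;
}
```

Each kept row then stands for all instructions from its address up to the address of the next kept row.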
It is expensive to store the file name for each instruction, so instead of the name its index is stored in a special table. To create a file index, use the dwarf_add_file_decl function:
 Dwarf_Unsigned dwarf_add_file_decl( Dwarf_P_Debug dbg, char *name, Dwarf_Unsigned dir_idx, Dwarf_Unsigned time_mod, Dwarf_Unsigned length, Dwarf_Error *error) 


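The function returns the file index, or DW_DLV_NOCOUNT on error.
Line number information itself is created with three functions, dwarf_add_line_entry_b, dwarf_lne_set_address, and dwarf_lne_end_sequence, which are described below.
Creating the debug information for a procedure takes several steps: creating a symbol for it in the .symtab section, creating the procedure node with its attributes, creating its FDE, creating its parameters and local variables, and creating its line number information.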



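A symbol for the procedure is created as described in the “Section .symtab” section.

The procedure node is created with the dwarf_new_die function (see “Creating Nodes”), using DW_TAG_subprogram as the tag and, as the parent, either the Compilation Unit (for a global procedure) or the enclosing DIE (for a local one).

The DW_AT_low_pc and DW_AT_high_pc attributes, the start and end addresses of the procedure, are created with the dwarf_add_AT_targ_address_b function: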
 Dwarf_P_Attribute dwarf_add_AT_targ_address_b( Dwarf_P_Debug dbg, Dwarf_P_Die ownerdie, Dwarf_Half attr, Dwarf_Unsigned pc_value, Dwarf_Unsigned sym_index, Dwarf_Error *error) 


The function will return DW_DLV_BADADDR on error.

Creating an FDE procedure

As mentioned above in the “Creating Common Information Entry” section, a frame descriptor must be created for each procedure. This takes several steps: creating a new FDE, attaching it to the list of FDEs, and adding DW_CFA_xxxx instructions to it.

You can create a new FDE with the dwarf_new_fde function:
 Dwarf_P_Fde dwarf_new_fde( Dwarf_P_Debug dbg, Dwarf_Error *error) 

The function will return a handle to a new FDE or DW_DLV_BADADDR on error.
You can attach a new FDE to the list using dwarf_add_frame_fde:
 Dwarf_Unsigned dwarf_add_frame_fde( Dwarf_P_Debug dbg, Dwarf_P_Fde fde, Dwarf_P_Die die, Dwarf_Unsigned cie, Dwarf_Addr virt_addr, Dwarf_Unsigned code_len, Dwarf_Unsigned sym_idx, Dwarf_Error* error) 


The function will return DW_DLV_NOCOUNT on error.
After all this, you can add DW_CFA_xxxx instructions to our FDE. This is done by the dwarf_add_fde_inst and dwarf_fde_cfa_offset functions. The first one adds the specified instruction to the list:
 Dwarf_P_Fde dwarf_add_fde_inst( Dwarf_P_Fde fde, Dwarf_Small op, Dwarf_Unsigned val1, Dwarf_Unsigned val2, Dwarf_Error *error) 


The dwarf_fde_cfa_offset function adds the DW_CFA_offset statement:
 Dwarf_P_Fde dwarf_fde_cfa_offset( Dwarf_P_Fde fde, Dwarf_Unsigned reg, Dwarf_Signed offset, Dwarf_Error *error) 


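For example, suppose the prologue of a procedure saves the lr (r14) register on the stack. First, add a DW_CFA_advance_loc instruction with its first parameter equal to 1, which advances pc by 2 bytes (see “Creating Common Information Entry”, code_align). Then add DW_CFA_def_cfa_offset with parameter 4 (the CFA is now at an offset of 4 bytes). Finally, call dwarf_fde_cfa_offset with reg=14 and offset=1, which records that r14 is saved at offset -4 from the CFA.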
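The arithmetic behind these three instructions can be checked with a small model. It assumes a code alignment factor of 2 and a data alignment factor of -4, the values implied by "1 advances pc by 2" and "offset=1 means -4 from the CFA"; the FrameState structure and the cfa_* functions are hypothetical stand-ins for what a debugger does when it interprets the FDE.

```c
#include <assert.h>
#include <stdint.h>

/* Unwind state tracked in this sketch. */
typedef struct {
    uint32_t pc_offset;           /* code offset from the procedure start */
    int32_t  cfa_offset;          /* CFA offset from sp */
    int32_t  lr_offset_from_cfa;  /* where r14 is saved, relative to CFA */
    int      lr_saved;
} FrameState;

enum { CODE_ALIGN = 2, DATA_ALIGN = -4 };  /* assumed CIE factors */

/* DW_CFA_advance_loc: the operand is scaled by the code alignment factor. */
void cfa_advance_loc(FrameState *s, uint32_t delta) {
    s->pc_offset += delta * CODE_ALIGN;
}

/* DW_CFA_def_cfa_offset: the operand is used as is. */
void cfa_def_cfa_offset(FrameState *s, int32_t offset) {
    s->cfa_offset = offset;
}

/* DW_CFA_offset: the operand is scaled by the data alignment factor. */
void cfa_offset(FrameState *s, unsigned reg, int32_t factored) {
    assert(reg == 14);            /* only lr is modelled here */
    s->lr_offset_from_cfa = factored * DATA_ALIGN;
    s->lr_saved = 1;
}
```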


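Parameters and local variables of a procedure are created in the same way as ordinary variables; see “Creating variables and constants”.


Creating line number information

Line number information for a procedure is created as follows: start a block of instructions with dwarf_lne_set_address, add an entry for each line of code or machine instruction with dwarf_add_line_entry_b, and end the block with dwarf_lne_end_sequence.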

The dwarf_lne_set_address function sets the address where the block of instructions begins:
 Dwarf_Unsigned dwarf_lne_set_address( Dwarf_P_Debug dbg, Dwarf_Addr offs, Dwarf_Unsigned symidx, Dwarf_Error *error) 


Returns 0 (success) or DW_DLV_NOCOUNT (error).
The dwarf_add_line_entry_b function adds source line information to the .debug_line section. I call this function for each machine instruction:
 Dwarf_Unsigned dwarf_add_line_entry_b( Dwarf_P_Debug dbg, Dwarf_Unsigned file_index, Dwarf_Addr code_offset, Dwarf_Unsigned lineno, Dwarf_Signed column_number, Dwarf_Bool is_source_stmt_begin, Dwarf_Bool is_basic_block_begin, Dwarf_Bool is_epilogue_begin, Dwarf_Bool is_prologue_end, Dwarf_Unsigned isa, Dwarf_Unsigned discriminator, Dwarf_Error *error) 


Returns 0 (success) or DW_DLV_NOCOUNT (error).
Finally, the dwarf_lne_end_sequence function ends the block of instructions:
 Dwarf_Unsigned dwarf_lne_end_sequence( Dwarf_P_Debug dbg, Dwarf_Addr address, Dwarf_Error *error) 


Returns 0 (success) or DW_DLV_NOCOUNT (error).
This completes the creation of the procedure.


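Creating variables and constants

Variables are fairly simple. Each has a name, a place in memory (or a processor register) where its data is stored, and a data type. A global variable has the Compilation Unit as its parent; a local variable is a child of the corresponding node (in particular, the parent of a procedure parameter must be the procedure itself).
In the simplest case a variable lives at a fixed address, but many variables are placed on the stack or in a register, and computing their address can be nontrivial. For this the standard provides location expressions (address expressions). A location expression is a sequence of instructions (constants with the DW_OP_ prefix) for a special stack machine.

To give a variable a location, create a location expression (dwarf_new_expr), add instructions to it (dwarf_add_expr_addr, dwarf_add_expr_gen, etc.), and attach it to the node as the DW_AT_location attribute (dwarf_add_AT_location_expr).
A new, empty location expression is created by dwarf_new_expr, which returns its handle, or 0 on error:
 Dwarf_P_Expr dwarf_new_expr( Dwarf_P_Debug dbg, Dwarf_Error *error) 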

Instructions are added to the expression with the dwarf_add_expr_gen function:
 Dwarf_Unsigned dwarf_add_expr_gen( Dwarf_P_Expr expr, Dwarf_Small opcode, Dwarf_Unsigned val1, Dwarf_Unsigned val2, Dwarf_Error *error) 


The function returns DW_DLV_NOCOUNT on error.
To explicitly set the address of a variable, the dwarf_add_expr_addr function should be used instead of the previous one:
 Dwarf_Unsigned dwarf_add_expr_addr( Dwarf_P_Expr expr, Dwarf_Unsigned address, Dwarf_Signed sym_index, Dwarf_Error *error) 


The function also returns DW_DLV_NOCOUNT on error.
Finally, add the created address expression to the node by using the dwarf_add_AT_location_expr function:
 Dwarf_P_Attribute dwarf_add_AT_location_expr( Dwarf_P_Debug dbg, Dwarf_P_Die ownerdie, Dwarf_Half attr, Dwarf_P_Expr loc_expr, Dwarf_Error *error) 


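The function returns DW_DLV_NOCOUNT on error.
Variables, procedure parameters, and constants are ordinary nodes with the DW_TAG_variable, DW_TAG_formal_parameter, and DW_TAG_const_type tags respectively. Each is given a name (DW_AT_name), a type (DW_AT_type), and a location (DW_AT_location).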




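Creating sections with debug information

Once all the nodes of the debug information tree have been created, the ELF sections that hold it can be generated. The function

 Dwarf_Signed dwarf_transform_to_disk_form( Dwarf_P_Debug dbg, Dwarf_Error* error) 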

translates the debug information we created into binary format, but does not write anything to disk. It will return us the number of elf sections created or DW_DLV_NOCOUNT on error. At the same time, for each section, a callback function will be called, which we passed during library initialization to the dwarf_producer_init_c function. This function must be written by us. Its specification is as follows:
 typedef int (*Dwarf_Callback_Func_c)( char* name, int size, Dwarf_Unsigned type, Dwarf_Unsigned flags, Dwarf_Unsigned link, Dwarf_Unsigned info, Dwarf_Unsigned* sect_name_index, void * user_data, int* error) 


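In this function, we must create a new ELF section (see “Creating sections”), fill in its header from the arguments, and return the index of the created section.

After dwarf_transform_to_disk_form has returned, go through the created sections, starting from 0, obtain the binary data of each one from the library, and place it in the data descriptor of the corresponding ELF section.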



Finishing work with the library

Work with libdwarf is finished by calling the dwarf_producer_finish function:
 Dwarf_Unsigned dwarf_producer_finish( Dwarf_P_Debug dbg, Dwarf_Error* error) 

The function returns DW_DLV_NOCOUNT on error.
I note that writing to disk at this stage is not performed. Recording needs to be done through the functions in the “Create ELF - Write File” section.


Conclusion


That's all.
I repeat, the creation of debug information is a very extensive topic, and I have left many parts of it untouched, only lifting the curtain. Those who wish can dig as deep as they like.
If you have questions - I will try to answer them.


Source: https://habr.com/ru/post/199490/

