Create an ELF file with debug information (DWARF) manually (for ARM microcontrollers)

Introduction

Recently, I became interested in microcontrollers: first AVR, then ARM. There are two main options for programming microcontrollers: assembler and C. However, I am a fan of the Forth programming language, so I started porting it to these microcontrollers. Of course, there are ready-made solutions, but none of them offered what I wanted: debugging with gdb. So I set out to fill this gap (so far only for ARM). I had an stm32vldiscovery board with a 32-bit ARM Cortex-M3 processor, 128 kB of flash and 8 kB of RAM, so I started with it.
I wrote the Forth cross-compiler in Forth, of course, and since this language is considered exotic, I will not show its code in the article. I will confine myself to fairly detailed recommendations. There is almost no documentation or examples on this subject online; some parameters I found by trial and error, others by analyzing the output files of the gcc compiler. In addition, I used only the bare minimum of debugging information, without touching, for example, relocations and many other things. The topic is very broad and, I confess, I have covered only about 30 percent of it, which turned out to be enough for me.

Anyone interested in this project can download the code here.

ELF Overview

Standard development tools compile your program into an ELF (Executable and Linkable Format) file, optionally with debug information. The format specification can be read here. In addition, each architecture has its own specifics, for example, ARM. Let us briefly review this format.
The ELF executable file consists of the following parts:

1. File header (ELF Header)

Contains general information about the file and its main characteristics.

2. Program Header Table

This table maps sections of the file to memory segments and tells the loader which area of memory each section should be written to.

3. Sections

Sections contain all the information in the file (program, data, debug information, etc.)
Each section has a type, a name, and other parameters. The ".text" section usually stores code, ".symtab" is the program symbol table (names of files, procedures and variables), ".strtab" is the string table, sections with the ".debug_" prefix hold debug information, and so on. In addition, the file must have an empty section with index 0.

4. Section Header Table

This is a table containing an array of section headers.
The format is discussed in more detail in the “ELF creation” section.

DWARF Overview

DWARF is a standardized debugging information format. The standard can be downloaded from the official site. There is also an excellent short overview of the format: Introduction to the DWARF Debugging Format (Michael J. Eager).
Why do you need debug information? It allows you to:

set breakpoints not on the physical address, but on the line number in the source code file or on the function name
display and change values of global and local variables, as well as function parameters
show call stack (backtrace)
execute the program step by step not by one instruction of the assembler, but by the source code lines

This information is stored in a tree structure. Each node in the tree has a parent, may have children, and is called a DIE (Debugging Information Entry). Each node has its own tag (type) and a list of attributes (properties) describing the node. Attributes can contain anything, such as data or links to other nodes. In addition, there is information stored outside the tree.
Nodes are divided into two main types: nodes describing data, and nodes describing code.

Nodes describing the data:

Data types:
- Basic data types (node type DW_TAG_base_type), for example, such as the int type in C.
- Composite data types (pointers, etc.)
- Arrays
- Structures, classes, unions, interfaces
Data objects:
- constants
- function parameters
- variables
- etc.

Each data object has a DW_AT_location attribute that describes how to compute the address where the data is located. For example, a variable may have a fixed address, live in a register or on the stack, or be a member of a class or an object. This address can be computed in a rather complicated way, so the standard provides so-called Location Expressions, which may contain a sequence of operators for a special internal stack machine.
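
To make the idea of a location expression more concrete, here is a minimal sketch in C of the simplest one: a variable at a fixed address. It assumes a 32-bit little-endian target and uses the DW_OP_addr operator (opcode 0x03 in the DWARF standard). The function name encode_fixed_address is made up for illustration and is not part of any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* DW_OP_addr opcode from the DWARF standard: pushes a target address */
#define DW_OP_ADDR 0x03u

/* Hypothetical helper: writes "DW_OP_addr <addr>" for a 32-bit
   little-endian target into out and returns the expression length. */
size_t encode_fixed_address(uint32_t addr, uint8_t out[5])
{
    assert(out != NULL);
    out[0] = DW_OP_ADDR;
    for (int i = 0; i < 4; i++)
        out[1 + i] = (uint8_t)(addr >> (8 * i));  /* least significant byte first */
    return 5;
}
```

A debugger evaluating these five bytes pushes the address onto the expression stack, and the value left on top is taken as the location of the variable.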

Nodes describing the code:

Procedures (functions) are nodes with the DW_TAG_subprogram tag. Their child nodes can describe variables: the function's parameters and local variables.
Compilation Unit. Contains information about the program and is the parent of all other nodes.

The information described above is in the ".debug_info" and ".debug_abbrev" sections.

Other information:

Line number information (".debug_line" section)
Macro Information (".debug_macinfo" section)
Call Frame Information (section ".debug_frame")

ELF creation

We will create files in the ELF format using the libelf library from the elfutils package. There is a good article online on using libelf, LibELF by Example (unfortunately, it describes file creation only very briefly), as well as documentation.
Creating a file consists of several steps:

Libelf initialization
Creating a file header (ELF Header)
Creating a Program Header (Program Header Table)
Creating sections
Write file

Consider the stages in more detail.

Libelf initialization

First you need to call elf_version(EV_CURRENT) and check the result. If it equals EV_NONE, an error has occurred and nothing further can be done. Then we create the file we need on disk, obtain its file descriptor, and pass it to the elf_begin function:

Elf * elf_begin( int fd, Elf_Cmd cmd, Elf *elf)

fd - the descriptor of the newly opened file
cmd - the mode (ELF_C_READ for reading, ELF_C_WRITE for writing, or ELF_C_RDWR for reading and writing); it must match the mode in which the file was opened (ELF_C_WRITE in our case)
elf - only needed when working with archive files (.a); in our case pass 0

The function returns a pointer to the created handle, which is used by all other libelf functions; 0 is returned on error.

Creating the file header

A new file header is created by the elf32_newehdr function:

 Elf32_Ehdr * elf32_newehdr( Elf *elf);

elf - the handle returned by the elf_begin function

Returns 0 on error or a pointer to the structure - the header of the ELF file:

 #define EI_NIDENT 16
 typedef struct {
     unsigned char e_ident[EI_NIDENT];
     Elf32_Half    e_type;
     Elf32_Half    e_machine;
     Elf32_Word    e_version;
     Elf32_Addr    e_entry;
     Elf32_Off     e_phoff;
     Elf32_Off     e_shoff;
     Elf32_Word    e_flags;
     Elf32_Half    e_ehsize;
     Elf32_Half    e_phentsize;
     Elf32_Half    e_phnum;
     Elf32_Half    e_shentsize;
     Elf32_Half    e_shnum;
     Elf32_Half    e_shstrndx;
 } Elf32_Ehdr;

Some of its fields are filled in automatically, and some we have to fill in ourselves:

e_ident is the identification byte array; it has the following indices:
- EI_MAG0, EI_MAG1, EI_MAG2, EI_MAG3 - these 4 bytes must contain 0x7f followed by the characters 'E', 'L', 'F'; elf32_newehdr has already done this for us
- EI_DATA - the data encoding used in the file: ELFDATA2LSB or ELFDATA2MSB. We need ELFDATA2LSB: e_ident[EI_DATA] = ELFDATA2LSB
- EI_VERSION - the file header version, already set for us
- EI_PAD - do not touch
e_type - the file type: ET_NONE (no type), ET_REL (relocatable file), ET_EXEC (executable file), ET_DYN (shared object file), etc. We need to set it to ET_EXEC
e_machine - the architecture this file requires, for example EM_386 for the Intel architecture; for ARM we write EM_ARM (40) here, see ELF for the ARM Architecture
e_version - the file version; must be set to EV_CURRENT
e_entry - the entry point address; we do not need it
e_phoff - the offset of the program header table in the file, e_shoff - the offset of the section header table; do not fill them in
e_flags - processor-specific flags; for our architecture (Cortex-M3) set it to 0x05000000 (ABI version 5)
e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum - do not touch
e_shstrndx - the index of the section that holds the string table with the section names. Since we have no sections yet, we will set this index later.
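
The field assignments listed above can be sketched in C as follows. This is only an illustration under stated assumptions: it uses the type and constant definitions from the Linux system header <elf.h> instead of libelf, and the helper name fill_cortex_m3_ehdr is hypothetical. In a real program the pointer would come from elf32_newehdr.

```c
#include <assert.h>
#include <elf.h>      /* assumption: Linux system header with Elf32_Ehdr, ET_EXEC, EM_ARM */
#include <stddef.h>

/* e_flags value from the text above (ABI version 5) */
#define CORTEX_M3_EFLAGS 0x05000000u

/* Hypothetical helper: sets only the fields the text says we must fill in.
   In real use, eh would be the pointer returned by elf32_newehdr(). */
void fill_cortex_m3_ehdr(Elf32_Ehdr *eh)
{
    assert(eh != NULL);
    eh->e_ident[EI_DATA] = ELFDATA2LSB;   /* little-endian data encoding */
    eh->e_type    = ET_EXEC;
    eh->e_machine = EM_ARM;               /* 40 */
    eh->e_version = EV_CURRENT;
    eh->e_flags   = CORTEX_M3_EFLAGS;
    /* e_shstrndx is set later, once the .shstrtab section exists */
}
```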

Creating a program header

As already mentioned, the Program Header Table maps sections of the file to memory segments and tells the loader where to write each section. The header is created with the elf32_newphdr function:

 Elf32_Phdr * elf32_newphdr( Elf *elf, size_t count);

elf - our handle
count - the number of table entries to create. Since we will have only one section (with program code), count is 1.

Returns 0 on error or a pointer to the program header.
Each element in the header table is described by the following structure:

 typedef struct {
     Elf32_Word p_type;
     Elf32_Off  p_offset;
     Elf32_Addr p_vaddr;
     Elf32_Addr p_paddr;
     Elf32_Word p_filesz;
     Elf32_Word p_memsz;
     Elf32_Word p_flags;
     Elf32_Word p_align;
 } Elf32_Phdr;

p_type - the segment type; here we must specify PT_LOAD, a loadable segment
p_offset - the offset in the file at which the data of the section to be loaded into memory begins. Our .text section is located immediately after the file header and the program header, so the offset is the sum of the lengths of these headers. The length of any type can be obtained with the elf32_fsize function:
```
 size_t elf32_fsize(Elf_Type type, size_t count, unsigned int version); 
```
type - an ELF_T_xxx constant; we need the sizes of ELF_T_EHDR and ELF_T_PHDR; count - the number of elements of the given type; version - must be set to EV_CURRENT
p_vaddr, p_paddr - the virtual and physical addresses at which the section contents will be loaded. Since we have no virtual addresses, we set p_vaddr equal to the physical address, in the simplest case 0, because that is where our program will be loaded.
p_filesz, p_memsz - the size of the section in the file and in memory. They are the same for us, but since there is no section with program code yet, we will set them later.
p_flags - permissions for the loaded memory segment: PF_R (read), PF_W (write), PF_X (execute), or a combination of them. Set p_flags to PF_R + PF_X
p_align - segment alignment; for us it is 4
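
As a rough sketch of the program header fields described above, the following C function fills a single PT_LOAD entry. It relies on the definitions in the Linux <elf.h> header rather than on libelf, and uses sizeof as a stand-in for elf32_fsize (for these two 32-bit structures the in-memory and on-disk sizes are both 52 and 32 bytes). The helper name fill_text_phdr is hypothetical.

```c
#include <assert.h>
#include <elf.h>      /* assumption: Linux system header with Elf32_Phdr, PT_LOAD, PF_R, PF_X */
#include <stddef.h>

/* Hypothetical helper: fills the single loadable segment described above.
   In real use, ph would be the pointer returned by elf32_newphdr(). */
void fill_text_phdr(Elf32_Phdr *ph, Elf32_Addr load_addr, Elf32_Word code_size)
{
    assert(ph != NULL);
    ph->p_type   = PT_LOAD;
    /* .text follows the file header and the one-entry program header */
    ph->p_offset = sizeof(Elf32_Ehdr) + sizeof(Elf32_Phdr);
    ph->p_vaddr  = load_addr;
    ph->p_paddr  = load_addr;     /* no virtual memory: same address */
    ph->p_filesz = code_size;
    ph->p_memsz  = code_size;
    ph->p_flags  = PF_R | PF_X;
    ph->p_align  = 4;
}
```

In the flow described in the article, code_size is not known when the header is created, so p_filesz and p_memsz would be assigned later, once the code has been generated.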

Creating sections

After creating the headers, you can start creating sections. An empty section is created using the elf_newscn function:

 Elf_Scn * elf_newscn( Elf *elf);

elf - the handle returned earlier by the elf_begin function

The function returns a pointer to the section or 0 on error.
After creating the section, you need to fill in the section header and create a section data descriptor.
We can get a pointer to the section header using the elf32_getshdr function:

 Elf32_Shdr * elf32_getshdr( Elf_Scn *scn);

scn is a pointer to the section that we received from the elf_newscn function.

The section header looks like this:

 typedef struct {
     Elf32_Word sh_name;
     Elf32_Word sh_type;
     Elf32_Word sh_flags;
     Elf32_Addr sh_addr;
     Elf32_Off  sh_offset;
     Elf32_Word sh_size;
     Elf32_Word sh_link;
     Elf32_Word sh_info;
     Elf32_Word sh_addralign;
     Elf32_Word sh_entsize;
 } Elf32_Shdr;

sh_name - the section name, given as an offset in the section header string table (.shstrtab section); see the part on string tables below
sh_type - the type of the section contents: SHT_PROGBITS for the section with program code, SHT_STRTAB for sections with a string table, SHT_SYMTAB for the symbol table
sh_flags - section flags, which can be combined; we need only three of them:
- SHF_ALLOC - the section will be loaded into memory
- SHF_EXECINSTR - the section contains executable code
- SHF_STRINGS - the section contains a string table
Accordingly, for the .text section with the program, set the flags SHF_ALLOC + SHF_EXECINSTR
sh_addr - the address at which the section will be loaded into memory
sh_offset - the section offset in the file; do not touch, the library will set it for us
sh_size - the section size; do not touch
sh_link - the index of the associated section; it is needed to link a section with its string table (see below)
sh_info - additional information that depends on the section type; set to 0
sh_addralign - address alignment; do not touch
sh_entsize - if the section consists of several elements of the same length, the length of one such element; do not touch

After filling in the header, you need to create a section data descriptor with the elf_newdata function:

 Elf_Data * elf_newdata( Elf_Scn *scn);

scn - the pointer to the new section that we have just obtained.

The function returns 0 on error, or a pointer to the Elf_Data structure, which will need to be filled:

 typedef struct {
     void*    d_buf;
     Elf_Type d_type;
     size_t   d_size;
     off_t    d_off;
     size_t   d_align;
     unsigned d_version;
 } Elf_Data;

d_buf - a pointer to the data to be written to the section
d_type - the data type; ELF_T_BYTE suits us everywhere
d_size - the data size
d_off - the offset within the section; set to 0
d_align - alignment; can be set to 1 (no alignment)
d_version - the version; be sure to set it to EV_CURRENT

Special sections

For our purposes, we will need to create the minimum required set of sections:

.text - section with program code
.symtab - file symbol table
.strtab is a string table containing the names of symbols from the .symtab section, since the latter do not store the names themselves, but their indices
.shstrtab - a string table containing section names

All sections are created as described in the previous section, but each special section has its own characteristics.

Section .text

This section contains the executable code, so set sh_type to SHT_PROGBITS, sh_flags to SHF_EXECINSTR + SHF_ALLOC, and sh_addr to the address at which this code will be loaded.
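
A minimal sketch of these assignments in C, assuming the definitions from the Linux <elf.h> header; the helper name fill_text_shdr is hypothetical, and in a real program the pointer would come from elf32_getshdr:

```c
#include <assert.h>
#include <elf.h>      /* assumption: Linux system header with Elf32_Shdr, SHT_*, SHF_* */
#include <stddef.h>

/* Hypothetical helper: fills the .text section header fields that the
   text says we must set; the rest is left for the library. */
void fill_text_shdr(Elf32_Shdr *sh, Elf32_Word name_offset, Elf32_Addr load_addr)
{
    assert(sh != NULL);
    sh->sh_name  = name_offset;                 /* offset of ".text" in .shstrtab */
    sh->sh_type  = SHT_PROGBITS;
    sh->sh_flags = SHF_ALLOC | SHF_EXECINSTR;
    sh->sh_addr  = load_addr;
    sh->sh_info  = 0;
}
```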

Section .symtab

The section describes all symbols (functions) of the program and the files in which they are defined. It consists of 16-byte entries of the following form:

 typedef struct {
     Elf32_Word    st_name;
     Elf32_Addr    st_value;
     Elf32_Word    st_size;
     unsigned char st_info;
     unsigned char st_other;
     Elf32_Half    st_shndx;
 } Elf32_Sym;

st_name - the symbol name (an index in the .strtab string table)
st_value - the value (the entry address for a function, or 0 for a file). Since the Cortex-M3 uses the Thumb-2 instruction set, this address must be odd (real address + 1)
st_size - the length of the function code (0 for a file)
st_info - the symbol type and its scope. There is a macro for computing the value of this field:
```
 #define ELF32_ST_INFO(b,t) (((b)<<4)+((t)&0xf)) 
```
where b is the scope (binding) and t is the symbol type
The scope can be STB_LOCAL (the symbol is not visible from other object files) or STB_GLOBAL (visible). For simplicity, we use STB_GLOBAL.
The symbol type is STT_FUNC for a function and STT_FILE for a file
st_other - set to 0
st_shndx - the index of the section for which the symbol is defined (the index of the .text section), or SHN_ABS for a file.
The index of a section can be obtained from its scn handle with elf_ndxscn:
```
 size_t elf_ndxscn( Elf_Scn *scn); 
```

The data for the section can be collected when traversing the source text into an array, a pointer to which is then written to the section data descriptor (d_buf).
This section is created in the usual way, only sh_type needs to be set to SHT_SYMTAB, and the index of the .strtab section is written to the sh_link field, so these sections will become linked.
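
As an illustration of the .symtab entry layout, here is a hedged sketch in C that builds the entry for one Thumb function. It takes Elf32_Sym, ELF32_ST_INFO, STB_GLOBAL and STT_FUNC from the Linux <elf.h> header; the helper name make_thumb_func_symbol and its parameters are made up for this example.

```c
#include <assert.h>
#include <elf.h>      /* assumption: Linux system header with Elf32_Sym, ELF32_ST_INFO */

/* Hypothetical helper: builds a .symtab entry for a Thumb function.
   name_offset is the index of the name in .strtab, text_index is the
   index of the .text section (as returned by elf_ndxscn). */
Elf32_Sym make_thumb_func_symbol(Elf32_Word name_offset, Elf32_Addr addr,
                                 Elf32_Word size, Elf32_Half text_index)
{
    Elf32_Sym sym = {0};
    assert((addr & 1u) == 0);          /* the real code address is even */
    sym.st_name  = name_offset;
    sym.st_value = addr | 1u;          /* odd address marks Thumb code */
    sym.st_size  = size;
    sym.st_info  = ELF32_ST_INFO(STB_GLOBAL, STT_FUNC);
    sym.st_other = 0;
    sym.st_shndx = text_index;
    return sym;
}
```

Entries built this way can be collected into an array whose address and total size then go into d_buf and d_size of the section data descriptor.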

Section .strtab

This section holds the names of all symbols from the .symtab section. It is created as a regular section, but sh_type must be set to SHT_STRTAB and sh_flags to SHF_STRINGS, which makes this section a string table.
The data for the section can be collected when traversing the source text into an array, a pointer to which is then written to the section data descriptor (d_buf).

Section .shstrtab

This section is a string table that contains the names of all sections of the file, including its own name. It is created in the same way as the .strtab section. After creating it, write its index to the e_shstrndx field of the file header.

String tables

String tables contain consecutive strings, each terminated by a zero byte; the first byte of the table must also be 0. The index of a string in the table is simply its offset in bytes from the beginning of the table, so the first string, 'name', has index 1 and the next string, 'var', has index 6.

  Index  0   1   2   3   4   5   6   7   8   9
  Byte   \0  n   a   m   e   \0  v   a   r   \0
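
The layout above can be reproduced with a few lines of C. The sketch below is only one possible way to collect such a table; the strtab structure, its fixed capacity, and the function names are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STRTAB_CAPACITY 256

/* Hypothetical fixed-size string table; data[0] is the mandatory zero byte. */
struct strtab {
    char   data[STRTAB_CAPACITY];
    size_t size;
};

void strtab_init(struct strtab *t)
{
    assert(t != NULL);
    memset(t->data, 0, sizeof t->data);
    t->size = 1;                       /* index 0 holds the leading '\0' */
}

/* Appends s with its terminating zero byte.
   Returns the index of s in the table, or 0 if it does not fit. */
size_t strtab_add(struct strtab *t, const char *s)
{
    size_t len = strlen(s) + 1;
    size_t index = t->size;
    if (len > STRTAB_CAPACITY - t->size)
        return 0;
    memcpy(t->data + index, s, len);
    t->size += len;
    return index;
}
```

The returned index is what goes into st_name or sh_name, while data and size would be passed as d_buf and d_size of the .strtab or .shstrtab section.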

Write file

So, the headers and sections are ready; now they must be written to the file, after which we finish working with libelf. Writing is done by the elf_update function:

 off_t elf_update( Elf *elf, Elf_Cmd cmd);

elf - the handle
cmd - the command; must be ELF_C_WRITE for writing.

The function returns -1 on error. The error text can be obtained by calling elf_errmsg(-1), which returns a pointer to a string describing the error.
We finish working with the library by calling elf_end, passing it our handle. All that remains is to close the previously opened file.
However, the generated file does not yet contain debugging information; we will add it in the next section.

Create DWARF

We will create debug information using the libdwarf library, which comes with a PDF document describing it (libdwarf2p.1.pdf - A Producer Library Interface to DWARF).
Creating debug information consists of the following steps:

Initializing the libdwarf producer
Creating nodes (DIE - Debugging Information Entry)
Creating node attributes
Creating a Compilation Unit
Creating Common Information Entry
Creating data types
Creating procedures (functions)
Creating variables and constants
Creating sections with debug information
Finishing work with the library

Consider the stages in more detail.

Initializing the libdwarf producer

We will create debugging information during compilation, at the same time as the symbols in the .symtab section, so the library must be initialized after libelf has been initialized and the ELF header and program header have been created, but before the sections are created.
For initialization, we will use the dwarf_producer_init_c function. There are several other initialization functions in the library (dwarf_producer_init, dwarf_producer_init_b), which differ in some of the nuances described in the documentation. In principle, you can use any of them.

 Dwarf_P_Debug dwarf_producer_init_c( Dwarf_Unsigned flags, Dwarf_Callback_Func_c func, Dwarf_Handler errhand, Dwarf_Ptr errarg, void * user_data, Dwarf_Error *error)

flags - a bitwise OR of several constants that define certain parameters, for example the information width, the byte order (little-endian, big-endian) and the relocation format; of these we definitely need DW_DLC_WRITE and DW_DLC_SYMBOLIC_RELOCATIONS
func - a callback function that will be called when ELF sections with debug information are created. See the section on creating sections with debug information below for details.
errhand - a pointer to a function that will be called when an error occurs. You can pass 0
errarg - data that will be passed to the errhand function; can be 0
user_data - data that will be passed to the func function; can be 0
error - the returned error code

The function returns a Dwarf_P_Debug handle, which is used in all subsequent functions, or -1 on error, in which case error contains the error code (the text of the error message can be obtained by passing this code to the dwarf_errmsg function).

Creating Nodes (DIE - Debugging Information Entry)

As described above, debugging information forms a tree structure. To create a node of this tree, you need:

create it with the dwarf_new_die function
add attributes to it (each attribute type is added by its function, which will be described later)

The node is created using the dwarf_new_die function:

 Dwarf_P_Die dwarf_new_die( Dwarf_P_Debug dbg, Dwarf_Tag new_tag, Dwarf_P_Die parent, Dwarf_P_Die child, Dwarf_P_Die left_sibling, Dwarf_P_Die right_sibling, Dwarf_Error *error)

dbg - Dwarf_P_Debug handle obtained during library initialization
new_tag - node tag (type) - constant DW_TAG_xxxx, which can be found in the file libdwarf.h
parent, child, left_sibling, right_sibling - the parent, child, left and right siblings of the node, respectively. You do not have to specify all of them; one is enough, with 0 for the rest. If all of them are 0, the node will be either the root or isolated
error - will contain the error code if an error occurs

The function returns DW_DLV_BADADDR on error or a Dwarf_P_Die node handle on success.

Creating node attributes

Node attributes are created with a whole family of dwarf_add_AT_xxxx functions. It is sometimes hard to work out which function creates the attribute you need, so I even had to dig into the library source code several times. Some of the functions are described here, others below in the relevant sections. All of them take the ownerdie parameter, the handle of the node to which the attribute is added, and return an error code in the error parameter.
The dwarf_add_AT_name function adds a “name” attribute (DW_AT_name) to the node. Most nodes should have a name (for example, procedures, variables, constants); some may have none (for example, the Compilation Unit)

 Dwarf_P_Attribute dwarf_add_AT_name( Dwarf_P_Die ownerdie, char *name, Dwarf_Error *error)

name - attribute value itself (node name)

Returns DW_DLV_BADADDR on error or attribute handle on success.
The dwarf_add_AT_signed_const and dwarf_add_AT_unsigned_const functions add the specified attribute with a signed (unsigned) value to the node. Signed and unsigned attributes are used to set constant values, sizes, line numbers, etc. The function format is:

 Dwarf_P_Attribute dwarf_add_AT_(un)signed_const( Dwarf_P_Debug dbg, Dwarf_P_Die ownerdie, Dwarf_Half attr, Dwarf_Signed value, Dwarf_Error *error)

dbg - Dwarf_P_Debug handle obtained during library initialization
attr - the attribute whose value is set, is the constant DW_AT_xxxx, which can be found in the file libdwarf.h
value - attribute value

Return DW_DLV_BADADDR in the event of an error or attribute handle on successful completion.

Creating a Compilation Unit

Every tree must have a root. Ours is the compilation unit, which contains information about the program (for example, the name of the main file, the programming language used, the name of the compiler, whether symbols (variables, functions) are case-sensitive, the main function of the program, the start address, etc.). In principle, no attributes are required. As an example, let us create information about the main file and the compiler.

Main file information

To store information about the main file, the “name” attribute (DW_AT_name) is used; add it with the dwarf_add_AT_name function, as shown in the “Creating node attributes” section.

Compiler info

Use the dwarf_add_AT_producer function:

 Dwarf_P_Attribute dwarf_add_AT_producer( Dwarf_P_Die ownerdie, char *producer_string, Dwarf_Error *error)

producer_string - a string with text information

Returns DW_DLV_BADADDR on error or attribute handle on success.

Creating Common Information Entry

Usually, when a function (subroutine) is called, its parameters and the return address are pushed onto the stack (although each compiler may do it in its own way); all of this is called a call frame. The debugger needs information about the frame format in order to determine the return address of a function correctly and to build a backtrace: the chain of function calls that led to the current function, together with the parameters of those functions. Usually the processor registers saved on the stack are indicated as well. The code that reserves space on the stack and saves processor registers is called the function prologue; the code that restores the registers and the stack is called the epilogue.
This information depends heavily on the compiler. For example, the prologue and epilogue do not have to be at the very beginning and end of a function; sometimes a frame is used and sometimes not; processor registers can be saved in other registers, and so on.
So, the debugger needs to know how the processor registers change their values and where they are saved on entry to a procedure. This information is called Call Frame Information. For each address in the program (that contains code), it gives the address of the frame in memory (Canonical Frame Address - CFA) and information about the processor registers; for example, you can specify that:

the register is not stored in the procedure
the register does not change its value in the procedure
the register is stored on the stack at the address CFA + n
the register is saved in another register
the register is stored in memory at some address, which can be calculated in a rather non-obvious way
etc.

Since this information must be given for each address in the code, it is very bulky and is stored in compressed form in the .debug_frame section. Because it changes little from address to address, only the changes are encoded, as DW_CFA_xxxx instructions. Each instruction describes one change, for example:

DW_CFA_set_loc - sets the current address in the program
DW_CFA_advance_loc - advances the current address by a given number of bytes
DW_CFA_def_cfa - defines the address of the stack frame (CFA) as a register plus a numeric offset
DW_CFA_def_cfa_register - changes the register from which the address of the stack frame is taken
DW_CFA_def_cfa_expression - specifies how to compute the address of the stack frame
DW_CFA_same_value - indicates that the register does not change
DW_CFA_register - indicates that the register is saved in another register
etc.

The .debug_frame section consists of records of two types: Common Information Entry (CIE) and Frame Description Entry (FDE). A CIE contains information common to many FDE records; roughly speaking, it describes a certain kind of procedure. An FDE, in turn, describes each specific procedure. On entry to a procedure, the debugger first executes the instructions from the CIE and then those from the FDE.
My compiler creates procedures in which the CFA is in the sp (r13) register. Let us create one CIE for all procedures. The dwarf_add_frame_cie function does this:

 Dwarf_Unsigned dwarf_add_frame_cie( Dwarf_P_Debug dbg, char *augmenter, Dwarf_Small code_align, Dwarf_Small data_align, Dwarf_Small ret_addr_reg, Dwarf_Ptr init_bytes, Dwarf_Unsigned init_bytes_len, Dwarf_Error *error);

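augmenter - a UTF-8 encoded string whose presence indicates that the CIE or FDE carries additional platform-specific information. Pass an empty string
code_align - code alignment in bytes (for us it is 2)
data_align - data alignment in the frame (for us -4: each element on the stack takes 4 bytes and the stack grows downward)
ret_addr_reg - the register that holds the return address of the procedure (for us it is 14)
init_bytes - an array of DW_CFA_xxxx instructions. Such an array can be written by hand or looked up in an ELF file produced by a C compiler. In my case it contains 3 bytes: 0x0C, 0x0D, 0, which decode as DW_CFA_def_cfa: r13 ofs 0 (the CFA is in register r13, the offset is 0)
init_bytes_len - the length of the init_bytes array

The function returns DW_DLV_NOCOUNT on error, or the CIE handle that must be used when creating the FDE for each procedure, as discussed below in “Creating an FDE procedure”.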
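
To show where the three init_bytes values 0x0C, 0x0D, 0 come from, here is a small sketch in C that encodes a DW_CFA_def_cfa instruction by hand. It assumes the encoding defined by the DWARF standard: opcode 0x0c followed by the register number and the offset, both as unsigned LEB128. The function names are hypothetical and are not part of libdwarf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DW_CFA_DEF_CFA 0x0cu   /* opcode value from the DWARF standard */

/* Encodes value as unsigned LEB128; returns the number of bytes written
   (at most 5 for a 32-bit value). */
size_t uleb128_encode(uint32_t value, uint8_t *out)
{
    size_t n = 0;
    assert(out != NULL);
    do {
        uint8_t byte = value & 0x7fu;
        value >>= 7;
        if (value != 0)
            byte |= 0x80u;             /* more bytes follow */
        out[n++] = byte;
    } while (value != 0);
    return n;
}

/* Hypothetical helper: emits "DW_CFA_def_cfa reg, offset" into out,
   which must have room for up to 11 bytes. Returns the byte count. */
size_t emit_def_cfa(uint32_t reg, uint32_t offset, uint8_t *out)
{
    size_t n = 0;
    assert(out != NULL);
    out[n++] = DW_CFA_DEF_CFA;
    n += uleb128_encode(reg, out + n);
    n += uleb128_encode(offset, out + n);
    return n;
}
```

Calling emit_def_cfa(13, 0, buf) yields exactly the three bytes quoted above, so the resulting buffer and length could serve as init_bytes and init_bytes_len.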

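Creating data types

Before creating procedures and variables, we first need to create nodes for the data types. There are many data types, but all of them are built on base types (elementary types such as int, double, etc.); the other types are constructed from the base ones.
A base type is a node with the DW_TAG_base_type tag. It must have the following attributes:

“name” (DW_AT_name)
“encoding” (DW_AT_encoding) - what kind of data the base type describes (for example, DW_ATE_boolean - boolean, DW_ATE_float - floating point, DW_ATE_signed - signed integer, DW_ATE_unsigned - unsigned integer, etc.)
“size” (DW_AT_byte_size - size in bytes, or DW_AT_bit_size - size in bits)

A node may also contain other, optional attributes.
For example, to create the 32-bit signed integer base type “int”, we create a node with the DW_TAG_base_type tag and set its attributes: DW_AT_name - “int”, DW_AT_encoding - DW_ATE_signed, DW_AT_byte_size - 4.
Once the base types exist, derived types can be built from them. Such nodes must contain the DW_AT_type attribute, a reference to their base type. For example, a pointer to int is a node with the DW_TAG_pointer_type tag whose DW_AT_type attribute refers to the previously created “int” type.
An attribute that refers to another node is created with the dwarf_add_AT_reference function: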

 Dwarf_P_Attribute dwarf_add_AT_reference( Dwarf_P_Debug dbg, Dwarf_P_Die ownerdie, Dwarf_Half attr, Dwarf_P_Die otherdie, Dwarf_Error *error)

attr - the attribute, in this case DW_AT_type
otherdie - the handle of the node being referred to

Creating procedures (functions)

Before creating procedures, I need to explain one more kind of debugging information: Line Number Information. It maps each machine instruction to a specific line of source code and makes it possible to step through the program line by line. This information is stored in the .debug_line section. If we had enough space, it would be stored as a matrix, with one row per instruction and the following columns:

source file name
line number in this file
column number in file
whether the instruction is the beginning of a statement or of a statement block
etc.

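Such a matrix would be very large, so it has to be compressed. First, duplicate rows are removed, and second, not the rows themselves are stored but only the changes between them. These changes look like commands for a finite state machine, and the information itself is treated as a program that this machine “executes”. The commands of this program look, for example, like this: DW_LNS_advance_pc - advance the program counter to some address, DW_LNS_set_file - set the file in which the procedure is defined, DW_LNS_const_add_pc - advance the program counter by a constant number of bytes, etc.
Generating this information at such a low level is difficult, so libdwarf provides several functions that make the task easier.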
It is expensive to store the file name for each instruction, so instead of the name its index is stored in a special table. To create a file index, use the dwarf_add_file_decl function:

 Dwarf_Unsigned dwarf_add_file_decl( Dwarf_P_Debug dbg, char *name, Dwarf_Unsigned dir_idx, Dwarf_Unsigned time_mod, Dwarf_Unsigned length, Dwarf_Error *error)

name - file name
dir_idx - index of the folder where the file is located. The index can be obtained using the dwarf_add_directory_decl function. If full paths are used, you can put 0 as the folder index and not use dwarf_add_directory_decl at all
time_mod - file modification time, can be omitted (0)
length - file size, also optional (0)

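The function returns the file index, or DW_DLV_NOCOUNT on error.
Line number information is created with three functions, dwarf_add_line_entry_b, dwarf_lne_set_address and dwarf_lne_end_sequence, which we will look at below.
Creating debug information for a procedure takes several steps:

creating the procedure symbol in the .symtab section
creating the procedure node with attributes
creating the FDE of the procedure
creating the procedure parameters
creating line number information

Creating a procedure symbol

The procedure symbol is created as described above in “Section .symtab”.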

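Creating a procedure node with attributes

First we create the node with the dwarf_new_die function (see “Creating Nodes”), specifying DW_TAG_subprogram as the tag and, as the parent, either the Compilation Unit (if the procedure is global) or the corresponding DIE (if it is local). Then we create the attributes:

the procedure name (dwarf_add_AT_name function, see “Creating node attributes”)
the number of the line in the file where the procedure code begins (DW_AT_decl_line attribute), dwarf_add_AT_unsigned_const function (see “Creating node attributes”)
the index of the file that contains the procedure (DW_AT_decl_file attribute), dwarf_add_AT_unsigned_const function (see “Creating node attributes”)
the start address of the procedure (DW_AT_low_pc attribute), dwarf_add_AT_targ_address function, see below
the end address of the procedure (DW_AT_high_pc attribute), dwarf_add_AT_targ_address function, see below
the type of the result returned by the procedure (the DW_AT_type attribute is a reference to a previously created type, see “Creating data types”). If the procedure returns nothing, this attribute is not needed

The DW_AT_low_pc and DW_AT_high_pc attributes must be created with the dedicated dwarf_add_AT_targ_address_b function: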

 Dwarf_P_Attribute dwarf_add_AT_targ_address_b( Dwarf_P_Debug dbg, Dwarf_P_Die ownerdie, Dwarf_Half attr, Dwarf_Unsigned pc_value, Dwarf_Unsigned sym_index, Dwarf_Error *error)

attr - the attribute (DW_AT_low_pc or DW_AT_high_pc)
pc_value - the address value
sym_index - the index of the procedure symbol in the .symtab table. Optional, you can pass 0

The function will return DW_DLV_BADADDR on error.

Creating an FDE procedure

As mentioned above in the “Creating a Common Information Entry” section, for each procedure you need to create a frame descriptor, which happens in several steps:

creating a new FDE (see Creating a Common Information Entry)
joining the created FDE to the common list
adding instructions to the generated fde

You can create a new FDE with the dwarf_new_fde function:

 Dwarf_P_Fde dwarf_new_fde( Dwarf_P_Debug dbg, Dwarf_Error *error)

The function will return a handle to a new FDE or DW_DLV_BADADDR on error.
You can attach a new FDE to the list using dwarf_add_frame_fde:

 Dwarf_Unsigned dwarf_add_frame_fde( Dwarf_P_Debug dbg, Dwarf_P_Fde fde, Dwarf_P_Die die, Dwarf_Unsigned cie, Dwarf_Addr virt_addr, Dwarf_Unsigned code_len, Dwarf_Unsigned sym_idx, Dwarf_Error* error)

fde - just received handle
die - the DIE of the procedure (see Creating a procedure node with attributes)
cie - the CIE handle (see Creating the Common Information Entry)
virt_addr - the start address of our procedure
code_len - the procedure length in bytes
sym_idx - the symbol index (optional, you can pass 0)

The function will return DW_DLV_NOCOUNT on error.
After all this, you can add DW_CFA_xxxx instructions to our FDE. This is done by the dwarf_add_fde_inst and dwarf_fde_cfa_offset functions. The first one adds the specified instruction to the list:

 Dwarf_P_Fde dwarf_add_fde_inst( Dwarf_P_Fde fde, Dwarf_Small op, Dwarf_Unsigned val1, Dwarf_Unsigned val2, Dwarf_Error *error)

fde - the descriptor of the created FDE
op - instruction code (DW_CFA_xxxx)
val1, val2 - instruction parameters (different for each instruction, see Standard, section 6.4.2 Call Frame Instructions)

The dwarf_fde_cfa_offset function adds the DW_CFA_offset instruction:

 Dwarf_P_Fde dwarf_fde_cfa_offset( Dwarf_P_Fde fde, Dwarf_Unsigned reg, Dwarf_Signed offset, Dwarf_Error *error)

fde - the descriptor of the created FDE
reg - the register that is written to the frame
offset - its offset in the frame (not in bytes, but in the frame elements, see Creating Common Information Entry, data_align)

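For example, suppose we need to create a frame for a procedure whose prologue saves the lr (r14) register. First add the DW_CFA_advance_loc instruction with the first parameter equal to 1, which advances pc by 2 bytes (see Creating a Common Information Entry, code_align). Then add DW_CFA_def_cfa_offset with the parameter 4 (the CFA offset becomes 4 bytes), and call dwarf_fde_cfa_offset with reg=14 offset=1, which records that r14 is stored in the frame at -4 bytes from the CFA.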
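To make this example concrete, here is a minimal sketch of the bytes these three instructions turn into inside .debug_frame. It does not call libdwarf: the opcode values come from the DWARF standard, and the helper names (put_uleb128, encode_save_lr_cfi) are made up for illustration. It assumes code_align = 2 and data_align = -4, which is what the numbers in the example imply.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opcode values from the DWARF standard (call frame instructions). */
enum {
    DW_CFA_advance_loc    = 0x40, /* high 2 bits; low 6 bits hold the delta    */
    DW_CFA_offset         = 0x80, /* high 2 bits; low 6 bits hold the register */
    DW_CFA_def_cfa_offset = 0x0e  /* followed by a ULEB128 offset              */
};

/* Hypothetical helper: writes value as ULEB128, returns the byte count. */
static size_t put_uleb128(uint8_t *out, uint32_t value)
{
    size_t n = 0;
    do {
        uint8_t byte = (uint8_t)(value & 0x7f);
        value >>= 7;
        if (value != 0)
            byte |= 0x80;          /* more bytes follow */
        out[n++] = byte;
    } while (value != 0);
    return n;
}

/* Hypothetical helper: encodes "lr is saved by the first instruction".
   buf must have room for at least 8 bytes. Returns bytes written. */
size_t encode_save_lr_cfi(uint8_t *buf)
{
    unsigned delta = 1;            /* 1 unit of code_align = 2 bytes of code */
    unsigned reg = 14;             /* lr */
    size_t n = 0;

    assert(delta < 64 && reg < 64); /* both must fit in 6 bits */
    buf[n++] = (uint8_t)(DW_CFA_advance_loc | delta);
    buf[n++] = DW_CFA_def_cfa_offset;
    n += put_uleb128(buf + n, 4);  /* CFA offset = 4 bytes */
    buf[n++] = (uint8_t)(DW_CFA_offset | reg);
    n += put_uleb128(buf + n, 1);  /* 1 * data_align (-4) = -4 from CFA */
    return n;
}
```

Calling encode_save_lr_cfi on a buffer yields the five bytes 41 0e 04 8e 01, which can be compared with the .debug_frame contents shown by a dump tool for gcc output.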

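Creating source line information

This takes three steps:

at the beginning of the procedure, we start a block of instructions with the dwarf_lne_set_address function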
for each line of code (or machine instruction) we create information about the source code (dwarf_add_line_entry)
at the end of the procedure, we complete the instruction block with the dwarf_lne_end_sequence function
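The steps above can be modelled without libdwarf as a small table builder. This is only a sketch under assumptions: the LineRow and Insn types and the build_line_rows function are hypothetical, each row stands for the data one dwarf_add_line_entry_b call would carry, and the end-of-sequence address is taken to be the first byte after the last instruction, as the DWARF standard defines it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical row of the line table (one per machine instruction). */
typedef struct {
    uint32_t address;      /* address of the instruction      */
    uint32_t line;         /* source line it was generated by */
    int      is_stmt;      /* is_source_stmt_begin            */
    int      prologue_end; /* is_prologue_end                 */
} LineRow;

/* Hypothetical description of one generated instruction. */
typedef struct {
    uint32_t size;         /* in bytes: 2 or 4 for Thumb-2 */
    uint32_t line;
} Insn;

/* Fills rows[0..count-1] for one procedure and returns the address
   that would be passed to dwarf_lne_end_sequence. */
uint32_t build_line_rows(uint32_t start, const Insn *insns, size_t count,
                         size_t prologue_end_index, LineRow *rows)
{
    uint32_t addr = start;                  /* step 1: block start address */
    size_t i;

    assert(prologue_end_index < count);
    for (i = 0; i < count; i++) {           /* step 2: one row per instruction */
        rows[i].address = addr;
        rows[i].line = insns[i].line;
        rows[i].is_stmt = 1;
        rows[i].prologue_end = (i == prologue_end_index);
        addr += insns[i].size;
    }
    return addr;                            /* step 3: end of the block */
}
```

For three instructions of 2, 2 and 4 bytes starting at 0x08000100, the rows get the addresses 0x08000100, 0x08000102 and 0x08000104, and the returned end address is 0x08000108.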

The dwarf_lne_set_address function sets the address where the block of instructions begins:

 Dwarf_Unsigned dwarf_lne_set_address( Dwarf_P_Debug dbg, Dwarf_Addr offs, Dwarf_Unsigned symidx, Dwarf_Error *error)

offs - the address of the procedure (the address of the first machine instruction)
symidx - symbol index (optional, you can specify 0)

Returns 0 (success) or DW_DLV_NOCOUNT (error).
The dwarf_add_line_entry_b function adds source line information to the .debug_line section. I call this function for each machine instruction:

 Dwarf_Unsigned dwarf_add_line_entry_b( Dwarf_P_Debug dbg, Dwarf_Unsigned file_index, Dwarf_Addr code_offset, Dwarf_Unsigned lineno, Dwarf_Signed column_number, Dwarf_Bool is_source_stmt_begin, Dwarf_Bool is_basic_block_begin, Dwarf_Bool is_epilogue_begin, Dwarf_Bool is_prologue_end, Dwarf_Unsigned isa, Dwarf_Unsigned discriminator, Dwarf_Error *error)

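file_index - the index of the source file obtained earlier from the dwarf_add_file_decl function (see "Creating Procedures")
code_offset - the address of the current machine instruction
lineno - the line number in the source file
column_number - the column number in the source file
is_source_stmt_begin - 1 if the current instruction is the first one generated for line lineno (1 in our case)
is_basic_block_begin - 1 if the current instruction begins a basic block (0 in our case)
is_epilogue_begin - 1 if the current instruction is the first one in the procedure epilogue (optional, 0 in our case)
is_prologue_end - 1 if the current instruction marks the end of the procedure prologue (required!)
isa - instruction set architecture. Be sure to specify DW_ISA_ARM_thumb for the ARM Cortex M3!
discriminator - one position (file, line, column) in the source code can correspond to several different sets of machine instructions. In that case each set needs its own discriminator. If there are no such cases, pass 0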

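Returns 0 (success) or DW_DLV_NOCOUNT (error).
Finally, the dwarf_lne_end_sequence function completes the block of instructions:

 Dwarf_Unsigned dwarf_lne_end_sequence( Dwarf_P_Debug dbg, Dwarf_Addr address, Dwarf_Error *error)

address - the address at which the block ends

Returns 0 (success) or DW_DLV_NOCOUNT (error).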

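Variables and constants get nodes of their own. For a variable, the debugger also has to be told where it lives. DWARF does this with address expressions (location expressions): a sequence of instructions (constants with the DW_OP_ prefix), each with its own parameters, which the debugger evaluates to find the variable. The ones we need are:

DW_OP_addr - the absolute address of the variable
DW_OP_fbreg - the offset of the variable from the frame base
DW_OP_reg0… DW_OP_reg31 - the variable is held in the corresponding register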

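To build an address expression, first create an empty one (dwarf_new_expr), add instructions to it (dwarf_add_expr_addr, dwarf_add_expr_gen, etc.), and attach it to the node as the value of the DW_AT_location attribute (dwarf_add_AT_location_expr).
The function that creates an empty address expression returns its handle, or 0 on error:

 Dwarf_P_Expr dwarf_new_expr( Dwarf_P_Debug dbg, Dwarf_Error *error)

Instructions are added to the expression with the dwarf_add_expr_gen function:

 Dwarf_Unsigned dwarf_add_expr_gen( Dwarf_P_Expr expr, Dwarf_Small opcode, Dwarf_Unsigned val1, Dwarf_Unsigned val2, Dwarf_Error *error)

expr - the handle of the address expression to which the instruction is added
opcode - the operation code, a constant with the DW_OP_ prefix
val1, val2 - instruction parameters (see the Standard)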

The function returns DW_DLV_NOCOUNT on error.
To explicitly set the address of a variable, the dwarf_add_expr_addr function should be used instead of the previous one:

 Dwarf_Unsigned dwarf_add_expr_addr( Dwarf_P_Expr expr, Dwarf_Unsigned address, Dwarf_Signed sym_index, Dwarf_Error *error)

expr - the address expression handle to which the instruction is added
address - the address of the variable
sym_index - the index of the symbol in the .symtab table. Optional, you can pass 0

The function also returns DW_DLV_NOCOUNT on error.
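For orientation, here is a minimal sketch of what the three kinds of address expression look like once encoded. It is independent of libdwarf: the opcode values come from the DWARF standard, the function names (expr_addr, expr_fbreg, expr_reg, put_sleb128) are invented for this illustration, and a 32-bit little-endian target is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opcode values from the DWARF standard. */
enum { DW_OP_addr = 0x03, DW_OP_reg0 = 0x50, DW_OP_fbreg = 0x91 };

/* Hypothetical helper: writes value as SLEB128, returns the byte count.
   Assumes >> on a negative int is an arithmetic shift (true for gcc/clang). */
static size_t put_sleb128(uint8_t *out, int32_t value)
{
    size_t n = 0;
    int more = 1;
    while (more) {
        uint8_t byte = (uint8_t)(value & 0x7f);
        value >>= 7;
        if ((value == 0 && !(byte & 0x40)) || (value == -1 && (byte & 0x40)))
            more = 0;              /* sign bit of this byte already matches */
        else
            byte |= 0x80;          /* more bytes follow */
        out[n++] = byte;
    }
    return n;
}

/* DW_OP_addr: opcode plus a 4-byte little-endian address. */
size_t expr_addr(uint8_t *out, uint32_t address)
{
    int i;
    out[0] = DW_OP_addr;
    for (i = 0; i < 4; i++)
        out[1 + i] = (uint8_t)(address >> (8 * i));
    return 5;
}

/* DW_OP_fbreg: opcode plus a signed offset from the frame base. */
size_t expr_fbreg(uint8_t *out, int32_t offset)
{
    out[0] = DW_OP_fbreg;
    return 1 + put_sleb128(out + 1, offset);
}

/* DW_OP_reg0..DW_OP_reg31: a single byte, no operands. */
size_t expr_reg(uint8_t *out, unsigned reg)
{
    assert(reg < 32);
    out[0] = (uint8_t)(DW_OP_reg0 + reg);
    return 1;
}
```

A global variable at 0x20000010 becomes 03 10 00 00 20, a local at offset -8 from the frame base becomes 91 78, and a variable kept in r4 becomes the single byte 54.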
Finally, add the created address expression to the node by using the dwarf_add_AT_location_expr function:

 Dwarf_P_Attribute dwarf_add_AT_location_expr( Dwarf_P_Debug dbg, Dwarf_P_Die ownerdie, Dwarf_Half attr, Dwarf_P_Expr loc_expr, Dwarf_Error *error)

ownerdie - the node to which the expression is added
attr - attribute (in our case DW_AT_location)
loc_expr - a handle to a previously created address expression

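The function returns DW_DLV_NOCOUNT on error.
A variable (or constant) node must have one of the tags DW_TAG_variable, DW_TAG_formal_parameter or DW_TAG_const_type. It needs the following attributes:

the name of the variable/constant (dwarf_add_AT_name function, described earlier)
the line number in the source file where the variable is declared (DW_AT_decl_line attribute), set with dwarf_add_AT_unsigned_const (described earlier)
the index of the source file (DW_AT_decl_file attribute), set with dwarf_add_AT_unsigned_const (described earlier)
the data type of the variable/constant (DW_AT_type attribute - a reference to a data type node, described earlier)
an address expression (see above) - for a variable or a procedure parameter
a value - for a constant (DW_AT_const_value attribute, described earlier)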

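Once all the nodes are created, the debug information has to be placed into ELF sections. This takes two steps:

call dwarf_transform_to_disk_form, which calls the function we wrote to create the required ELF sections
for each section, dwarf_get_section_bytes returns the data that we need to write into that section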

The function

 Dwarf_Signed dwarf_transform_to_disk_form( Dwarf_P_Debug dbg, Dwarf_Error* error)

translates the debug information we created into binary format but does not write anything to disk. It returns the number of ELF sections created, or DW_DLV_NOCOUNT on error. For each section it calls the callback function that we passed to dwarf_producer_init_c during library initialization. We have to write this function ourselves. Its specification is as follows:

 typedef int (*Dwarf_Callback_Func_c)( char* name, int size, Dwarf_Unsigned type, Dwarf_Unsigned flags, Dwarf_Unsigned link, Dwarf_Unsigned info, Dwarf_Unsigned* sect_name_index, void * user_data, int* error)

name - the name of the elf section to create
size - section size
type - section type
flags - section flags
link - section link field
info - section information field
sect_name_index - used to return the symbol index of the section, which is needed for relocations (optional)
user_data - transmitted to us in the same way we set it in the library initialization function
error - here you can pass the error code
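As a rough model of the decisions such a callback makes (the required steps are listed right after this), here is a sketch that leaves libelf and libdwarf out entirely. Everything in it is an assumption made for illustration: SectionHeader is a stand-in for the Elf32_Shdr fields involved, is_rel_section and fill_section_header are hypothetical helpers, uint64_t stands in for Dwarf_Unsigned, and copying size into sh_size is a guess, since the size can also be left for the ELF library to compute from the section data.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the Elf32_Shdr fields the callback touches. */
typedef struct {
    uint32_t sh_type, sh_flags, sh_addr, sh_offset, sh_size;
    uint32_t sh_link, sh_info, sh_addralign, sh_entsize;
} SectionHeader;

/* Returns 1 for ".rel..." sections, which the callback skips
   by returning 0 without creating anything. */
int is_rel_section(const char *name)
{
    assert(name != NULL);
    return strncmp(name, ".rel", 4) == 0;
}

/* Copies the callback parameters into the header and sets the
   remaining fields: sh_addr, sh_offset, sh_entsize = 0, sh_addralign = 1. */
void fill_section_header(SectionHeader *sh, int size, uint64_t type,
                         uint64_t flags, uint64_t link, uint64_t info)
{
    sh->sh_type  = (uint32_t)type;
    sh->sh_flags = (uint32_t)flags;
    sh->sh_link  = (uint32_t)link;
    sh->sh_info  = (uint32_t)info;
    sh->sh_size  = (uint32_t)size;   /* assumption, see the text above */
    sh->sh_addr = 0;
    sh->sh_offset = 0;
    sh->sh_entsize = 0;
    sh->sh_addralign = 1;
}
```

In a real callback, is_rel_section(name) would be checked first, and the header obtained from elf32_getshdr would be filled in the same way as the stand-in structure here.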

In this function, we must:

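create a new section (elf_newscn function, see Creating sections)
create the section header (elf32_getshdr function, same place)
fill it in correctly (see same place). This is easy, because the section header fields correspond to the parameters of our function. Set the missing fields sh_addr, sh_offset, sh_entsize to 0 and sh_addralign to 1
return the index of the created section (elf_ndxscn function, see the .symtab section) or -1 on error (setting the error code in error)
skip ".rel" sections (in our case), returning 0 for them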

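On completion, dwarf_transform_to_disk_form returns the number of sections created. We need to go through each section in a loop starting from 0, performing the following steps:

call the dwarf_get_section_bytes function: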

 Dwarf_Ptr dwarf_get_section_bytes( Dwarf_P_Debug dbg, Dwarf_Signed dwarf_section, Dwarf_Signed *elf_section_index, Dwarf_Unsigned *length, Dwarf_Error* error)

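dwarf_section - the section number. Must be in the range 0..n-1, where n is the number returned by dwarf_transform_to_disk_form
elf_section_index - returns the index of the ELF section into which the data must be written
length - returns the length of the data
error - receives error information

The function returns a pointer to the data, or 0 (in which case there is nothing to write)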

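create a data descriptor for the current section (elf_newdata function, see Creating sections) and fill it in (see same place), setting:
- d_buf - a pointer to the data we received from the previous function
- d_size - the size of this data (the length value)

When all sections are processed, finish working with libdwarf by calling dwarf_producer_finish: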

 Dwarf_Unsigned dwarf_producer_finish( Dwarf_P_Debug dbg, Dwarf_Error* error)

The function returns DW_DLV_NOCOUNT on error.
Note that nothing is written to disk at this stage. The file has to be written with the functions from the “Create ELF - Write File” section.

Conclusion

That's all.
To repeat, creating debug information is a very broad topic, and I left much of it untouched, only lifting the curtain a little. Those who wish can dig as deep as they like.
If you have questions, I will try to answer them.