# also available via MacPorts - `sudo port install qemu` brew install qemu qemu comes with several ready-to-use machines (see the qemu-system-riscv32 -machine ).~/usys/riscv . Memorize it for future reference. mkdir -p ~/usys/riscv cd ~/Downloads cp openocd-<date>-<platform>.tar.gz ~/usys/riscv cp riscv64-unknown-elf-gcc-<date>-<platform>.tar.gz ~/usys/riscv cd ~/usys/riscv tar -xvf openocd-<date>-<platform>.tar.gz tar -xvf riscv64-unknown-elf-gcc-<date>-<platform>.tar.gz RISCV_OPENOCD_PATH and RISCV_PATH environment variables so that other programs can find our tool chain. It may look different depending on the OS and the shell: I added paths to the ~/.zshenv . # I put these two exports directly in my ~/.zshenv file - you may have to do something else. export RISCV_OPENOCD_PATH="$HOME/usys/riscv/openocd-<date>-<version>" export RISCV_PATH="$HOME/usys/riscv/riscv64-unknown-elf-gcc-<date>-<version>" # Reload .zshenv with our new environment variables. Restarting your shell will have a similar effect. source ~/.zshenv /usr/local/bin to run it at any time without specifying the full path to ~/usys/riscv/riscv64-unknown-elf-gcc-<date>-<version>/bin/riscv64-unknown-elf-gcc . # Symbolically link our gcc executable into /usr/local/bin. Repeat this process for any other executables you want to quickly access. ln -s ~/usys/riscv/riscv64-unknown-elf-gcc-8.2.0-<date>-<version>/bin/riscv64-unknown-elf-gcc /usr/local/bin riscv64-unknown-elf-gcc , riscv64-unknown-elf-gdb , riscv64-unknown-elf-ld and others, are located in ~/usys/riscv/riscv64-unknown-elf-gcc-<date>-<version>/bin/ . cd ~/wherever/you/want/to/clone/this git clone --recursive https://github.com/sifive/freedom-e-sdk.git cd freedom-e-sdk freedom-e-sdk repository. We use a ready-made Makefile , which they provide for compiling this program in debug mode: make PROGRAM=hello TARGET=sifive-hifive1 CONFIGURATION=debug software qemu-system-riscv32 -nographic -machine sifive_e -kernel software/hello/debug/hello.elf Hello, World! freedom-e-sdk . After that, we will write and try to debug our own C program. cat add.c int main() { int a = 4; int b = 12; while (1) { int c = a + b; } return 0; } # -O0 to disable all optimizations. Without this, GCC might optimize # away our infinite addition since the result 'c' is never used. # -g to tell GCC to preserve debug info in our executable. riscv64-unknown-elf-gcc add.c -O0 -g a.out file is created, the default gcc name for executables. Now run this file in qemu : # -machine tells QEMU which among our list of available machines we want to # run our executable against. Run qemu-system-riscv64 -machine help to list # all available machines. # -m is the amount of memory to allocate to our virtual machine. # -gdb tcp::1234 tells QEMU to also start a GDB server on localhost:1234 where # TCP is the means of communication. # -kernel tells QEMU what we're looking to run, even if our executable isn't # exactly a "kernel". qemu-system-riscv64 -machine virt -m 128M -gdb tcp::1234 -kernel a.out virt machine that riscv-qemu comes riscv-qemu .localhost:1234 , connect to it with a RISC-V GDB client from a separate terminal: # --tui gives us a (t)extual (ui) for our GDB session. # While we can start GDB without any arguments, specifying 'a.out' tells GDB # to load debug symbols from that file for the newly created session. riscv64-unknown-elf-gdb --tui a.out This GDB was configured as "--host = x86_64-apple-darwin17.7.0 --target = riscv64-unknown-elf". │
Type "show configuration" for configuration details. │
For bug reporting instructions, please see: │
<http://www.gnu.org/software/gdb/bugs/>. │
Find the GDB manual and other documentation online at: │
<http://www.gnu.org/software/gdb/documentation/>. │
│
For help, type "help". │
Type "apropos word" ... │
Reading symbols from a.out ... │
(gdb) run or start commands for the a.out executable file in GDB, but at the moment it will not work for an obvious reason. We compiled the program as riscv64-unknown-elf-gcc , so the host should work on the riscv64 architecture.riscv64-unknown-elf-gdb and instead of launching it on the host, point it to some remote target (GDB server). As you remember, we just ran riscv-qemu and told to start the GDB server on localhost:1234 . Just connect to this server:(gdb) target remote: 1234 Remote debugging using: 1234
(gdb) b main Breakpoint 1 at 0x1018e: file add.c, line 2. (gdb) b 5 # this is the line within the forever-while loop. int c = a + b; Breakpoint 2 at 0x1019a: file add.c, line 5. continue (abbreviated c command) until we reach a breakpoint: (gdb) c Continuing. b 5 ? What happened?
L?? ) and displays the 0x0 counter ( PC: 0x0 ).0x0000000000000000 in ?? () 0x0000000000000000 in ?? ()main function performs simple addition, but what is it really? Why should it be called main , not origin or begin ? According to the convention, all executable files start to run from the main function, but what magic does this behavior provide?-v flag to get a more detailed output of what is actually happening. riscv64-unknown-elf-gcc add.c -O0 -g -v -c flag). Why is it important? Well, take a look at a fragment from the gcc detailed output:# The actual `gcc -v` command outputs # long, so pretend these variables exist. # $ RV_GCC_BIN_PATH = / Users / twilcock / usys / riscv / riscv64-unknown-elf-gcc- <date> - <version> / bin / # $ RV_GCC_LIB_PATH = $ RV_GCC_BIN_PATH /../lib / gcc / riscv64-unknown-elf / 8.2.0 $ RV_GCC_BIN_PATH /../ libexec / gcc / riscv64-unknown-elf / 8.2.0 / collect2 \ ... truncated ... $ RV_GCC_LIB_PATH /../../../../ riscv64-unknown-elf / lib / rv64imafdc / lp64d / crt0.o \ $ RV_GCC_LIB_PATH / riscv64-unknown-elf / 8.2.0 / rv64imafdc / lp64d / crtbegin.o \ -lgcc - start-group -lc -lgloss - end-group-lgcc \ $ RV_GCC_LIB_PATH / rv64imafdc / lp64d / crtend.o ... truncated ... COLLECT_GCC_OPTIONS = '- O0' '-g' '-v' '-march = rv64imafdc' '-mabi = lp64d'
gcc runs the collect2 program, collect2 in the arguments crt0.o , crtbegin.o and crtend.o , the flags -lgcc and --start-group . Description of collect2 can be read here : briefly, collect2 organizes various initialization functions at startup, making the layout in one or more passes.crt files with our code. As you can guess, crt means 'C runtime'. It describes in detail what each crt , but we are interested in crt0 , which does one important thing:"It is expected that this object [crt0] contains the symbol _start , which indicates the initial load of the program."main . Yes, finally we have found the answer to the question: it is _start calls our main function!gdb ? It remains to solve several problems: the first one is related to the way crt0 sets up our stack.gcc crt0 by default. The default parameters are selected based on several factors:machine-vendor-operatingsystem . We have this riscv64-unknown-elfrv64imafdclp64dcrt0 is to configure the stack. But he does not know where exactly should the stack be for our CPU ( -machine )? He can not cope without our help.qemu-system-riscv64 -machine virt -m 128M -gdb tcp::1234 -kernel a.out we used the virt machine. Fortunately, qemu makes it easy to reset machine information to a dtb dump (device tree blob). # Go to the ~/usys/riscv folder we created before and create a new dir # for our machine information. cd ~/usys/riscv && mkdir machines cd machines # Use qemu to dump info about the 'virt' machine in dtb (device tree blob) # format. # The data in this file represents hardware components of a given # machine / device / board. qemu-system-riscv64 -machine virt -machine dumpdtb=riscv64-virt.dtb dtc command line utility (device tree compiler) that can convert the file to something more readable. # I'm running MacOS, so I use Homebrew to install this. If you're # running another OS you may need to do something else. brew install dtc # Convert our .dtb into a human-readable .dts (device tree source) file. dtc -I dtb -O dts -o riscv64-virt.dts riscv64-virt.dtb riscv64-virt.dts , where we see a lot of interesting information about virt : the number of available processor cores, the memory location of various peripheral devices, such as the UART, the location of the built-in memory (RAM). The stack should be in this memory, so look for it using grep : grep memory riscv64-virt.dts -A 3 memory@80000000 { device_type = "memory"; reg = <0x00 0x80000000 0x00 0x8000000>; }; device_type . Apparently, we found what we were looking for. According to the values ​​inside reg = <...> ; You can determine where the memory bank starts and what is its length.reg is an arbitrary number of pairs (base_address, length) . However, reg four values. Strange, is it not enough for one memory bank two values?reg property), we learn that the number of <u32> cells to specify the address and length is determined by the #address-cells and #size-cells properties in the parent node (or in the node itself). These values ​​are not indicated in our memory node, and the parent memory node is simply the root of the file. Let's look for these values ​​in it: head -n8 riscv64-virt.dts /dts-v1/; / { #address-cells = <0x02>; #size-cells = <0x02>; compatible = "riscv-virtio"; model = "riscv-virtio,qemu"; reg = <0x00 0x80000000 0x00 0x8000000>; our memory starts 0x00 + 0x80000000 (0x80000000) and occupies 0x00 + 0x8000000 (0x8000000) bytes, that is, ends at 0x88000000 , which corresponds to 128 megabytes.qemu and dtc we found the RAM addresses in the virt virtual machine. We also know that gcc crt0 by default without setting up the stack as we need. But how to use this information to eventually run and debug the program?crt0 does not suit us, there is one obvious option: to write your own code, and then link it with an object file, which turned out after compiling our simple program. Our crt0 needs to know where the top of the stack begins in order to properly initialize it. We could hard-code the value 0x80000000 directly to crt0 , but this is not a very suitable solution considering the changes that may be needed in the future. What if we want to use another CPU in the emulator, such as sifive_e , with different characteristics?ld linker allows you to define a symbol that is accessible from our crt0 . We can define a __stack_top symbol suitable for different processors.ld and slightly modify it to support additional characters. What is a linker script? Here is a good description :The main purpose of the linker script is to describe how the sections of the files in the input and output are mapped, and to manage the layout of the memory of the output file.
riscv64-unknown-elf-ld linker script to a new file: cd ~/usys/riscv # Make a new dir for custom linker scripts out RISC-V CPUs may require. mkdir ld && cd ld # Copy the default linker script into riscv64-virt.ld riscv64-unknown-elf-ld --verbose > riscv64-virt.ld --Verbose key includes information about the ld version, supported architectures, and more. This is all good to know, but in the linker script this syntax is not allowed, so open the text editor and remove everything unnecessary from the file. vim riscv64-virt.ld
# Remove everything above and including the ============= line
GNU ld (GNU Binutils) 2.32
Supported emulations:
elf64lriscv
elf32lriscv
using internal linker script:
=================================================
/ * Script for -z combreloc: combine and sort reloc sections * /
/ * Copyright (C) 2014-2019 Free Software Foundation, Inc.
Without modification,
are not allowed
notice this notice are preserved. * /
OUTPUT_FORMAT ("elf64-littleriscv", "elf64-littleriscv",
"elf64-littleriscv")
... rest of the linker script ... __stack_top will be. Find the line that starts with OUTPUT_ARCH(riscv) , it should be at the top of the file, and add the MEMORY command below it: OUTPUT_ARCH(riscv) /* >>> Our addition. <<< */ MEMORY { /* qemu-system-risc64 virt machine */ RAM (rwx) : ORIGIN = 0x80000000, LENGTH = 128M } /* >>> End of our addition. <<< */ ENTRY(_start) RAM , for which read ( r ), write ( w ), and storage of executable code ( x ) are valid.virt RISC-V machine. Now you can use it. We want to put our stack in memory.__stack_top . Open your linker script ( riscv64-virt.ld ) in a text editor and add a few lines: SECTIONS { /* Read-only sections, merged into text segment: */ PROVIDE (__executable_start = SEGMENT_START("text-segment", 0x10000)); . = SEGMENT_START("text-segment", 0x10000) + SIZEOF_HEADERS; /* >>> Our addition. <<< */ PROVIDE(__stack_top = ORIGIN(RAM) + LENGTH(RAM)); /* >>> End of our addition. <<< */ .interp : { *(.interp) } .note.gnu.build-id : { *(.note.gnu.build-id) } __stack_top using the PROVIDE command . The symbol will be accessible from any program associated with this script (assuming that the program itself does not detect something with the name __stack_top ). Set the __stack_top value to ORIGIN(RAM) . We know that this value is 0x80000000 plus LENGTH(RAM) , which is 128 megabytes ( 0x8000000 bytes). This means that our __stack_top set to 0x88000000 .crt0.s file: .section .init, "ax" .global _start _start: .cfi_startproc .cfi_undefined ra .option push .option norelax la gp, __global_pointer$ .option pop la sp, __stack_top add s0, sp, zero jal zero, main .cfi_endproc .end as assembler. Lines with a dot are called assembler directives : they provide information for the assembler. This is not executable code, like RISC-V assembler instructions such as jal and add . .section .init, "ax" .init section, which is selectable ( a ) and executable ( x ). This section is another common convention for running code within the operating system. We work on pure hardware without an OS, so in our case such instruction may not be completely necessary, but in any case it is a good practice. .global _start _start: .global makes the next character available to ld . Without this, the layout will not work, because the ENTRY(_start) in the linker script points to the _start symbol as an entry point to the executable file. The next line tells the assembler that we are starting the definition of the _start character. _start: .cfi_startproc .cfi_undefined ra ...other stuff... .cfi_endproc .cfi directives inform you about the frame structure and how to handle it. The .cfi_startproc and .cfi_endproc signal the beginning and end of a function, and .cfi_undefined ra informs the assembler that the ra register should not be restored to any value contained in it before starting _start . .option push .option norelax la gp, __global_pointer$ .option pop .option directives change the behavior of the assembler in accordance with the code when you need to apply a certain set of options. Here is a detailed description of why the use of .option in this segment is important:... since we, when possible, weaken (relax) the addressing of sequences to shorter sequences relative to the GP, the initial loading of the GP should not be weakened and should be given something like this:.option push .option norelax la gp, __global_pointer$ .option pop
so that after relaxation we get the following code:auipc gp, %pcrel_hi(__global_pointer$) addi gp, gp, %pcrel_lo(__global_pointer$)
instead of simple:addi gp, gp, 0
crt0.s : _start: ...other stuff... la sp, __stack_top add s0, sp, zero jal zero, main .cfi_endproc .end __stack_top , on the creation of which we have worked so much. The la pseudoinstruction (load address) loads the __stack_top value into the sp register (stack pointer), setting it for use in the rest of the program.add s0, sp, zero adds the values ​​of the sp and zero registers (which is actually a x0 register with a hard reference to 0) and places the result in the s0 register. This is a special register that is unusual in several respects. First, it is a “save register”, that is, it is saved when function calls. Secondly, s0 sometimes acts as a frame pointer, which gives each function call a small stack space for storing parameters passed to this function. How function calls work with the stack and frame pointers is a very interesting topic that you can easily devote to a separate article, but for now just know that in our runtime environment it is important to initialize the s0 frame pointer.jal zero, main . Here jal means jump and link. The instruction expects operands in the form jal rd (destination register), offset_address . Functionally, jal writes the value of the next instruction (the register pc plus four) to rd , and then sets the register pc to the current value pc plus the address of the offset with the extension of the character , effectively “calling” this address.x0 strictly bound to the literal value 0, and writing to it is useless.Therefore, it may seem strange that we use a register as the destination register zero, which RISC-V assemblers interpret as a register x0. After all, this means an unconditional transition to offset_address. Why do so, because in other architectures there is an explicit unconditional branch instruction?jal zero, offset_addressis actually a smart optimization. Support for each new instruction means an increase and, therefore, an increase in the cost of the processor. Therefore, the simpler the ISA, the better. Instead of polluting the instruction space with two instructions jaland unconditional jump, the RISC-V architecture only supports jaland unconditional transitions are supported through jal zero, main.j offset_addressRISC-V assemblers translate the pseudoinstruction of unconditional jump to jal zero, offset_address. For a complete list of officially supported pseudoinstructions, see the RISC-V specification (version 2.2) . _start: ...other stuff... jal zero, main .cfi_endproc .end .endthat simply marks the end of the file.qemuand dtcfound our memory in the virtual machine virtRISC-V. Then we used this information to manually control the memory allocation in our version of the default linker script riscv64-unknown-elf-ld, which allowed us to determine the exact character __stack_top. Then we used this symbol in our own version crt0.s, which sets up our stack and global pointers, and finally called the function main. Now you can reach your goal and start debugging our simple program in GDB. cat add.c int main() { int a = 4; int b = 12; while (1) { int c = a + b; } return 0; } riscv64-unknown-elf-gcc -g -ffreestanding -O0 -Wl,--gc-sections -nostartfiles -nostdlib -nodefaultlibs -Wl,-T,riscv64-virt.ld crt0.s add.c -ffreestanding informs the compiler that the standard library may not exist , so no need to make assumptions about its mandatory presence. This parameter is not required when running the application on your host (in the operating system), but in this case it is not so, therefore it is important to inform the compiler this information.-Wl- A comma separated list of flags to pass to the linker ( ld). Here it --gc-sectionsmeans “garbage collection sections”, and ldis instructed to remove unused sections after layout. Flags -nostartfiles, -nostdliband -nodefaultlibstell the linker not to handle standard system startup files (for example, defaultcrt0), standard system stdlib implementations and standard system default link libraries. We have our own script crt0and linker, so it is important to pass these flags so that the default values ​​do not conflict with our custom settings.-Tindicates the path to our linker script, which in our case is simple riscv64-virt.ld. Finally, we specify the files that we want to compile, build, and link: crt0.sand add.c. As before, the result is a complete and ready-to-run file called a.out.qemu: # -S freezes execution of our executable (-kernel) until we explicitly tell # it to start with a 'continue' or 'c' from our gdb client qemu-system-riscv64 -machine virt -m 128M -gdb tcp::1234 -S -kernel a.out gdb, do not forget to load the debug symbols for a.out, specifying it with the last argument: riscv64-unknown-elf-gdb --tui a.out GNU gdb (GDB) 8.2.90.20190228-git Copyright (C) 2019 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "--host=x86_64-apple-darwin17.7.0 --target=riscv64-unknown-elf". Type "show configuration" for configuration details. For bug reporting instructions, please see: <http://www.gnu.org/software/gdb/bugs/>. Find the GDB manual and other documentation resources online at: <http://www.gnu.org/software/gdb/documentation/>. For help, type "help". Type "apropos word" to search for commands related to "word"... Reading symbols from a.out... (gdb) gdbto the server gdb, which we launched as part of the command qemu: (gdb) target remote :1234 │ Remote debugging using :1234 (gdb) b main Breakpoint 1 at 0x8000001e: file add.c, line 2. (gdb) c Continuing. Breakpoint 1, main () at add.c:2 L, the value PC:is equal L2, and PC:- 0x8000001e. If you did everything as in the article, the output will be something like this:
gdbas usual: -sto go to the next instruction, info all-registersto check the values ​​inside the registers as the program runs, etc. Experiment at your leisure ... of course we A lot of work for this!jal, so maybe in the next article we take the knowledge gained here, but replace it with add.csome program in pure RISC-V assembler. If you have something specific that you would like to see or any questions, open tickets .Source: https://habr.com/ru/post/454208/
All Articles