The study of the processor and its functional simulation

Probably every ASM / C / C++ programmer has at some point thought about writing their own operating system.

And probably every Verilog / VHDL developer working with FPGAs has thought about creating their own processor.
In fact, implementing a more or less traditional processor does not seem to be a very big problem today. The principles of processor operation are described in many books and articles. In addition, there are many processors with an open architecture, such as OpenRISC, OpenSPARC and many others. They are worth studying before reinventing the wheel.

I decided to study the ARM-compatible AMBER processor. Its source is available at http://opencores.org.

First of all, it should be noted that the AMBER source code from opencores does not describe the newest processor. It is just an ARM v2a. This is because the newer processors and their instruction sets are patented, but v2a is not. Nevertheless, I chose this processor for my experiments because:

1) ARM is still one of the most popular processor families in phones, tablets and all sorts of embedded systems, so studying the ARM instruction set is not wasted effort.
2) The processor's source code is written in Verilog, a language I am familiar with.
3) The developer of the AMBER system-on-chip provides patches for building the Linux kernel, and the instructions seem quite detailed.

It seems to me that creating a “new” processor is possible in itself, but has no value without a compiler and an OS. Every developer who wants to start creating their own processor should think about this. The AMBER system-on-chip gives food for thought: there is an ARM v2a processor, a GCC compiler, and Linux.

Although the source of the system is open, a great deal of work was required. The original AMBER project ran on a development board with a Xilinx FPGA. I was going to port the project to a completely different board with an Altera FPGA.

I am porting the AMBER project to the Mars Rover 2 (Marsohod2) board.

This board contains a Cyclone III EP3C10E144C8 FPGA (10 thousand logic elements), an 8 MB SDRAM memory chip, an integrated programmer based on the FTDI FT2232H, an ADC chip, buttons, LEDs and a VGA connector. It is a budget board.

The FTDI chip provides the USB interface. The FT2232H has two channels that can be programmed for different modes of operation. On this board, channel A of the FTDI chip is used for programming the FPGA via JTAG, and channel B acts as a high-speed serial port (921600 bps). The two channels operate independently of each other. Thus, the FPGA can receive or transmit data via the serial port while, at the same time, the internal FPGA signals are studied with the Altera SignalTap tool via JTAG. It is quite convenient that everything goes through a single USB cable: power to the board, data transfer, and downloading / debugging the project.

Unfortunately, porting the AMBER project to the Mars Rover 2 board did not go very smoothly and required quite a lot of effort. Some problems were predictable and understandable, such as the need to replace the Xilinx DLL components and the memory blocks for the cache and the bootram with similar Altera components. Another clear task was to use an SDRAM module instead of the DDR controller components. I used an open source, Wishbone bus compatible SDRAM controller, also from opencores. It was also necessary to make new assignments of input / output signals to the FPGA pins according to the schematic of the Mars Rover 2 board.

Other problems that had to be solved turned out to be far from trivial.

For example, it turned out that the Altera Quartus II compiler interprets some Verilog constructs somewhat differently than the Xilinx compiler does. Moreover, for the functional simulation of the project I used Icarus Verilog, and it seems to interpret Verilog the same way as Xilinx, not the way Altera does. This is somewhat frustrating, but in the end I managed to identify and fix the problem areas in the Verilog source code of the AMBER processor using Icarus simulation and in-circuit debugging with Altera SignalTap.

The next difficulty was with the Linux kernel, which I compiled to run on the board. Here I did not expect any problems at all, because the author of the AMBER system-on-chip, Conor Santifort, described in his instructions how to build the kernel and how to run it. I do not know why, but his instructions did not work for me. I also ran into problems applying the patches and into kernel compilation errors, which had to be resolved along the way. I also had to change the kernel configuration files and the kernel itself.

However, the result is already there. Here is how it works in hardware:

Using the TeraTerm terminal program, I open the COM port that belongs to the board, and from another program, the Altera Quartus II Programmer, I load the hardware image of the system into the FPGA. The processor starts from the bootram inside the FPGA and an interactive “monitor” appears in the terminal console. With it, you can read or write memory locations, as well as download any file from the computer into the board's RAM through the same serial port using the XMODEM protocol.
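To give an idea of what the monitor receives over the serial port, here is a minimal Python sketch of how a file is cut into XMODEM packets. It assumes the classic 128-byte variant with a one-byte arithmetic checksum; whether the AMBER boot monitor uses the checksum or the CRC variant is not stated here, and the name xmodem_packets is made up for this illustration.

```python
SOH = 0x01         # start-of-header byte that opens every 128-byte XMODEM packet
PAD = 0x1A         # filler byte conventionally used to pad the last block
BLOCK_SIZE = 128   # payload size of classic XMODEM


def xmodem_packets(data):
    """Split data into classic XMODEM packets (checksum variant, assumed here)."""
    packets = []
    for offset in range(0, len(data), BLOCK_SIZE):
        # Pad the final chunk up to a full block.
        chunk = data[offset:offset + BLOCK_SIZE].ljust(BLOCK_SIZE, bytes([PAD]))
        # Block numbers start at 1 and wrap around modulo 256.
        block_no = (offset // BLOCK_SIZE + 1) & 0xFF
        # Simple arithmetic checksum: sum of the payload bytes modulo 256.
        checksum = sum(chunk) & 0xFF
        header = bytes([SOH, block_no, 0xFF - block_no])
        packets.append(header + chunk + bytes([checksum]))
    return packets


if __name__ == "__main__":
    demo = xmodem_packets(b"A" * 130)
    print(len(demo), "packets of", len(demo[0]), "bytes each")
```

A real transfer also needs the ACK / NAK handshake after every packet and a final EOT byte, which this sketch leaves out.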

So, I load the initrd at 0x700000 and the Linux kernel itself at 0x80000. The command “j 80000” in the monitor console starts the kernel, and you can watch it boot.

The full console output looks like this:

Linux version 2.4.27-vrs1 (nick@ubuntu) (gcc version 4.5.2 (Sourcery G++ Lite 2011.03-46) ) #1 Tue Jan 22 23:48:37 PST 2013
CPU: Amber 2 revision 0
Machine: Amber-FPGA-System
On node 0 totalpages: 256
zone(0): 256 pages.
zone(1): 0 pages.
zone(2): 0 pages.
Kernel command line: console=ttyAM0 mem=8M root=/dev/ram
19.91 BogoMIPS (preset value used)
Memory: 8MB = 8MB total
Memory: 6304KB available (783K code, 222K data, 64K init)
Dentry cache hash table entries: 4096 (order: 0, 32768 bytes)
Inode cache hash table entries: 4096 (order: 0, 32768 bytes)
Mount cache hash table entries: 4096 (order: 0, 32768 bytes)
Buffer cache hash table entries: 8192 (order: 0, 32768 bytes)
Page-cache hash table entries: 8192 (order: 0, 32768 bytes)
POSIX conformance testing by UNIFIX
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
Starting kswapd
ttyAM0 at MMIO 0x16000000 (irq = 1) is a WSBN
pty: 256 Unix98 ptys configured
Serial driver version 5.05c (2001-07-08) with no serial options enabled
ttyS00 at 0x03f8 (irq = 10) is a 16450
ttyS01 at 0x02f8 (irq = 10) is a 16450
RAMDISK driver initialized: 16 RAM disks of 208K size 1024 blocksize
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP
IP: routing cache hash table of 4096 buckets, 32Kbytes
TCP: Hash tables configured (established 4096 bind 8192)
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
RAMDISK: ext2 filesystem found at block 0
RAMDISK: Loading 200 blocks [1 disk] into ram disk... done.
Freeing initrd memory: 208K
VFS: Mounted root (ext2 filesystem) readonly.
Freeing init memory: 64K
BINFMT_FLAT: Loading file: /sbin/init
Mapping is 2b0000, Entry point is 8068, data_start is 8e4c
Load /sbin/init: TEXT=2b0040-2b8e4c DATA=2b8e50-2b8e83 BSS=2b8e83-2b8e88
start_thread(regs=0x21f9fa4, entry=0x2b8068, start_stack=0x2affb4)
Hello, World!
Hello, Marsohod!
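The load addresses used above and the sizes printed in the log can be cross-checked with a little arithmetic. The Python sketch below is only an illustration: it assumes that RAM starts at address 0 and that the kernel footprint is roughly the sum of the code, data and init sizes from the log. The helper names are hypothetical.

```python
KIB = 1024

RAM_SIZE = 8 * KIB * KIB      # "mem=8M" from the kernel command line
TOTAL_PAGES = 256             # "totalpages: 256" from the log

KERNEL_BASE = 0x80000         # address given to the monitor command "j 80000"
INITRD_BASE = 0x700000        # address where the initrd is loaded

# Sizes taken from the log; the kernel figure is only an approximation
# (code + data + init), which is an assumption of this sketch.
KERNEL_SIZE = (783 + 222 + 64) * KIB
INITRD_SIZE = 208 * KIB


def region(base, size):
    """Return a half-open address range (start, end)."""
    return (base, base + size)


def overlaps(a, b):
    """True if two half-open ranges share at least one address."""
    return a[0] < b[1] and b[0] < a[1]


def fits_in_ram(r, ram_size=RAM_SIZE):
    """True if the range lies inside RAM, assuming RAM starts at address 0."""
    return 0 <= r[0] and r[1] <= ram_size


kernel = region(KERNEL_BASE, KERNEL_SIZE)
initrd = region(INITRD_BASE, INITRD_SIZE)
PAGE_SIZE = RAM_SIZE // TOTAL_PAGES   # page size implied by the log

if __name__ == "__main__":
    print("kernel :", hex(kernel[0]), "-", hex(kernel[1]))
    print("initrd :", hex(initrd[0]), "-", hex(initrd[1]))
    print("overlap:", overlaps(kernel, initrd))
    print("page size in KB:", PAGE_SIZE // KIB)
```

Under these assumptions the two images are far apart, and the initrd ends below the 8 MB limit.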

Perhaps the most interesting part begins when you run a functional simulation of the processor. It makes it possible to "dig deeper" and find out things that are out of reach when you write or debug a program for a microcontroller or a Raspberry Pi. Simulating the processor lets you examine in detail any signal inside the processor or the whole system.

The simulation is performed using a testbench. A testbench is also a Verilog program, only it describes the entire system.

For example, there is the AMBER system ported to the Mars Rover2 board. This is the project that I compile with Altera Quartus II, and the resulting image is then loaded into the FPGA. The top-level module of the AMBER SoC has a 100 MHz clock input, the serial port RX input and TX output, and the signals to the SDRAM memory chip.

If you write a testbench for this system, the testbench itself no longer has any inputs or outputs: it becomes the top-level module, a simulator of the printed circuit board with its components. The testbench should simulate the crystal oscillator on the board and the external serial port transceiver (possibly sending commands through the port), and also simulate the SDRAM chip. Models of memory chips can be obtained from their manufacturers. Micron, at least, makes Verilog models of its memory chips freely available, which is where I got the model of the MT48LC4M16A2 chip.
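As a rough idea of the serial stimulus such a testbench has to generate, here is a small Python model (not Verilog testbench code) of the bit timing at 921600 bps. It assumes 8N1 framing, which is not stated in this article, and the function names are invented for the example. The clocks-per-bit figure is simply the ratio to the 100 MHz board clock; the divider actually used inside the SoC depends on its internal system clock.

```python
BAUD = 921_600            # serial speed of FTDI channel B on the board
CLOCK_HZ = 100_000_000    # frequency of the clock input of the top-level module


def uart_frame_bits(byte):
    """Line levels for one 8N1 frame: start bit, 8 data bits LSB first, stop bit."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte out of range")
    data_bits = [(byte >> i) & 1 for i in range(8)]
    return [0] + data_bits + [1]


def uart_waveform(payload, baud=BAUD):
    """Return (time_ns, level) pairs, one per bit, for back-to-back frames."""
    bit_ns = 1e9 / baud
    events = []
    t = 0.0
    for byte in payload:
        for level in uart_frame_bits(byte):
            events.append((round(t), level))
            t += bit_ns
    return events


bit_ns = 1e9 / BAUD                  # duration of one bit in nanoseconds
clocks_per_bit = CLOCK_HZ / BAUD     # 100 MHz clock cycles per serial bit

if __name__ == "__main__":
    print("bit time, ns:", bit_ns)
    print("clocks per bit:", clocks_per_bit)
    print("frame for 'j':", uart_frame_bits(ord("j")))
```

The ratio is not an integer, so a receiver clocked at 100 MHz would have to tolerate a small rounding error in its bit period.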

Running a testbench in a Verilog simulator makes it possible to trace every signal of the system. It may not be easy and can take quite a long time. But it is a very powerful tool.

For example, in the simulator you can see the execution cycles of processor instructions, the contents of the registers and the service signals.

You can also examine the processor pipeline, see the cache and external memory access cycles, and actually measure and see how the processor responds to interrupts.

In general, you can fully simulate the boot and launch of the Linux kernel.
I did this from the command line using Icarus Verilog.

True, in this screenshot I interrupted the simulation at the very beginning: a complete simulation takes about an hour or even more, and that is without a full dump of all the signals to a VCD file. But yes, everything is possible.
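A VCD dump is a plain text file that records every value change of every dumped signal, which is why a full dump of a long simulation grows so large. The following Python sketch reads a tiny hand-written sample. It is only an illustration that handles a small subset of the format (one declaration or value change per line), and the sample signals clk and data are invented.

```python
SAMPLE_VCD = """\
$timescale 1ns $end
$scope module tb $end
$var wire 1 ! clk $end
$var wire 8 % data [7:0] $end
$upscope $end
$enddefinitions $end
#0
0!
b00000000 %
#5
1!
#10
0!
b01101010 %
"""


def parse_vcd(text):
    """Return a list of (time, signal_name, value) for a simple VCD subset."""
    names = {}       # identifier code -> signal name
    changes = []
    time = 0
    in_header = True
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if in_header:
            if tokens[0] == "$var":
                # $var <type> <width> <identifier> <name> [range] $end
                names[tokens[3]] = tokens[4]
            elif tokens[0] == "$enddefinitions":
                in_header = False
            continue
        first = tokens[0]
        if first.startswith("#"):
            time = int(first[1:])                      # timestamp line
        elif first[0] in "bB":
            changes.append((time, names[tokens[1]], first[1:]))   # vector change
        elif first[0] in "01xXzZ":
            changes.append((time, names[first[1:]], first[0]))    # scalar change
    return changes


if __name__ == "__main__":
    for change in parse_vcd(SAMPLE_VCD):
        print(change)
```

Even this toy sample needs one line per change, so dumping every signal of a system that runs for many millions of clock cycles quickly adds up.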

I also dream of making my own processor. I think this experience will be useful to me ...

Useful links:
1) The original AMBER system-on-chip project at http://opencores.org
2) AMBER project sources for Altera Cyclone III FPGAs on GitHub (my port). The sources contain all the system-on-chip code, as well as the testbenches and the patches for the Linux kernel: everything required to fully reproduce the result.
3) The whole history of the port is described in detail here.
4) Description and schematic of the Mars Rover2 board.

Source: https://habr.com/ru/post/167987/
