Writing a simple processor and an environment for it

Hello! In this article I will walk through the steps needed to create a simple processor and an environment for it.

Instruction Set Architecture (ISA)

First you need to decide what the processor will look like. The important parameters include:

The size of the machine word and registers (the bit width of the processor)
The machine instructions and their size

By instruction size, processor architectures can be divided into two types (in fact there are more, but the other options are less popular): RISC and CISC.

The main difference is that all instructions of a RISC processor have the same size. They are simple and execute relatively quickly, whereas a CISC processor can have instructions of different sizes, some of which take quite a long time to execute.

I decided to make a RISC processor, much like MIPS.

I did this for a couple of reasons:

It is fairly easy to create a prototype of such a processor.
All the complexity of this type of processor is shifted to programs such as the assembler and/or the compiler.

Here are the main characteristics of my processor:

Machine word and register size: 32 bits
64 registers (including the program counter)
2 types of instructions

The Register type looks like this:

[Figure: Register type instruction format]

What distinguishes these instructions is that they operate on three registers.

The Immediate type looks like this:

[Figure: Immediate type instruction format]

Instructions of this type operate on two registers and a number.

OP is the number of the instruction to be executed (or a marker that this is a Register type instruction).

R0, R1, R2 are the numbers of the registers that serve as operands of the instruction.

Func is an additional field that selects the specific Register type instruction.

Imm is the field holding the value that we want to pass to the instruction directly as an operand.

There are only 28 instructions in total.

The complete list of instructions can be found in the GitHub repository.

Here are just a couple of them:

nor r0, r1, r2

NOR is a Register type instruction that performs a bitwise NOR (NOT OR) on registers r1 and r2 and writes the result to register r0.

To use this instruction, set the OP field to 0000 and the Func field to 0000000111 (both in binary).

 lw r0, n(r1)

LW is an Immediate type instruction that loads the value stored in memory at address r1 + n into register r0.

To use this instruction, set the OP field to 0111 and write the number n into the Imm field.
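To make the two formats concrete, here is a minimal C sketch of how the fields could be packed into a 32-bit word. The field widths follow from the numbers above (a 4-bit OP, 6-bit register numbers for 64 registers, a 10-bit Func, which leaves 16 bits for Imm). The order of the fields inside the word and the function names encode_rtype and encode_itype are assumptions made for illustration; the real layout is the one in the repository.

```c
#include <stdint.h>

/* Assumed layout, most significant bits first:
 *   Register type:  OP(4) | R0(6) | R1(6) | R2(6) | Func(10)
 *   Immediate type: OP(4) | R0(6) | R1(6) | Imm(16)
 * Only the widths are derived from the article; the field order is a guess. */

static uint32_t encode_rtype(uint32_t op, uint32_t r0, uint32_t r1,
                             uint32_t r2, uint32_t func)
{
    return ((op & 0xFu) << 28) | ((r0 & 0x3Fu) << 22) | ((r1 & 0x3Fu) << 16) |
           ((r2 & 0x3Fu) << 10) | (func & 0x3FFu);
}

static uint32_t encode_itype(uint32_t op, uint32_t r0, uint32_t r1, int32_t imm)
{
    /* Keep only the low 16 bits of imm, so negative values are stored
     * in two's complement form. */
    return ((op & 0xFu) << 28) | ((r0 & 0x3Fu) << 22) | ((r1 & 0x3Fu) << 16) |
           ((uint32_t)imm & 0xFFFFu);
}
```

Under this assumed layout, nor r1, r2, r3 would be encode_rtype(0x0, 1, 2, 3, 0x7) and lw r1, 8(r2) would be encode_itype(0x7, 1, 2, 8).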

Writing processor code

Once the ISA is defined, you can start writing the processor.

For this we need to know a hardware description language. Here are some of them:

Verilog
VHDL (not to be confused with the previous one!)

I chose Verilog because programming in it was part of my university course.

To write a processor, you need to understand the logic of its operation:

Fetch the instruction at the address held in the program counter (PC)
Decode the instruction
Execute the instruction
Add the size of the executed instruction to the program counter

And so on, forever.
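The cycle above can be sketched as a small software model in C. This is not the author's Verilog, only an illustration of the four steps. It assumes a field order of OP, R0, R1, R2, Func (or Imm) from the most significant bit down, keeps the PC outside the register array for simplicity, and adds a hypothetical halt opcode 1111 and a step limit so that the loop can end; none of these come from the article.

```c
#include <stdint.h>
#include <stddef.h>

enum { OP_RTYPE = 0x0, OP_LW = 0x7, OP_HALT = 0xF }; /* OP_HALT is hypothetical */
enum { FUNC_NOR = 0x7 };
enum { MEM_WORDS = 64 };

typedef struct {
    uint32_t pc;              /* byte address of the next instruction */
    uint32_t regs[64];
    uint32_t mem[MEM_WORDS];  /* tiny memory, indexed by word */
} Cpu;

/* Runs until the halt opcode or max_steps; returns the number of
 * instructions executed. Unknown opcodes are treated as no-ops. */
static size_t cpu_run(Cpu *cpu, size_t max_steps)
{
    size_t executed = 0;
    while (executed < max_steps) {
        /* 1. Fetch the instruction at the PC address. */
        uint32_t insn = cpu->mem[(cpu->pc / 4) % MEM_WORDS];

        /* 2. Decode it into fields (assumed layout). */
        uint32_t op   = insn >> 28;
        uint32_t r0   = (insn >> 22) & 0x3F;
        uint32_t r1   = (insn >> 16) & 0x3F;
        uint32_t r2   = (insn >> 10) & 0x3F;
        uint32_t func = insn & 0x3FF;
        /* Sign-extend the 16-bit immediate. */
        int32_t imm = (int32_t)((insn & 0xFFFF) ^ 0x8000) - 0x8000;

        /* 3. Execute. */
        if (op == OP_HALT)
            break;
        if (op == OP_RTYPE && func == FUNC_NOR)
            cpu->regs[r0] = ~(cpu->regs[r1] | cpu->regs[r2]);
        else if (op == OP_LW)
            cpu->regs[r0] =
                cpu->mem[((cpu->regs[r1] + (uint32_t)imm) / 4) % MEM_WORDS];

        /* 4. Add the instruction size (4 bytes) to the PC. */
        cpu->pc += 4;
        executed++;
    }
    return executed;
}
```

A real core does the same work in hardware on every clock cycle; the software loop is only convenient for checking the ISA before writing any Verilog.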

This means several modules are needed: a register file, a decoder, and an ALU.

Let's look at each module separately.

Register file

In my case there are 64 registers. Each instruction writes its result to one register and takes its operands from two others, so the register file must be able to write one register while reading the values of two others.
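As a rough software model of that interface, the C sketch below has two read ports and one write port guarded by a write-enable flag. The names RegFile, regfile_read and regfile_write are hypothetical, and treating register 0 as the hard-wired $zero register is an assumption borrowed from MIPS, not something the article states.

```c
#include <stdint.h>

#define NUM_REGS 64

typedef struct {
    uint32_t r[NUM_REGS];
} RegFile;

/* Two read ports: both source operands are fetched in one call. */
static void regfile_read(const RegFile *rf, unsigned a, unsigned b,
                         uint32_t *out_a, uint32_t *out_b)
{
    *out_a = rf->r[a % NUM_REGS];
    *out_b = rf->r[b % NUM_REGS];
}

/* One write port: the register changes only when write_enable is set.
 * Writes to register 0 are ignored (assumed to be $zero). */
static void regfile_write(RegFile *rf, int write_enable,
                          unsigned dst, uint32_t value)
{
    if (write_enable && (dst % NUM_REGS) != 0)
        rf->r[dst % NUM_REGS] = value;
}
```

In Verilog the same idea is usually an array of registers with combinational reads and a clocked write.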

Decoder

The decoder is the unit responsible for decoding instructions. It tells the ALU and the other units which operations to perform.

For example, the addi instruction below must add the value of the $zero register (which always stores 0) and 20, and put the result in the $t0 register.

 addi $t0, $zero, 20

At this stage, the decoder determines that this instruction:

Is of the Immediate type
Must write its result to a register

It then passes this information to the following blocks.
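A decoder of this kind can be modelled in C as a function from an instruction word to a set of control signals. In the sketch below the opcode of addi (0001), the field order, and the names Control and decode are all assumptions made for illustration; only the opcodes 0000 (Register type) and 0111 (lw) come from the article.

```c
#include <stdint.h>
#include <stdbool.h>

enum { OP_RTYPE = 0x0, OP_ADDI = 0x1 /* hypothetical */, OP_LW = 0x7 };

typedef struct {
    bool is_immediate;  /* second operand is Imm rather than register R2 */
    bool reg_write;     /* the result must be written to register R0 */
    bool mem_read;      /* the value comes from memory (lw) */
    uint32_t r0, r1, r2;
    int32_t imm;
} Control;

static Control decode(uint32_t insn)
{
    Control c = {0};
    uint32_t op = insn >> 28;

    c.r0 = (insn >> 22) & 0x3F;
    c.r1 = (insn >> 16) & 0x3F;

    if (op == OP_RTYPE) {
        c.r2 = (insn >> 10) & 0x3F;
        c.reg_write = true;
    } else {
        c.is_immediate = true;
        /* Sign-extend the 16-bit immediate field. */
        c.imm = (int32_t)((insn & 0xFFFF) ^ 0x8000) - 0x8000;
        c.reg_write = (op == OP_ADDI || op == OP_LW);
        c.mem_read = (op == OP_LW);
    }
    return c;
}
```

For the addi example this yields is_immediate and reg_write set, with imm equal to 20, which matches the two facts the decoder is said to establish.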

ALU

Control then passes to the ALU. It usually performs all arithmetic and logical operations, as well as number comparisons.

So for the same addi instruction, this is the stage where 0 and 20 are added.
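In software terms an ALU is a pure function of an operation code and two operands. The C sketch below shows one arithmetic, one logical and one comparison operation; the AluOp names are hypothetical and do not correspond to the author's Verilog signals.

```c
#include <stdint.h>

typedef enum { ALU_ADD, ALU_SUB, ALU_NOR, ALU_SLT } AluOp; /* hypothetical names */

static uint32_t alu(AluOp op, uint32_t a, uint32_t b)
{
    switch (op) {
    case ALU_ADD: return a + b;        /* wraps around modulo 2^32 */
    case ALU_SUB: return a - b;
    case ALU_NOR: return ~(a | b);
    case ALU_SLT:                      /* signed "less than": 1 or 0 */
        return (uint32_t)((int32_t)a < (int32_t)b);
    }
    return 0;
}
```

For addi $t0, $zero, 20 the decoder would select the addition and the call would be alu(ALU_ADD, 0, 20).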

Other

In addition to the blocks above, the processor must be able to:

Read and change values in memory
Perform conditional jumps

You can see how this looks in code here and here.

Assembler

Once the processor is written, we need a program that converts text commands into machine code so that we do not have to do it by hand. In other words, we need to write an assembler.

I decided to implement it in the C programming language.

Since my processor has a RISC architecture, to make my life easier I designed the assembler so that pseudoinstructions (combinations of several basic instructions or other pseudoinstructions) can be added to it easily.

This can be done with a data structure that stores the instruction's type, its format, its name, and a pointer to a function that returns its machine code.
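Such a structure might look like the C sketch below. It is not the author's actual definition: the type names, the argument convention (up to three integer operands), the assumed field order in the machine word, and the not pseudoinstruction (expanded to nor rd, rs, rs, as in MIPS) are all illustrative assumptions.

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

typedef enum { INSN_BASIC, INSN_PSEUDO } InsnKind;
typedef enum { FMT_RTYPE, FMT_ITYPE } InsnFormat;

/* Writes the machine words for one source instruction into out and
 * returns how many words were written. */
typedef size_t (*EncodeFn)(const int32_t args[3], uint32_t *out);

typedef struct {
    const char *name;
    InsnKind    kind;
    InsnFormat  format;
    EncodeFn    encode;
} InsnDesc;

/* nor r0, r1, r2: OP = 0000, Func = 0000000111 */
static size_t enc_nor(const int32_t a[3], uint32_t *out)
{
    out[0] = (((uint32_t)a[0] & 0x3F) << 22) | (((uint32_t)a[1] & 0x3F) << 16) |
             (((uint32_t)a[2] & 0x3F) << 10) | 0x7u;
    return 1;
}

/* lw r0, n(r1): OP = 0111, args are {r0, r1, n} */
static size_t enc_lw(const int32_t a[3], uint32_t *out)
{
    out[0] = (0x7u << 28) | (((uint32_t)a[0] & 0x3F) << 22) |
             (((uint32_t)a[1] & 0x3F) << 16) | ((uint32_t)a[2] & 0xFFFF);
    return 1;
}

/* Hypothetical pseudoinstruction: not rd, rs  ->  nor rd, rs, rs */
static size_t enc_not(const int32_t a[3], uint32_t *out)
{
    const int32_t expanded[3] = { a[0], a[1], a[1] };
    return enc_nor(expanded, out);
}

static const InsnDesc INSN_TABLE[] = {
    { "nor", INSN_BASIC,  FMT_RTYPE, enc_nor },
    { "lw",  INSN_BASIC,  FMT_ITYPE, enc_lw  },
    { "not", INSN_PSEUDO, FMT_RTYPE, enc_not },
};

/* Looks an instruction up by name; returns NULL if it is unknown. */
static const InsnDesc *find_insn(const char *name)
{
    for (size_t i = 0; i < sizeof INSN_TABLE / sizeof INSN_TABLE[0]; i++)
        if (strcmp(INSN_TABLE[i].name, name) == 0)
            return &INSN_TABLE[i];
    return NULL;
}
```

With a table like this, adding a pseudoinstruction means writing one small function that reuses existing encoders and adding one row to the table.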

A typical program begins with a segment declaration.

Two segments are enough for us: .text, which holds the source code of our programs, and .data, which holds our data and constants.

Instructions may look like this:

 .text
 jie $zero, $zero, $zero
 addi $t1, $zero, 2 # $t1 = $zero + 2
 lw $t1, 5($t2) # $t1 = *($t2 + 5)
 syscall 0, $zero, $zero # syscall(0, 0, 0)
 la $t1, label # $t1 = label

The name of the instruction comes first, followed by its operands.

The .data segment contains data declarations.

 .data
 .byte 23 # 1 byte
 .half 1337 # 2 bytes
 .word 69000, 25000 # 4 bytes each
 .asciiz "Hello World!" # string (null-terminated)
 .ascii "12312009" # string (no terminator)
 .space 45 # 45 bytes

The declaration must begin with a dot and the name of the data type, followed by constants or arguments.
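To show what these declarations turn into, here is a minimal C sketch that appends their bytes to a buffer. It assumes little-endian byte order, no alignment padding between items, and that .space fills with zeros; the article does not specify any of these, and the names DataSeg, emit_int, emit_string and emit_space are hypothetical.

```c
#include <stdint.h>
#include <string.h>
#include <stddef.h>

typedef struct {
    uint8_t bytes[256];
    size_t  len;
} DataSeg;

/* .byte / .half / .word: append the low `size` bytes of value,
 * least significant byte first (assumed little-endian).
 * Returns 0 if the buffer is full. */
static int emit_int(DataSeg *d, uint32_t value, size_t size)
{
    if (d->len + size > sizeof d->bytes)
        return 0;
    for (size_t i = 0; i < size; i++)
        d->bytes[d->len++] = (uint8_t)(value >> (8 * i));
    return 1;
}

/* .ascii (add_nul = 0) and .asciiz (add_nul = 1). */
static int emit_string(DataSeg *d, const char *s, int add_nul)
{
    size_t n = strlen(s) + (add_nul ? 1 : 0);
    if (d->len + n > sizeof d->bytes)
        return 0;
    memcpy(d->bytes + d->len, s, n);  /* for .asciiz this copies the '\0' too */
    d->len += n;
    return 1;
}

/* .space n: reserve n bytes, filled with zeros here. */
static int emit_space(DataSeg *d, size_t n)
{
    if (d->len + n > sizeof d->bytes)
        return 0;
    memset(d->bytes + d->len, 0, n);
    d->len += n;
    return 1;
}
```

Under these assumptions the example segment above occupies 1 + 2 + 8 + 13 + 8 + 45 = 77 bytes.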

It is convenient to parse (scan) an assembler file as follows:

First we scan for a segment directive
In a .data segment, we parse data declarations of the various types, or a switch to the .text segment
In a .text segment, we parse instructions, or a switch to the .data segment
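This scheme is a small state machine whose state is the current segment. The C sketch below only classifies lines and counts them; a real assembler would call its instruction or data parser where the counters are incremented. The names ParseState and parse_line are hypothetical, and for brevity a # inside a string literal is treated as the start of a comment.

```c
#include <string.h>
#include <stddef.h>

typedef enum { SEG_NONE, SEG_TEXT, SEG_DATA } Segment;

typedef struct {
    Segment seg;          /* segment we are currently in */
    size_t instructions;  /* lines seen inside .text */
    size_t declarations;  /* lines seen inside .data */
    size_t errors;        /* lines that appear before any segment */
} ParseState;

static void parse_line(ParseState *st, const char *line)
{
    size_t len;

    line += strspn(line, " \t");        /* skip leading whitespace */
    len = strcspn(line, "#\n");         /* cut off comment and newline */
    while (len > 0 && (line[len - 1] == ' ' || line[len - 1] == '\t'))
        len--;                          /* trim trailing whitespace */
    if (len == 0)
        return;                         /* blank or comment-only line */

    /* A segment directive switches the state. */
    if (len == 5 && strncmp(line, ".text", 5) == 0) { st->seg = SEG_TEXT; return; }
    if (len == 5 && strncmp(line, ".data", 5) == 0) { st->seg = SEG_DATA; return; }

    /* Anything else is handled according to the current segment. */
    switch (st->seg) {
    case SEG_TEXT: st->instructions++; break;  /* parse an instruction here */
    case SEG_DATA: st->declarations++; break;  /* parse a declaration here */
    case SEG_NONE: st->errors++;       break;  /* content outside a segment */
    }
}
```

Feeding the file to parse_line one line at a time keeps the parser simple, because each line is interpreted only in the context of the last segment directive seen.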

The assembler has to go through the source file twice. On the first pass it works out the offsets at which the labels are located; labels usually look like this:

 la $s4, loop # s4 = address of loop
 loop: # the label
 mul $s2, $s2, $s1 # s2 = s2 * s1
 addi $s1, $s1, -1 # s1 = s1 - 1
 jil $s3, $s1, $s4 # jump if s3 < s1

On the second pass the output file can be generated.
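The two passes can be sketched in C as follows. The first pass walks the lines and records the offset of every label; the second pass would then call lookup_label whenever an operand is a label name. The sketch assumes that every non-label line is exactly one 4-byte instruction (a pseudoinstruction that expands to several words would need a larger step) and that a label stands alone on its line; the names SymTab, collect_labels and lookup_label are hypothetical.

```c
#include <string.h>
#include <stdint.h>
#include <stddef.h>

#define MAX_LABELS 32
#define NAME_CAP   32

typedef struct { char name[NAME_CAP]; uint32_t offset; } Label;
typedef struct { Label items[MAX_LABELS]; size_t count; } SymTab;

/* If the line is a label definition such as "loop:", copies the name
 * into out and returns 1; otherwise returns 0. */
static int label_name(const char *line, char *out, size_t cap)
{
    size_t n;
    line += strspn(line, " \t");
    n = strcspn(line, ": \t#\n");
    if (n == 0 || n >= cap || line[n] != ':')
        return 0;
    memcpy(out, line, n);
    out[n] = '\0';
    return 1;
}

/* Pass 1: compute the byte offset of every label. */
static void collect_labels(const char *const *lines, size_t n, SymTab *tab)
{
    uint32_t offset = 0;
    tab->count = 0;
    for (size_t i = 0; i < n; i++) {
        char name[NAME_CAP];
        const char *p = lines[i] + strspn(lines[i], " \t");
        if (*p == '\0' || *p == '#' || *p == '\n')
            continue;                       /* blank or comment-only line */
        if (label_name(p, name, sizeof name)) {
            if (tab->count < MAX_LABELS) {
                strcpy(tab->items[tab->count].name, name);
                tab->items[tab->count].offset = offset;
                tab->count++;
            }
        } else {
            offset += 4;                    /* one 32-bit instruction */
        }
    }
}

/* Pass 2 helper: resolve a label name to its offset; returns 0 if unknown. */
static int lookup_label(const SymTab *tab, const char *name, uint32_t *offset)
{
    for (size_t i = 0; i < tab->count; i++) {
        if (strcmp(tab->items[i].name, name) == 0) {
            *offset = tab->items[i].offset;
            return 1;
        }
    }
    return 0;
}
```

Two passes are needed because an instruction such as la $s4, loop may refer to a label that is defined further down in the file.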

Summary

From here, you can run the assembler's output file on the processor and evaluate the result.

The finished assembler can also be used as the back end of a C compiler, but that is a topic for later.

References:

Designing Digital Computer Systems with Verilog. David J. Lilja and Sachin S. Sapatnekar
Source
Source code of another processor

Source: https://habr.com/ru/post/430680/
