Forth CPU. What it is? (Part 1)

On Habré there were few posts about the Fort, and impressed by Moore ’s recent great work, I’ll tell you about another fort processor - J1 .
This is probably the most minimalistic processor that can be used in practice.
You can do it yourself on the FPGA, but I, as a person far from electronics, will try to write his emulator. And, in order to add post abnormalities, I will write an emulator in the Go language.

Hardware features

The processor is 16-bit, it can address up to 32K of memory (why not 64Kb will be understood below). The code and data are in different memory areas, the maximum size of the program running on this processor is also up to 32Kb.

The architecture is a stack one, the processor does not contain a single register (which could be explicitly addressed). But it contains two stacks:
')
- data stack - intended for calculations
- call stack - designed to save the return address when calling functions. Sometimes used as temporary storage.

Both stacks are shallow, with 33 elements each. Stacks are of course also 16-bit.

It's all. Instead of I / O ports, memory-mapped I / O is suggested. For this, it is recommended to reserve the top 16K of memory.

Emulator

I do not pretend to ideologically correct code on Go, but I will try to make people even unfamiliar with the language understand what it is about.

We describe the structure of the processor:

 type J1Stack [33]uint16 type J1Cpu struct { pc uint16 r J1Stack s J1Stack rp int sp int mem []uint16 }

So, our emulated processor inside has a pc instruction counter, a data stack s, a call stack r, pointers to the vertices of these stacks sp and rp respectively and internal memory (by the way, the memory is also only 16-bit, since the byte cannot be addressed) .

Let's write a constructor function for the type J1Cpu:

 func NewJ1Cpu(memSize int) (j1 *J1Cpu) { j1 = new(J1Cpu) j1.mem = make([]uint16, memSize) j1.sp, j1.rp = -1, -1 return j1 }

In the constructor, we allocate as much memory as specified, and initialize the stacks.

Instructions

The processor has 5 instructions:

They work as follows:
LIT X: put the value of X on the top of the stack s. Since the 1st bit indicates that this is a literal, only 15 bits are assigned to the value. Hence the 15-bit addressing code and data.
JMP X: go to address X
JZ X: if the value 0 is at the top of the stack s, then go to address X
CALL X: put PC + 1 at the top of the stack r (next instruction), and go to address X.
ALU: here is a little more difficult, I will tell about it below.

We implement the execution of these instructions in the Exec method:

 const ( J1_OP = (7 << 13) J1_ARG = 0x1fff J1_JMP = (0 << 13) J1_JZ = (1 << 13) J1_CALL = (2 << 13) J1_ALU = (3 << 13) J1_LIT = (4 << 13) J1_LIT_MASK = 0x7fff ) func (j1 *J1Cpu) Exec(op uint16) { if op & J1_LIT != 0 { j1.sp++ j1.s[j1.sp] = op & J1_LIT_MASK j1.pc++ return } arg := op & J1_ARG switch op & J1_OP { case J1_JMP: j1.pc = arg case J1_JZ: if j1.s[j1.sp] == 0 { j1.pc = arg } j1.sp-- case J1_CALL: j1.rp++ j1.r[j1.rp] = j1.pc + 1 j1.pc = arg case J1_ALU: j1.execAlu(arg) } }

By changing pointers to the top of the stack, I use pre-increment and post-decrement. However, in Go you cannot use them as in C (stack [++ sp] = x; ... x = stack [sp--]). Because it is necessary to indicate such things explicitly.

It remains to implement the execAlu method - and it's done!

J1 ALU

This is the most powerful instruction. Let's first define the characters that are used by the authors of the processor.

T - the element on top of the stack at the beginning of the instruction
N - the element following T at the beginning of the instruction
T '- the result of arithmetic operations
R - the element on top of the call stack at the beginning of the instruction
PC - processor instruction count value
[T] - indirect memory access at address T

Arithmetic instructions total 16:
0: T '= T
1: T '= N
2: T '= T + N
3: T '= T and N
4: T '= T or N
5: T '= T xor N
6: T '= ~ T (bitwise inversion)
7: T '= (T == N)
8: T '= (N <T)
9: T '= N >> T (right shift by T bit)
10: T '= T-1
11: T '= R
12: T '= [T]
13: T '= N << T
14: T '= depth (data stack depth)
15: T '= (N <T), this time unsigned comparison (unsigned)

Here is what the implementation of these operations will look like:

 func (j1 *J1Cpu) execAlu(op uint16) { var res, t, n, r uint16 if j1.sp >= 0 { t = j1.s[j1.sp] } if j1.sp > 0 { n = j1.s[j1.sp-1] } if j1.rp > 0 { r = j1.r[j1.rp-1] } j1.pc++ code := (op & (0xf << 8)) >> 8 switch code { case 0: res = t case 1: res = n case 2: res = t + n case 3: res = t & n case 4: res = t | n case 5: res = t ^ n case 6: res = ^t case 7: if n == t { res = 1 } case 8: if int16(n) < int16(t) { res = 1 } case 9: res = n >> t case 10: res = t - 1 case 11: res = j1.r[r] case 12: res = j1.mem[t] case 13: res = n << t case 14: res = uint16(j1.sp + 1) case 15: if n < t { res = 1 } } ds := op & 3 rs := (op & (3 << 2)) >> 2 if ds == 1 { j1.sp++ } if ds == 2 { j1.sp-- } if rs == 1 { j1.rp++ } if rs == 2 { j1.rp-- } if op&(1<<5) != 0 { println("N->[T]") j1.mem[t] = n } if op&(1<<6) != 0 { println("T->R") if j1.rp < 0 { panic("Call stack underrun") } j1.r[j1.rp] = t } if op&(1<<7) != 0 { println("T->N") if j1.sp > 0 { j1.s[j1.sp-1] = t } } if op&(1<<12) != 0 { println("R->PC") j1.pc = j1.r[r] } if j1.sp >= 0 { j1.s[j1.sp] = res } }

The function turned out so big because I am not very good at analyzing the parameters of the command. In general terms, it works simply - it gets the current values of T, N, R, calculates the new value of T ', shifts the pointers to the top of the stacks and sets the new value of T.
Now that's exactly what the processor can do.

For example, the addition of two numbers - these are three commands
LIT 5 (5 on stack)
LIT 7 (5 7 on the stack)
ALU OP = 2 (addition), DS = -1 (i.e. add and stack pointer decrease by 1. On the stack - 12).

Interestingly, separate instructions are not required to exit a function. Just in the last instruction is added the flag "R-> PC". This, according to the developers, reduces the amount of code by about 7%.

And where does Forth itself?

The fact is that this instruction set is optimized for the Fort language. Most of the words in this language correspond to single J1 instructions. But a brief introduction to the Fort in relation to the J1 processor will be in the next article, and now I will leave a link to the full source code of the emulator .
Who already knows the fort - the cross-compiler for J1 can be found here (crossj1.fs).

Source: https://habr.com/ru/post/133338/

All Articles