Another virtual machine architecture (part one)

This post is a continuation of "Another operating system architecture".

Having settled on the basic ideas, I began to think about where to start development so that I would not lose interest at the first difficulty. In fairness, this attempt is far from my first. Last time, in the simplicity of my heart, I began by writing a bootloader. After playing enough with real and protected modes, I ended up with a working prototype and, almost without noticing, lost all interest. This time I started with a conscious understanding that the work should begin with the API, and for that you do not need to get intimate with segment descriptors.

A native API (C/C++) was ruled out for several reasons. First, it requires separate address spaces, which entails considerable overhead for IPC and for interaction with the kernel; following modern trends, I wanted a single address space OS. Second, a native API cannot provide binary compatibility between different architectures. Finally, such an API gets in the way of transparent remote calls. So a virtual machine was required, and that is where I decided to start.

The following basic requirements emerged:

1. Memory protection (the code of one module cannot corrupt the memory of another module);
2. Direct support for transparent calls;
3. Occam's razor (KISS);
4. Efficiency.

This rules out "fat" abstractions such as classes, mark-and-sweep GC, and the like (the last two items). In addition, the virtual machine should support only 64-bit logical and arithmetic operations (the third item): since the whole system lives in a single virtual address space, there is no point in targeting 32 bits (4 gigabytes is not enough even for a smartphone today). Finally, the virtual machine should be register-based rather than stack-based (the last item).
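One commonly cited reason a register machine is more efficient is that it dispatches fewer instructions for the same work. The toy comparison below is only a sketch under stated assumptions, not the author's VM: the stack opcodes (`Load`, `Mul`, `Store`), the three-operand `RegMul`, and the helpers `stackForm`/`registerForm` are hypothetical. Both compute `pr = io * pr` on 64-bit values, mirroring the `MULInstr(io, pr, pr)` used in the factorial example later in this post, and count how many instructions each form dispatches.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stack-machine opcodes; `slot` is ignored by Mul.
enum class StackOp { Load, Mul, Store };
struct StackInstr { StackOp op; std::size_t slot; };

// Hypothetical three-operand register instruction: dst = a * b.
struct RegMul { std::size_t a, b, dst; };

struct Outcome { std::uint64_t pr; std::size_t dispatched; };

// Stack form of pr = io * pr: LOAD io; LOAD pr; MUL; STORE pr.
Outcome stackForm(std::uint64_t io, std::uint64_t pr) {
    std::vector<std::uint64_t> slots{io, pr};  // slot 0 = io, slot 1 = pr
    const std::vector<StackInstr> code{
        {StackOp::Load, 0}, {StackOp::Load, 1}, {StackOp::Mul, 0}, {StackOp::Store, 1}};
    std::vector<std::uint64_t> stack;
    std::size_t dispatched = 0;
    for (const StackInstr& in : code) {
        ++dispatched;
        switch (in.op) {
        case StackOp::Load:
            stack.push_back(slots[in.slot]);
            break;
        case StackOp::Mul: {
            std::uint64_t rhs = stack.back(); stack.pop_back();
            std::uint64_t lhs = stack.back(); stack.pop_back();
            stack.push_back(lhs * rhs);
            break;
        }
        case StackOp::Store:
            slots[in.slot] = stack.back();
            stack.pop_back();
            break;
        }
    }
    return {slots[1], dispatched};
}

// Register form: a single three-operand MUL io, pr, pr.
Outcome registerForm(std::uint64_t io, std::uint64_t pr) {
    std::vector<std::uint64_t> regs{io, pr};  // register 0 = io, register 1 = pr
    const std::vector<RegMul> code{{0, 1, 1}};
    std::size_t dispatched = 0;
    for (const RegMul& in : code) {
        ++dispatched;
        regs[in.dst] = regs[in.a] * regs[in.b];
    }
    return {regs[1], dispatched};
}
```

Both forms produce the same value, but the stack form pays for four dispatches (and the stack traffic behind them) where the register form pays for one.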

Variables

1. When a thread executes a program, it reads and modifies variables;
2. A variable is a contiguous memory area organized as an array of elements of the same size;
3. The type of a variable determines the structure of its element, while the variable descriptor determines the number of elements and the flags (additional properties);
4. The type of a variable contains the following fields:
a. bytes: the number of bytes of the element available to arithmetic and logical operations;
b. vrefs: an array of variable descriptors, corresponding to element references to other variables;
c. prefs: an array of procedure type identifiers, corresponding to element references to procedures.

The reason for splitting the element structure and the element count into different entities (the type and the descriptor of the variable) is not obvious. In practice, it turned out that the variable type is a more universal concept used by different operations, whereas the number of elements is tied only to a specific variable.
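The split can be illustrated with a simplified model in which the type describes a single element and the descriptor adds only the element count and flags. The names `ElemType`, `Descriptor`, `Footprint`, and `footprint` are hypothetical simplifications (in particular, the real `VarType` stores descriptors in `vrefs`, not a plain count); the point is only that one type can serve many variables of different lengths.

```cpp
#include <cstddef>
#include <cstdint>

// Simplified variable type: describes ONE element only.
struct ElemType {
    std::size_t bytes;  // bytes available to arithmetic and logical operations
    std::size_t vrefs;  // number of references to other variables
    std::size_t prefs;  // number of references to procedures
};

// Simplified descriptor: points at a shared type and adds per-variable data.
struct Descriptor {
    std::uint32_t flags;
    const ElemType* type;
    std::size_t count;  // number of elements in this particular variable
};

struct Footprint { std::size_t bytes, vrefs, prefs; };

// Hypothetical helper: whole-variable totals = per-element totals * count.
Footprint footprint(const Descriptor& d) {
    return {d.type->bytes * d.count, d.type->vrefs * d.count, d.type->prefs * d.count};
}
```

For example, a "list node" type with 8 bytes and one variable reference can describe both a single node (`count = 1`) and an array of ten nodes (`count = 10`) without defining a second type.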



In C++, the type of a variable is defined as follows (vm/vmdefs.h):

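#include <cstddef>
#include <cstdint>
#include <vector>

typedef uint32_t VarTypeId;
typedef uint32_t ProcTypeId;

// Variable descriptor: flags, element type, and number of elements.
struct VarSpec {
    uint32_t flags;
    VarTypeId vtype;
    size_t count;
};

// Variable type: the structure of a single element.
struct VarType {
    size_t bytes;                   // bytes available to arithmetic/logic
    std::vector<VarSpec> vrefs;     // references to other variables
    std::vector<ProcTypeId> prefs;  // references to procedures
};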

As the definitions show, a reference to a variable (a VarSpec) determines the number of its elements. There are actually variables with a variable number of elements, but let us omit them for now; references to procedures will also be left for later. I will only note that a variable is essentially an analogue of an array of structures, each with simple fields, pointers to other variables, and pointers to procedures. The only difference is that here the structure fields are semantically divided into three distinct classes, and each class is handled by its own instruction. In addition, for simplicity, the fields of each class are grouped into contiguous memory sections (an array of bytes, an array of references to variables, an array of references to procedures). Variable types are defined statically in the module body when the module is created and cannot be changed.
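The "array of structures with three classes of fields" can be modeled roughly as below. This is a sketch under assumptions, not the VM's actual storage: `ElemLayout`, `VarStorage`, `ProcRef`, and the accessors are hypothetical, and this version keeps each class in its own contiguous array across all elements (one possible reading of the grouping). Every access is bounds-checked, which illustrates one way the memory-protection requirement could be enforced: an instruction can reach only the elements and fields that the variable's type and descriptor declare.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

using ProcRef = std::uint32_t;  // hypothetical procedure handle

struct ElemLayout {
    std::size_t bytes;  // byte-section size of one element
    std::size_t vrefs;  // variable references per element
    std::size_t prefs;  // procedure references per element
};

class VarStorage {
public:
    VarStorage(ElemLayout layout, std::size_t count)
        : layout_(layout), count_(count),
          bytes_(layout.bytes * count, 0),
          vrefs_(layout.vrefs * count, nullptr),
          prefs_(layout.prefs * count, 0) {}

    // Byte section: what arithmetic and logical instructions operate on.
    std::uint8_t& byteAt(std::size_t elem, std::size_t off) {
        check(elem, off, layout_.bytes);
        return bytes_[elem * layout_.bytes + off];
    }
    // Variable-reference section.
    VarStorage*& vrefAt(std::size_t elem, std::size_t i) {
        check(elem, i, layout_.vrefs);
        return vrefs_[elem * layout_.vrefs + i];
    }
    // Procedure-reference section.
    ProcRef& prefAt(std::size_t elem, std::size_t i) {
        check(elem, i, layout_.prefs);
        return prefs_[elem * layout_.prefs + i];
    }
    std::size_t count() const { return count_; }

private:
    // Reject any element index or field index outside the declared layout.
    void check(std::size_t elem, std::size_t i, std::size_t limit) const {
        if (elem >= count_ || i >= limit)
            throw std::out_of_range("access outside the variable");
    }
    ElemLayout layout_;
    std::size_t count_;
    std::vector<std::uint8_t> bytes_;
    std::vector<VarStorage*> vrefs_;
    std::vector<ProcRef> prefs_;
};
```

Because the three sections are separate, a byte write can never overwrite a reference: the reference fields are simply not reachable through `byteAt`.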

Registers

Virtual machine instructions do not operate on variables directly; they do so through registers. A register is a number that is associated with a variable while code runs, which lets instructions read and modify that variable. Each register corresponds to a variable descriptor. Registers, like variable types, are defined statically in the module body when the module is created and cannot be changed.

Registers are bound to and unbound from variables with the PUSH, PUSHR, and POP instructions. PUSH and PUSHR take a register as an argument. PUSH allocates a new variable on the stack and binds it to the given register. PUSHR allocates a variable reference on the stack and binds it to the given register; in this case, the register cannot be used for reading or modification until a variable is allocated in the heap with a special instruction (the peculiarities of heap-allocated variables are omitted for now). POP undoes the last PUSH or PUSHR: it removes the variable (or the reference to it), and the corresponding register reverts to the variable it was previously bound to.
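These rules behave like a per-register stack of bindings plus a single undo log, since POP takes no argument and always undoes the most recent PUSH or PUSHR. The sketch below is a hypothetical model, not the VM's implementation: variables are reduced to one 64-bit value, `RegisterFile` and its methods are invented names, and the special heap-allocation instruction is stood in for by a placeholder `allocHeap`.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <vector>

// One binding of a register: a variable, or an empty reference after PUSHR.
struct Binding {
    std::shared_ptr<std::uint64_t> var;  // null until the variable is allocated
};

class RegisterFile {
public:
    explicit RegisterFile(std::size_t regCount) : bindings_(regCount) {}

    // PUSH r: allocate a new variable and bind r to it.
    void push(std::size_t r) {
        bindings_.at(r).push_back({std::make_shared<std::uint64_t>(0)});
        log_.push_back(r);
    }
    // PUSHR r: allocate only a reference; r is unusable until allocHeap(r).
    void pushr(std::size_t r) {
        bindings_.at(r).push_back({nullptr});
        log_.push_back(r);
    }
    // Placeholder for the special heap-allocation instruction.
    void allocHeap(std::size_t r) {
        Binding& b = top(r);
        if (!b.var) b.var = std::make_shared<std::uint64_t>(0);
    }
    // POP: undo the most recent PUSH/PUSHR; that register regains its old binding.
    void pop() {
        if (log_.empty()) throw std::logic_error("POP without PUSH");
        bindings_[log_.back()].pop_back();
        log_.pop_back();
    }
    // Read/modify access through a register.
    std::uint64_t& at(std::size_t r) {
        Binding& b = top(r);
        if (!b.var) throw std::logic_error("register refers to an unallocated variable");
        return *b.var;
    }

private:
    Binding& top(std::size_t r) {
        std::vector<Binding>& stack = bindings_.at(r);
        if (stack.empty()) throw std::logic_error("register is not bound");
        return stack.back();
    }
    std::vector<std::vector<Binding>> bindings_;  // binding history per register
    std::vector<std::size_t> log_;                // PUSH/PUSHR order, undone by POP
};
```

For instance, pushing register 0 twice and then popping once makes register 0 refer again to the first variable, with its value intact.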



In conclusion, here is code that creates a module with one external procedure (function) that computes a factorial. We will leave the details for the next post; for now, it is just to give an impression (vm/test/modules.cpp):

void createFactorialModule(Module& module) {
    ModuleBuilder builder;

    // void fact(unsigned int* io) {
    //     if (*io)
    //         goto l1;
    //     *io = 1;
    //     return;
    // l1:
    //     {
    //         unsigned int pr = 1;
    //     l2:
    //         pr = *io * pr;
    //         if (--*io)
    //             goto l2;
    //         *io = pr;
    //     }
    // }
    VarTypeId vtype = builder.addVarType(8);
    RegId io = builder.addReg(0, vtype);
    ProcTypeId ptype = builder.addProcType(0, io);
    ProcId proc = builder.addProc(PFLAG_EXTERNAL, ptype);
    builder.addProcInstr(proc, JNZInstr(io, 3));       // if (*io) goto l1
    builder.addProcInstr(proc, CPI8Instr(1, io));      // *io = 1
    builder.addProcInstr(proc, RETInstr());            // return
    RegId pr = builder.addReg(0, vtype);
    builder.addProcInstr(proc, PUSHInstr(pr));         // l1: allocate pr
    builder.addProcInstr(proc, CPI8Instr(1, pr));      // pr = 1
    builder.addProcInstr(proc, MULInstr(io, pr, pr));  // l2: pr = *io * pr
    builder.addProcInstr(proc, DECInstr(io));          // --*io
    builder.addProcInstr(proc, JNZInstr(io, -2));      // if (*io) goto l2
    builder.addProcInstr(proc, CPBInstr(pr, io));      // *io = pr
    builder.addProcInstr(proc, POPInstr());            // release pr
    builder.addProcInstr(proc, RETInstr());            // return

    builder.createModule(module);
}
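To see how the relative jump offsets fit together, here is a toy interpreter that runs the same eleven instructions. Every semantic detail is an assumption inferred from the commented pseudo-C, not taken from the VM sources: a jump offset is relative to the jump instruction itself (so `JNZInstr(io, 3)` at index 0 lands on `PUSH` at index 3, and `JNZInstr(io, -2)` at index 7 lands on `MUL` at index 5), `CPI8Instr(imm, dst)` stores an immediate, `MULInstr(a, b, dst)` computes `dst = a * b`, `CPBInstr(src, dst)` copies, and registers hold plain 64-bit values (the variable type has 8 bytes). The `Op`/`Instr` encoding, `kFactorial`, and `run` are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

enum class Op { JNZ, CPI8, RET, PUSH, MUL, DEC, CPB, POP };

// Hypothetical encoding: up to three operands whose meaning depends on the opcode.
struct Instr { Op op; std::int64_t a = 0, b = 0, c = 0; };

constexpr std::int64_t REG_IO = 0, REG_PR = 1;  // register numbers

// The factorial body from createFactorialModule, transcribed by hand.
const std::vector<Instr> kFactorial = {
    {Op::JNZ, REG_IO, 3},               // 0: if (*io) goto l1 (index 3)
    {Op::CPI8, 1, REG_IO},              // 1: *io = 1
    {Op::RET},                          // 2: return
    {Op::PUSH, REG_PR},                 // 3: l1: bind pr to a fresh variable
    {Op::CPI8, 1, REG_PR},              // 4: pr = 1
    {Op::MUL, REG_IO, REG_PR, REG_PR},  // 5: l2: pr = *io * pr
    {Op::DEC, REG_IO},                  // 6: --*io
    {Op::JNZ, REG_IO, -2},              // 7: if (*io) goto l2 (index 5)
    {Op::CPB, REG_PR, REG_IO},          // 8: *io = pr
    {Op::POP},                          // 9: undo the PUSH of pr
    {Op::RET},                          // 10: return
};

// Runs `code` with the io register preset; returns io at RET.
std::uint64_t run(const std::vector<Instr>& code, std::uint64_t io) {
    std::vector<std::uint64_t> regs{io, 0};
    auto reg = [&regs](std::int64_t r) -> std::uint64_t& {
        return regs.at(static_cast<std::size_t>(r));
    };
    std::vector<std::pair<std::int64_t, std::uint64_t>> saved;  // PUSH undo log
    std::int64_t pc = 0;
    while (true) {
        const Instr& in = code.at(static_cast<std::size_t>(pc));
        switch (in.op) {
        case Op::JNZ:
            if (reg(in.a) != 0) { pc += in.b; continue; }  // relative jump
            break;
        case Op::CPI8: reg(in.b) = static_cast<std::uint64_t>(in.a); break;
        case Op::RET:  return reg(REG_IO);
        case Op::PUSH: saved.emplace_back(in.a, reg(in.a)); reg(in.a) = 0; break;
        case Op::MUL:  reg(in.c) = reg(in.a) * reg(in.b); break;
        case Op::DEC:  --reg(in.a); break;
        case Op::CPB:  reg(in.b) = reg(in.a); break;
        case Op::POP:
            reg(saved.back().first) = saved.back().second;
            saved.pop_back();
            break;
        }
        ++pc;
    }
}
```

Under these assumptions, `run(kFactorial, 5)` walks the MUL/DEC/JNZ loop five times and returns 120, and an input of 0 takes the early `*io = 1` path.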

Source: https://habr.com/ru/post/341112/