On forums I often see novice C++ programmers ask: “What books would you recommend?” I usually answer with a list of reliable books, adding that no amount of reading will replace practice. You need to actually build something. But what? What makes a good project? You need something that teaches a lot, yet is simple and interesting enough that you don't get bored. I recently thought about this question, and I think I've found the answer: you should write a bytecode interpreter. For me, such a project was decisive for my entire subsequent career.
In 200X, I was a second-year university student. I already had a little programming experience. I could use the abstractions C++ provides, but I did not really understand how things worked underneath. To me, the compiler and the operating system were black boxes that ran on magic spells, and I generally found that acceptable.
My lack of knowledge did not stop me from taking an active part in programming language holy wars. One of those threads prompted me to take a look at the Java virtual machine, and that is how I learned about stack architectures.
I understood next to nothing about the x86 architecture and had never heard of any other architecture. The idea of a machine without registers struck me as very interesting and unusual. I thought about it all day and decided to sit down and write my own simple stack virtual machine.
Stupid Virtual Machine (SVM for short) followed the simplest possible design. The word size was 32 bits, and memory was accessed in whole words (individual bytes could not be addressed). The memory areas for program code and data were completely isolated from each other (I later learned that this is the defining feature of the Harvard architecture). Even the stack lived in its own separate region of memory.
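A minimal C++ sketch of that memory layout, using hypothetical names (`Word`, `Memory`, `load`, `store`): three independent word-addressed regions, so code, data, and stack can never alias. It illustrates the idea under those assumptions and is not SVM's actual source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Word = std::int32_t;  // one 32-bit machine word

// Hypothetical SVM-like memory: three separate, word-addressed regions.
// Code and data never alias (Harvard-style); the stack is a region of its own.
struct Memory {
    std::vector<Word> code;   // program words, read only by instruction fetch
    std::vector<Word> data;   // data words, reached only through load/store
    std::vector<Word> stack;  // operand stack, reached only through push/pop

    // Addresses are word indices: there is no way to name an individual byte.
    Word load(std::size_t addr) const {
        assert(addr < data.size() && "data address out of range");
        return data[addr];
    }
    void store(std::size_t addr, Word value) {
        assert(addr < data.size() && "data address out of range");
        data[addr] = value;
    }
    void push(Word value) { stack.push_back(value); }
    Word pop() {
        assert(!stack.empty() && "stack underflow");
        Word top = stack.back();
        stack.pop_back();
        return top;
    }
};
```

Since `load` and `store` only ever index `data`, a program has no way to overwrite its own instructions, which is the practical consequence of the Harvard split.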
The instruction set was also quite simple: the standard arithmetic and logic instructions, memory access, stack manipulation, and jumps. Everything worked in the most obvious way. For example, the ADD instruction popped the top two 32-bit values off the stack, added them as signed integers, and pushed the result back onto the stack.
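Here is a minimal switch-based dispatch loop showing how ADD could work in such a machine. The opcode values, the `PUSH` and `HALT` instructions, and the `run` function are assumptions for illustration; the post does not show SVM's real encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical opcode values; SVM's real encoding is not shown in the post.
enum Op : std::int32_t { PUSH = 0, ADD = 1, HALT = 2 };

// Executes code until HALT (or the end of code) and returns the operand stack.
// PUSH is followed by one immediate word in the code stream.
std::vector<std::int32_t> run(const std::vector<std::int32_t>& code) {
    std::vector<std::int32_t> stack;
    std::size_t ip = 0;  // instruction pointer: an index into code memory
    while (ip < code.size()) {
        switch (code[ip++]) {
        case PUSH:
            assert(ip < code.size() && "PUSH without an operand");
            stack.push_back(code[ip++]);
            break;
        case ADD: {
            // Pop the top two words, add them as signed integers, push the sum.
            assert(stack.size() >= 2 && "ADD needs two operands");
            std::int32_t b = stack.back(); stack.pop_back();
            std::int32_t a = stack.back(); stack.pop_back();
            // Adding as uint32_t makes overflow wrap (guaranteed since C++20)
            // instead of being undefined behavior.
            stack.push_back(static_cast<std::int32_t>(
                static_cast<std::uint32_t>(a) + static_cast<std::uint32_t>(b)));
            break;
        }
        case HALT:
            return stack;
        default:
            assert(false && "unknown opcode");
            return stack;
        }
    }
    return stack;
}
```

For example, `run({PUSH, 2, PUSH, 40, ADD, HALT})` leaves a single word, 42, on the stack.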
Input and output were primitive and tied to stdin/stdout. There were two instructions, IN and OUT. The first pushed the value it read onto the stack; the second printed the value on top of the stack. For convenience, I added a special flag that determined whether the input was treated as a stream of raw bytes or as the text representation of a signed integer.
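A sketch of IN and OUT in C++, assuming a hypothetical `IoMode` flag. It also assumes that OUT pops the value it prints and honors the same flag, since the GCD example below prints a number while "Hello, World" prints characters. Streams are passed in, so the functions work with `std::cin`/`std::cout` or with string streams.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical flag modelled on the one described above.
enum class IoMode { RawBytes, Integer };

// IN: read one value and push it. RawBytes reads a single byte; Integer parses
// the text form of a signed integer such as "-17". Returns false on end of
// input or a parse error.
bool exec_in(std::istream& in, IoMode mode, std::vector<std::int32_t>& stack) {
    if (mode == IoMode::RawBytes) {
        char c;
        if (!in.get(c)) return false;
        stack.push_back(static_cast<unsigned char>(c));
    } else {
        std::int32_t value;
        if (!(in >> value)) return false;
        stack.push_back(value);
    }
    return true;
}

// OUT (assumed to pop): write the top value as a byte or as decimal text.
void exec_out(std::ostream& out, IoMode mode, std::vector<std::int32_t>& stack) {
    assert(!stack.empty() && "OUT on an empty stack");
    std::int32_t value = stack.back();
    stack.pop_back();
    if (mode == IoMode::RawBytes)
        out.put(static_cast<char>(value));
    else
        out << value;
}
```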
At first, I wrote all the programs for SVM in raw machine code, using a hex editor. That got tedious quickly, so I wrote an assembler with support for labels and string literals. For example, "Hello, World" looked like this:
"Hello, World!\n" print
When the assembler sees a string literal, it pushes each byte onto the stack. PRINT is not an instruction, just a macro that generates a loop. The loop prints each character from the stack until it reaches 0.
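One plausible way the assembler and the PRINT loop could cooperate, written as a C++ simulation. It assumes (the post does not say) that the assembler pushes a 0 terminator first and then the bytes in reverse order, so that the first character ends up on top; `push_string_literal` and `run_print_loop` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Assumed expansion of a string literal: a 0 sentinel first, then the bytes in
// reverse order, so the first character is on top of the stack.
std::vector<std::int32_t> push_string_literal(const std::string& text) {
    std::vector<std::int32_t> stack;
    stack.push_back(0);  // sentinel that stops the print loop
    for (auto it = text.rbegin(); it != text.rend(); ++it)
        stack.push_back(static_cast<unsigned char>(*it));
    return stack;
}

// What the loop generated by a PRINT-style macro does: pop a character,
// stop at 0, otherwise output it and repeat.
std::string run_print_loop(std::vector<std::int32_t>& stack) {
    std::string out;
    while (true) {
        assert(!stack.empty() && "string literal without a terminator");
        std::int32_t c = stack.back();
        stack.pop_back();
        if (c == 0) break;  // the sentinel is consumed as well
        out.push_back(static_cast<char>(c));
    }
    return out;
}
```

A side effect of this design is that a string literal cannot contain a 0 byte, because the loop would stop there; C strings have the same restriction.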
Writing and reading code for a stack machine is a strange experience. Here is a slightly more advanced example; computing the greatest common divisor looked like this:
        IN      ; "A"
        IN      ; "B"
:GCD
        DUP     ; if B = 0, A is the gcd
        0
        @END
        JE
        SWP     ; if B != 0, gcd(A, B) = gcd(B, A modulo B)
        OVR
        MOD
        @GCD
        JMP
:END
        POP     ; remove the 0
        OUT
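To check that the listing really implements Euclid's algorithm, here is a C++ replay of it, roughly one statement per instruction. The opcode semantics are inferred from the listing rather than taken from SVM's source: DUP copies the top word, SWP swaps the top two, OVR copies the second word to the top, MOD pops B then A and pushes A % B, and JE pops a jump target and two values and jumps if they are equal. `svm_gcd` is a hypothetical name, and inputs are assumed to be non-negative.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using Stack = std::vector<std::int32_t>;  // top of the stack = back()

static std::int32_t pop(Stack& s) {
    assert(!s.empty() && "stack underflow");
    std::int32_t v = s.back();
    s.pop_back();
    return v;
}

// Replays the GCD listing; comments show the stack with the top on the right.
std::int32_t svm_gcd(std::int32_t a, std::int32_t b) {
    Stack s{a, b};                               // IN, IN      A B
    while (true) {                               // :GCD
        std::int32_t top = s.back();
        s.push_back(top);                        // DUP         A B B
        s.push_back(0);                          // 0           A B B 0
        // @END JE: pop the two values and jump to END if they are equal.
        // The jump target is implicit in this C++ version.
        std::int32_t rhs = pop(s);
        std::int32_t lhs = pop(s);               //             A B
        if (lhs == rhs) break;                   // B == 0: go to :END
        std::swap(s[s.size() - 1], s[s.size() - 2]);  // SWP    B A
        std::int32_t second = s[s.size() - 2];
        s.push_back(second);                     // OVR         B A B
        std::int32_t divisor = pop(s);
        std::int32_t dividend = pop(s);
        s.push_back(dividend % divisor);         // MOD         B (A mod B)
    }                                            // @GCD JMP
    pop(s);                                      // :END POP    A
    return pop(s);                               // OUT prints A, the gcd
}
```

Tracing `svm_gcd(48, 18)` gives the stacks 18 12, then 12 6, then 6 0, at which point JE takes the jump and the function returns 6.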
This demonstrates the use of labels and conditional jumps.
If you want to see an even more advanced example, take a look at the assembly code for insertion sort, but I won't be offended if you skip it. It is a fairly long listing, so I did not include it in the post.
If you are interested, you can also take a look at the virtual machine and assembler code that I dug up in my old files. There is nothing special about it; if anything, it is an example of how NOT to write a bytecode interpreter. Making a stack machine perform as well as a register machine takes a few tricks. Of course, I knew nothing about them at the time, and my naive approach hurt performance.
Writing even a primitive bytecode interpreter and several programs for it will make you think about things under the hood that are usually taken for granted.
For example, when I was thinking about how to implement procedures in SVM, I realized that a function call is nothing more than a jump plus an extra set of rules that both the caller and the callee must implicitly agree on. This helped me understand the concept of a calling convention, and magic words like _cdecl and WINAPI suddenly made sense.
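A toy illustration of that realization, with hypothetical CALL and RET opcodes (SVM's actual mechanism is not shown in the post): CALL is just a jump that first saves the return address, RET is a jump back to it, and everything else is an agreed convention about where arguments and results go. The opcode values and the `run` function are assumptions for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical opcodes for a toy VM; CALL and RET are not taken from SVM.
enum Op : std::int32_t { PUSH, CALL, RET, DUP, MUL, HALT };

// Convention assumed by this sketch:
//   * the caller pushes arguments onto the operand stack, then executes CALL;
//   * the callee consumes its arguments and leaves one result in their place;
//   * return addresses live on a separate return stack.
std::vector<std::int32_t> run(const std::vector<std::int32_t>& code) {
    std::vector<std::int32_t> stack;        // operand stack
    std::vector<std::size_t> return_stack;  // saved instruction pointers
    std::size_t ip = 0;
    while (ip < code.size()) {
        switch (code[ip++]) {
        case PUSH:
            assert(ip < code.size() && "PUSH without an operand");
            stack.push_back(code[ip++]);
            break;
        case CALL:
            assert(ip < code.size() && "CALL without a target");
            return_stack.push_back(ip + 1);           // resume after the operand
            ip = static_cast<std::size_t>(code[ip]);  // ...then a plain jump
            break;
        case RET:
            assert(!return_stack.empty() && "RET without a matching CALL");
            ip = return_stack.back();                 // jump back to the caller
            return_stack.pop_back();
            break;
        case DUP: {
            assert(!stack.empty() && "DUP on an empty stack");
            std::int32_t top = stack.back();
            stack.push_back(top);
            break;
        }
        case MUL: {
            assert(stack.size() >= 2 && "MUL needs two operands");
            std::int32_t b = stack.back(); stack.pop_back();
            std::int32_t a = stack.back(); stack.pop_back();
            stack.push_back(a * b);  // no overflow handling in this sketch
            break;
        }
        case HALT:
            return stack;
        default:
            assert(false && "unknown opcode");
            return stack;
        }
    }
    return stack;
}
```

With a "square" subroutine at address 5, `run({PUSH, 7, CALL, 5, HALT, DUP, MUL, RET})` returns a stack holding 49. Changing one of the rules, for instance who removes the arguments from the stack, changes the convention but not the jump underneath; that is roughly the difference between `_cdecl` (the caller cleans up) and the stdcall convention behind WINAPI on 32-bit x86 (the callee cleans up).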
Of course, such a project will not teach you all the subtle nuances of C++ or C, but it will help you develop the right way of thinking. Back then I had a kind of mental barrier that kept me from looking inside the black boxes full of magic. Breaking through this barrier is very important if you are going to program seriously, and especially if you are interested in low-level programming in languages like C++ or C. This controlled contact with "simulated" low-level programming taught me not to be afraid of segfaults or of working with a disassembler. It helped me a lot in my career, and I consider this project one of the most important in my life.
Source: https://habr.com/ru/post/310806/