Here I tried to show in practice what are some of the important concepts from the field of compiler creation. It is likely that such 15-minute completed stories may be a good way to dive into complex topics. Only it would be good not to passively read what is presented below, but also to check the code in the work.
If the first experience is successful, then in the future you can expect other 15-minute "sketches" on the subject of compilers.
Let's make an arithmetic expressions compiler. One that will translate the source text in the reverse Polish form of the record (also called RPN or POLIZ) into an intermediate code working with the stack. But we will do without interpreters. Then we immediately translate the result into a representation in the C language. That is, we will have a compiler from RPN to C.
By the way, we will write the compiler in Python. But let it not stop those who prefer some other programming language. Here is a useful exercise for you: translate the code in your favorite language. Or use the already prepared translation:
Implementation on F # (by gsomix ):
first version
SSA version
def scan(source): tokens = source.split() return [("Push", int(x)) if x[0].isdigit() else ("Op", x) for x in tokens]
What did we do here? The scan function receives a string from the user in the reverse Polish notation ("2 2 +").
And at the output we get its intermediate presentation. Here it is, for example:
[ ('Push', 2), ('Push', 2), ('Op', '+') ]
So, we already got the compiler. But it is very frivolous. Recall that initially it was about the C code.
def trans(ir): code = [] for tag, val in ir: if tag == "Push": code.append("st[sp] = %d;" % val) code.append("sp += 1;") elif tag == "Op": code.append("st[sp - 2] = st[sp - 2] %s st[sp - 1];" % val) code.append("sp -= 1;") return "\n".join(code)
What's going on here? Let's look at the output of this function (in the same example with "2 2 +").
st[sp] = 2; sp += 1; st[sp] = 2; sp += 1; st[sp - 2] = st[sp - 2] + st[sp - 1]; sp -= 1;
Yes, it already looks like C code. The st array plays the role of the stack, and sp its pointer. Usually virtual stack machines work with these things.
That's just the machine itself - we don’t have an interpreter. There is a compiler. What is left for us? You need to add the necessary framing for the C program.
ST_SIZE = 100 C_CODE = r"""#include <stdio.h> int main(int argc, char** argv) { int st[%d], sp = 0; %s printf("%%d\n", st[sp - 1]); return 0; }""" def scan(source): tokens = source.split() return [("Push", int(x)) if x[0].isdigit() else ("Op", x) for x in tokens] def trans(ir): code = [] for tag, val in ir: if tag == "Push": code.append("st[sp] = %d;" % val) code.append("sp += 1;") elif tag == "Op": code.append("st[sp - 2] = st[sp - 2] %s st[sp - 1];" % val) code.append("sp -= 1;") return "\n".join(code) def rpn_to_c(source): return C_CODE % (ST_SIZE, trans(scan(source))) print(rpn_to_c("2 2 +"))
It remains to compile the output of this program by the C compiler.
Are you still ready to continue? Then let's discuss what we did. There is one dubious point - our compiler translates constant expressions, and in fact they can be calculated simply at the compilation stage. It makes no sense to translate them into code. But let's assume for now that some arguments may fall into the stack from the outside. Let us dwell on the fact that the practical meaning of our development can be given later. Now it is important to get a general idea about the construction of the simplest compilers, right?
Do you like the headline? SSA - it sounds very solid for any compiler. And we will now use this very SSA. What is it? Let's move in order.
We are currently generating C code without any virtual machines. But why do we need a rudiment in the form of operations with the stack? Let's replace these operations with ordinary variables from C. Moreover, we will not save variables - for each expression we will enter a new name. Let the C compiler deal with all of this. It turns out that each variable has a value assigned to it only once. And this, by the way, is the form of SSA .
Here is our new compiler.
C_CODE = r"""#include <stdio.h> int main(int argc, char** argv) { %s printf("%%d\n", %s); return 0; }""" def scan(source): tokens = source.split() return [("Push", int(x)) if x[0].isdigit() else ("Op", x) for x in tokens] def trans(ir): stack, code = [], [] name_cnt = 0 for tag, val in ir: if tag == "Push": code.append("int t%d = %d;" % (name_cnt, val)) stack.append("t%d" % name_cnt) name_cnt += 1 elif tag == "Op": a, b = stack.pop(), stack.pop() code.append("int t%d = %s %s %s;" % (name_cnt, b, val, a)) stack.append("t%d" % name_cnt) name_cnt += 1 return "\n".join(code), stack.pop() def rpn_to_c(source): return C_CODE % trans(scan(source)) print(rpn_to_c("2 2 +"))
Please note - the stack in the C code is no longer there, and working with it is simulated in the process of translation. The stack used in the compilation process does not contain values, but variable names.
Here is the final result:
#include <stdio.h> int main(int argc, char** argv) { int t0 = 2; int t1 = 2; int t2 = t0 + t1; printf("%d\n", t2); return 0; }
It seems that the time of our joint activities has expired. We were committed to translating the program from one language to another. This is called a source-to-source broadcast. Or - just a translation, which can be considered synonymous with compilation, but usually the compiler translates the program from a high-level representation to a low-level one. There is another buzzword "transpilator" for the source-to-source translator. But the mention of a "transpiler" can cause irritation among compilers, be careful!
Experiment with the code. Waiting for feedback!
Source: https://habr.com/ru/post/432982/
All Articles