⬆️ ⬇️

How to implement a programming language in JavaScript. Part 2: Interpreter

Hello! I present to you the second part of my translation of the manual implementing my programming language in JavaScript - the PL Tutorial .



From translator



We will create our own programming language - λ language (in the original - λanguage). In the process of creation, we will use a lot of interesting techniques, such as recursive descent, control transfer style, basic optimization techniques. Two versions of the interpreter will be created - the normal and the CPS interpreter, the trans compiler in JavaScript.



The original author is Mihai Bazon , the author of the well-known UglifyJS library (a tool for minimizing and formatting JS code).





PS In the interpreter and compiler there is a bug: in expressions of the type a() && b() or a() || b() a() || b() always executes both parts. This, of course, is wrong because a() false for the && operator, or not false for the operator || , then the value of b() no longer plays any role. It is easy to fix.



Simple interpreter



In the previous section, we wrote 3 functions: InputStream , TokenStream and parse . To get the AST from the code we use the following code:



 var ast = parse(TokenStream(InputStream(code))); 


It is easier to write an interpreter than a parser: we simply recursively go through the tree and execute the expressions in their normal order.



Context



For the correct execution of the code, we need a context - an object containing all the variables in a given place. It will be passed as an argument to the evaluate function.



Every time we enter the lambda node, we need to add new variables to the context - function arguments. If the argument overlaps a variable from an external block, we must restore the old value of the variable after exiting the function.



The easiest way to do this is to use prototype JavaScript inheritance. When we enter a new function, we create a new context, set the external context as its prototype, and call the function in the new context. Due to this, we have nothing - in the external context all its variables will remain.



Here is the implementation of the Environment object:



 function Environment(parent) { this.vars = Object.create(parent ? parent.vars : null); this.parent = parent; } Environment.prototype = { extend: function() { return new Environment(this); }, lookup: function(name) { var scope = this; while (scope) { if (Object.prototype.hasOwnProperty.call(scope.vars, name)) return scope; scope = scope.parent; } }, get: function(name) { if (name in this.vars) return this.vars[name]; throw new Error("Undefined variable " + name); }, set: function(name, value) { var scope = this.lookup(name); //          if (!scope && this.parent) throw new Error("Undefined variable " + name); return (scope || this).vars[name] = value; }, def: function(name, value) { return this.vars[name] = value; } }; 


The Environment object has a parent field that indicates an external context. It will be null for the global context. It has a vars field in which all variables belong to the given context. For a global context, it immediately equals an empty object ( Object.create(null) ) and copies of the variables of the parent context ( Object.create(parent.vars) ) for a non-global one.



He has several methods:





evaluate function



Now that we have an Environment object, we can move on to solving the main problem. This function will be a large switch block that will do some kind of action depending on the type of node transferred:



 function evaluate(exp, env) { switch (exp.type) { 


For literals, we simply return its value:



  case "num": case "str": case "bool": return exp.value; 


Variables are taken from the context (the name of the variable is contained in the value field):



  case "var": return env.get(exp.value); 


For assignment, we need to make sure that on the left side we have the name of the variable (the var node). If not, we just throw an exception (we do not support the assignment of anything other than variables). Next, we set the value of the variable using env.set . Note that the right side of the expression must be calculated using the recursive call of evaluate :



  case "assign": if (exp.left.type != "var") throw new Error("Cannot assign to " + JSON.stringify(exp.left)); return env.set(exp.left.value, evaluate(exp.right, env)); 


For a node of type binary we must apply an operator to both operands. We will write the apply_op function later. Also, we call evaluate for both parts of the expression:



  case "binary": return apply_op(exp.operator, evaluate(exp.left, env), evaluate(exp.right, env)); 


A lambda node will return a regular javascript closure, so you can call it as a normal function, even from javascript. I added the function make_lambda , which I will show later:



  case "lambda": return make_lambda(env, exp); 


The execution of the if node if quite simple: first we find the value of the condition. If it is not false, then return the value of the then branch. Otherwise, if there is an else branch, then its value, or false :



  case "if": var cond = evaluate(exp.cond, env); if (cond !== false) return evaluate(exp.then, env); return exp.else ? evaluate(exp.else, env) : false; 


The prog node is a sequence of expressions. We simply execute all the expressions in order and take the value of the last (the value of the empty sequence is false ):



  case "prog": var val = false; exp.prog.forEach(function(exp){ val = evaluate(exp, env) }); return val; 


For a call node, we need to call a function. Before this we find the value of the function itself, find the values ​​of all the arguments and call the function using apply :



  case "call": var func = evaluate(exp.func, env); return func.apply(null, exp.args.map(function(arg){ return evaluate(arg, env); })); 


We will never get here, but in case we add a new node type to the parser and forget to add it to the interpreter:



  default: throw new Error("I don't know how to evaluate " + exp.type); } } 


This was the main part of the interpreter. Above, we used two functions that we have not yet implemented, so let's start:



apply_op :



 function apply_op(op, a, b) { function num(x) { if (typeof x != "number") throw new Error("Expected number but got " + x); return x; } function div(x) { if (num(x) == 0) throw new Error("Divide by zero"); return x; } switch (op) { case "+" : return num(a) + num(b); case "-" : return num(a) - num(b); case "*" : return num(a) * num(b); case "/" : return num(a) / div(b); case "%" : return num(a) % div(b); case "&&" : return a !== false && b; case "||" : return a !== false ? a : b; case "<" : return num(a) < num(b); case ">" : return num(a) > num(b); case "<=" : return num(a) <= num(b); case ">=" : return num(a) >= num(b); case "==" : return a === b; case "!=" : return a !== b; } throw new Error("Can't apply operator " + op); } 


It gets the operator type and arguments. Simple and clear switch . Unlike JavaScript, which can take any values, like variables, even those that have no meaning. We demand that the operands of arithmetic operators be numbers and do not allow division by zero. For the strings we will come up with something later.



make_lambda :



 function make_lambda(env, exp) { function lambda() { var names = exp.vars; var scope = env.extend(); for (var i = 0; i < names.length; ++i) scope.def(names[i], i < arguments.length ? arguments[i] : false); return evaluate(exp.body, scope); } return lambda; } 


As you can see above, it returns a regular JavaScript function that uses the passed context and the AST function. All work is done only when the function itself is called: the context is created, the arguments are set (if they are not enough, they become false ). Then the body of the function is simply executed in a new context.



Native functions



As you can see, we had no way to interact with JavaScript from our language. I used to use the print and print functions, but I never defined them anywhere. We need to write them in javascript and just add to the global context.



Here is an example of such a code:



 //  -   var code = "sum = lambda(x, y) x + y; print(sum(2, 3));"; // ,  parse  TokenStream,      InputStream var ast = parse(TokenStream(InputStream(code))); //    var globalEnv = new Environment(); //  ""  "print" globalEnv.def("print", function(txt){ console.log(txt); }); //  evaluate(ast, globalEnv); //  5 


All code



You can download all the code that we have written for all this time. It can be run using NodeJS. Just pass the code to the standard stream:



 echo 'sum = lambda(x, y) x + y; println(sum(2, 3));' | node lambda-eval1.js 


Code examples



Our programming language, though simple, can (theoretically) solve any problem that can be solved by a computer at all. This is because some dudes, smarter than I ever will be - Alonzo Church and Alan Turing - have proved in their time that λ-calculus (lambda calculus) are equivalent to the Turing machine, and our λ-language implements λ-calculus.



This means that even if there are no opportunities in our language, we will be able to realize them all the same using what we already have. Or, if this is difficult to do, we can write an interpreter for another language in this.



Cycles



Loops are not a problem if we have a recursion. I already showed an example of a loop implemented on top of recursion. Let's try it again.



 print_range = λ(a, b) if a <= b { print(a); if a + 1 <= b { print(", "); print_range(a + 1, b); } else println(""); }; print_range(1, 10); 


But here we have a problem: if we increase the number of iterations, say, to 1000. I, for example, get the error “Maximum call stack size exceeded” after 600. This happens because the interpreter is recursive and the recursion reaches the maximum depth.



This is a serious problem, but there is a solution. I'd like to add new constructions for iteration ( for or while ), but let's try to do without them. Recursion looks beautiful, so let's leave it. We will later see how to get around this limitation.



Data structures (lack thereof)



Our λ language has three data types: numbers, strings, and a boolean type. It may seem impossible to create complex types, such as arrays, or objects. But this is not tat, we have another type: function. It turns out that if we follow λ-calculus, then we can create any data structures, including objects, even with inheritance.



I will show it on the example of lists. Let's imagine that we have a cons function that creates an object containing two values. Let's call this object a cell or a pair. Let's call one of the values ​​"car" and the other one - "cdr". Just because they are so called in Lisp. Now, if we have a "cell" object, then we can get its values ​​using the car and cdr functions:



 x = cons(10, 20); print(car(x)); #  10 print(cdr(x)); #  20 


Now we can easily define the list:



The list is a pair containing the first item in `car` and the rest items in` cdr`. But `cdr` can contain only one value! This value will be a list. The list is a pair containing the first item in `car` and the rest items in` cdr`. But `cdr` can contain only one value! This value will be a list. [...]

This results in a recursive data type. But one problem remains: when to stop? Logically, we should stop when cdr is an empty list. But what is an "empty list"? To do this, let's add a new object called "NIL". It can be used as a pair (we can use car and cdr on it, but the result will be NIL itself). Now let's create a list of items 1, 2, 3, 4, 5:



 x = cons(1, cons(2, cons(3, cons(4, cons(5, NIL))))); print(car(x)); # 1 print(car(cdr(x))); # 2  Lisp  . cadr print(car(cdr(cdr(x)))); # 3 caddr print(car(cdr(cdr(cdr(x))))); # 4 cadddr print(car(cdr(cdr(cdr(cdr(x)))))); # 5     . 


It looks terrible when there is no special syntax for it. But I just wanted to show that this can be done using the existing capabilities of λzka. Here is the implementation:



 cons = λ(a, b) λ(f) f(a, b); car = λ(cell) cell(λ(a, b) a); cdr = λ(cell) cell(λ(a, b) b); NIL = λ(f) f(NIL, NIL); 


When I first saw cons / car / cdr made in this way, I was surprised that they didn’t need any if (but this is strange given the fact that it is not in the original λ-calculus). Of course, no programming language does this in this way, because it is extremely inefficient, but it does not make λ-calculus less beautiful. In plain language, this code does the following:





 cons = λ(a, b) λ(f) f(a, b); car = λ(cell) cell(λ(a, b) a); cdr = λ(cell) cell(λ(a, b) b); NIL = λ(f) f(NIL, NIL); x = cons(1, cons(2, cons(3, cons(4, cons(5, NIL))))); println(car(x)); # 1 println(car(cdr(x))); # 2 println(car(cdr(cdr(x)))); # 3 println(car(cdr(cdr(cdr(x))))); # 4 println(car(cdr(cdr(cdr(cdr(x)))))); # 5 


There are many algorithms on lists that can be implemented recursively and look logical. For example, here is the function that calls the passed function for each list item:



 foreach = λ(list, f) if list != NIL { f(car(list)); foreach(cdr(list), f); }; foreach(x, println); 


And here is another one that creates a list for the range of numbers:



 range = λ(a, b) if a <= b then cons(a, range(a + 1, b)) else NIL; #     1  8 foreach(range(1, 8), λ(x) println(x * x)); 


The lists that we have implemented above are immutable (we cannot change car or cdr after the list has been created). Most Lisp's have functions to change a pair. In Scheme, they are called set-car! / set-cdr! . In Common Lisp - rplaca / rplacd . This time we use the names from Scheme:



 cons = λ(x, y) λ(a, i, v) if a == "get" then if i == 0 then x else y else if i == 0 then x = v else y = v; car = λ(cell) cell("get", 0); cdr = λ(cell) cell("get", 1); set-car! = λ(cell, val) cell("set", 0, val); set-cdr! = λ(cell, val) cell("set", 1, val); #  NIL     NIL = cons(0, 0); set-car!(NIL, NIL); set-cdr!(NIL, NIL); ## : x = cons(1, 2); println(car(x)); # 1 println(cdr(x)); # 2 set-car!(x, 10); set-cdr!(x, 20); println(car(x)); # 10 println(cdr(x)); # 20 


This shows that we can implement mutable data structures. I will not delve into how it works, it is clear from the code.



We can go further and implement the objects, but without changes in the syntax it will be difficult to do. Another way is to implement a new syntax in the tokenizer / parser and add them to the interpreter. This is done by all major programming languages, and it is necessary to achieve normal performance. We will add a new syntax in the next part of the article.



[From the translator: if you are interested in lambda calculus, on Habré there is a cool article on this topic: Lambda calculus in JavaScript .]



New syntax constructs



Our λ language has quite a few syntactic constructs. For example, there is no direct way to add new variables. As I said before, we must use IIFE, so it looks like this:



 (λ(x, y){ (λ(z){ ## it gets worse when one of the vars depends on another print(x + y + z); })(x + y); })(2, 3); 


We will add the let keyword. This will allow us to write something like this:



 let (x = 2, y = 3, z = x + y) print(x + y + z); 


For each variable in the let block, previous variables should be available even from the same block. Therefore, the code above will be equivalent to the following:



 (λ(x){ (λ(y){ (λ(z){ print(x + y + z); })(x + y); })(3); })(2); 


These changes can be made directly in the parser and then will not require changes in the interpreter. Instead of adding a new let node, we can turn it into call and lambda nodes. This means that we have not made any semantic changes in our language - this is called “syntactic sugar”, and the operation of turning it into AST nodes that existed before is called “saccharification” (the original: “desugaring”).



However, we have to change the parser anyway. Let's add a new "let" node because it can be interpreted more efficiently (no need to create closures and call them right away, we just need to copy and change the context).



Also, we will add support for the "let named" that was in Scheme. It makes it easier to create loops:



 print(let loop (n = 10) if n > 0 then n + loop(n - 1) else 0); 


This is a "recursive" cycle, which counts the sum of 10 + 9 + ... + 0. Previously, we would have to do it like this:



 print((λ(loop){ loop = λ(n) if n > 0 then n + loop(n - 1) else 0; loop(10); })()); 


Also, to make it easier, we will add the syntax of "functions with a name." It will look like this:



 print((λ loop (n) if n > 0 then n + loop(n - 1) else 0) (10)); 


Modifications that need to be done for this:





Parser changes



First, a small change in the tokenizer, add the let keyword to the keyword list:



 var keywords = " let if then else lambda λ true false "; 


Change the parse_lambda function to read the optional name:



 function parse_lambda() { return { type: "lambda", name: input.peek().type == "var" ? input.next().value : null, //   vars: delimited("(", ")", ",", parse_varname), body: parse_expression() }; } 


Now add a function that parses the let expression:



 function parse_let() { skip_kw("let"); if (input.peek().type == "var") { var name = input.next().value; var defs = delimited("(", ")", ",", parse_vardef); return { type: "call", func: { type: "lambda", name: name, vars: defs.map(function(def){ return def.name }), body: parse_expression(), }, args: defs.map(function(def){ return def.def || FALSE }) }; } return { type: "let", vars: delimited("(", ")", ",", parse_vardef), body: parse_expression(), }; } 


This handles two cases. If after let there is a token of type var , then this is let with the name. Next, we read the list of definitions using the delimited function, since they are in brackets and separated by commas, and use the parse_vardef function, which is shown below. Next we return the call node, which immediately calls the function with the name (IIFE). The arguments to the function are the variables defined in let , and the call node will pass the values ​​as arguments. And, of course, the function body is read using parse_expression() .



If this is a simple let , then we return a let type node with the vars and body fields. The vars field contains an array of variables in the following format: { name: VARIABLE, def: AST } , which are parsed with the following function:



 function parse_vardef() { var name = parse_varname(), def; if (is_op("=")) { input.next(); def = parse_expression(); } return { name: name, def: def }; } 


You also need to add a check for a new type of expression to the parse_atom function:



 //      parse_if if (is_kw("let")) return parse_let(); 


Interpreter changes



Since we decided to change the structure of the AST instead of “consuming” it in the old types of nodes, we have to add the processing of new logic to the interpreter.



In order to add support for the optional function name, we change the function make_lambda :



 function make_lambda(env, exp) { if (exp.name) { //  env = env.extend(); //  env.def(exp.name, lambda); //  } //  function lambda() { var names = exp.vars; var scope = env.extend(); for (var i = 0; i < names.length; ++i) scope.def(names[i], i < arguments.length ? arguments[i] : false); return evaluate(exp.body, scope); } return lambda; } 


If the function has a name, then when we create a closure, we make a copy of the context, and add the function to the context. The rest remains the same.



And finally, to handle the let node, we add the following case to the interpreter:



 case "let": exp.vars.forEach(function(v){ var scope = env.extend(); scope.def(v.name, v.def ? evaluate(v.def, env) : false); env = scope; }); return evaluate(exp.body, env); 


Please note that for each variable a new context is created in which a new variable is added. After that, we simply execute the body in the latter context.



Examples



 println(let loop (n = 100) if n > 0 then n + loop(n - 1) else 0); let (x = 2, y = x + 1, z = x + y) println(x + y + z); #   ..     let # print(x + y + z); let (x = 10) { let (x = x * 2, y = x * x) { println(x); ## 20 println(y); ## 400 }; println(x); ## 10 }; 




— .



. , , , . JavaScript λ.



:



 globalEnv.def("fibJS", function fibJS(n){ if (n < 2) return n; return fibJS(n - 1) + fibJS(n - 2); }); globalEnv.def("time", function(fn){ var t1 = Date.now(); var ret = fn(); var t2 = Date.now(); println("Time: " + (t2 - t1) + "ms"); return ret; }); 


time , , , .



 fib = λ(n) if n < 2 then n else fib(n - 1) + fib(n - 2); print("fib(10): "); time( λ() println(fib(10)) ); print("fibJS(10): "); time( λ() println(fibJS(10)) ); println("---"); 


, Google Chrome, n (27), λ , , JS 4 . , , .



λ JavaScript. , for / while ; JS. ? JS , .



, , JavaScript, , JavaScript.



, , . , .



Conclusion



, . , - ; , , ? — JavaScript. , JavaScript — ? , , JavaScript, , , . JavaScript ( , ).



, , Lisp — : //. , , .. Lisp . Lisp let , , Lisp.



: JavaScript. 3: CPS-



')

Source: https://habr.com/ru/post/443812/



All Articles