Hello! I present to you the second part of my translation of the manual implementing my programming language in JavaScript - the PL Tutorial .
We will create our own programming language - λ language (in the original - λanguage). In the process of creation, we will use a lot of interesting techniques, such as recursive descent, control transfer style, basic optimization techniques. Two versions of the interpreter will be created - the normal and the CPS interpreter, the trans compiler in JavaScript.
The original author is Mihai Bazon , the author of the well-known UglifyJS library (a tool for minimizing and formatting JS code).
PS In the interpreter and compiler there is a bug: in expressions of the type a() && b()
or a() || b()
a() || b()
always executes both parts. This, of course, is wrong because a()
false for the &&
operator, or not false for the operator ||
, then the value of b()
no longer plays any role. It is easy to fix.
In the previous section, we wrote 3 functions: InputStream
, TokenStream
and parse
. To get the AST from the code we use the following code:
var ast = parse(TokenStream(InputStream(code)));
It is easier to write an interpreter than a parser: we simply recursively go through the tree and execute the expressions in their normal order.
For the correct execution of the code, we need a context - an object containing all the variables in a given place. It will be passed as an argument to the evaluate
function.
Every time we enter the lambda
node, we need to add new variables to the context - function arguments. If the argument overlaps a variable from an external block, we must restore the old value of the variable after exiting the function.
The easiest way to do this is to use prototype JavaScript inheritance. When we enter a new function, we create a new context, set the external context as its prototype, and call the function in the new context. Due to this, we have nothing - in the external context all its variables will remain.
Here is the implementation of the Environment
object:
function Environment(parent) { this.vars = Object.create(parent ? parent.vars : null); this.parent = parent; } Environment.prototype = { extend: function() { return new Environment(this); }, lookup: function(name) { var scope = this; while (scope) { if (Object.prototype.hasOwnProperty.call(scope.vars, name)) return scope; scope = scope.parent; } }, get: function(name) { if (name in this.vars) return this.vars[name]; throw new Error("Undefined variable " + name); }, set: function(name, value) { var scope = this.lookup(name); // if (!scope && this.parent) throw new Error("Undefined variable " + name); return (scope || this).vars[name] = value; }, def: function(name, value) { return this.vars[name] = value; } };
The Environment
object has a parent
field that indicates an external context. It will be null
for the global context. It has a vars
field in which all variables belong to the given context. For a global context, it immediately equals an empty object ( Object.create(null)
) and copies of the variables of the parent context ( Object.create(parent.vars)
) for a non-global one.
He has several methods:
extend()
- copy the current context.lookup(name)
- find the context in which the variable named name
defined.get(name)
- get the value of a variable named name
. Throws an exception if the variable has not been defined.set(name, value)
- set the value of the variable. This method looks for the context in which the variable is defined. If it is not defined, and we are not in a global context, an exception will be thrown.def(name, value)
- creates (either overrides or overrides) a variable in the current context.evaluate
functionNow that we have an Environment
object, we can move on to solving the main problem. This function will be a large switch
block that will do some kind of action depending on the type of node transferred:
function evaluate(exp, env) { switch (exp.type) {
For literals, we simply return its value:
case "num": case "str": case "bool": return exp.value;
Variables are taken from the context (the name of the variable is contained in the value
field):
case "var": return env.get(exp.value);
For assignment, we need to make sure that on the left side we have the name of the variable (the var
node). If not, we just throw an exception (we do not support the assignment of anything other than variables). Next, we set the value of the variable using env.set
. Note that the right side of the expression must be calculated using the recursive call of evaluate
:
case "assign": if (exp.left.type != "var") throw new Error("Cannot assign to " + JSON.stringify(exp.left)); return env.set(exp.left.value, evaluate(exp.right, env));
For a node of type binary
we must apply an operator to both operands. We will write the apply_op
function later. Also, we call evaluate
for both parts of the expression:
case "binary": return apply_op(exp.operator, evaluate(exp.left, env), evaluate(exp.right, env));
A lambda
node will return a regular javascript closure, so you can call it as a normal function, even from javascript. I added the function make_lambda
, which I will show later:
case "lambda": return make_lambda(env, exp);
The execution of the if
node if
quite simple: first we find the value of the condition. If it is not false, then return the value of the then
branch. Otherwise, if there is an else
branch, then its value, or false
:
case "if": var cond = evaluate(exp.cond, env); if (cond !== false) return evaluate(exp.then, env); return exp.else ? evaluate(exp.else, env) : false;
The prog
node is a sequence of expressions. We simply execute all the expressions in order and take the value of the last (the value of the empty sequence is false
):
case "prog": var val = false; exp.prog.forEach(function(exp){ val = evaluate(exp, env) }); return val;
For a call
node, we need to call a function. Before this we find the value of the function itself, find the values of all the arguments and call the function using apply
:
case "call": var func = evaluate(exp.func, env); return func.apply(null, exp.args.map(function(arg){ return evaluate(arg, env); }));
We will never get here, but in case we add a new node type to the parser and forget to add it to the interpreter:
default: throw new Error("I don't know how to evaluate " + exp.type); } }
This was the main part of the interpreter. Above, we used two functions that we have not yet implemented, so let's start:
apply_op
: function apply_op(op, a, b) { function num(x) { if (typeof x != "number") throw new Error("Expected number but got " + x); return x; } function div(x) { if (num(x) == 0) throw new Error("Divide by zero"); return x; } switch (op) { case "+" : return num(a) + num(b); case "-" : return num(a) - num(b); case "*" : return num(a) * num(b); case "/" : return num(a) / div(b); case "%" : return num(a) % div(b); case "&&" : return a !== false && b; case "||" : return a !== false ? a : b; case "<" : return num(a) < num(b); case ">" : return num(a) > num(b); case "<=" : return num(a) <= num(b); case ">=" : return num(a) >= num(b); case "==" : return a === b; case "!=" : return a !== b; } throw new Error("Can't apply operator " + op); }
It gets the operator type and arguments. Simple and clear switch
. Unlike JavaScript, which can take any values, like variables, even those that have no meaning. We demand that the operands of arithmetic operators be numbers and do not allow division by zero. For the strings we will come up with something later.
make_lambda
: function make_lambda(env, exp) { function lambda() { var names = exp.vars; var scope = env.extend(); for (var i = 0; i < names.length; ++i) scope.def(names[i], i < arguments.length ? arguments[i] : false); return evaluate(exp.body, scope); } return lambda; }
As you can see above, it returns a regular JavaScript function that uses the passed context and the AST function. All work is done only when the function itself is called: the context is created, the arguments are set (if they are not enough, they become false
). Then the body of the function is simply executed in a new context.
As you can see, we had no way to interact with JavaScript from our language. I used to use the print
and print
functions, but I never defined them anywhere. We need to write them in javascript and just add to the global context.
Here is an example of such a code:
// - var code = "sum = lambda(x, y) x + y; print(sum(2, 3));"; // , parse TokenStream, InputStream var ast = parse(TokenStream(InputStream(code))); // var globalEnv = new Environment(); // "" "print" globalEnv.def("print", function(txt){ console.log(txt); }); // evaluate(ast, globalEnv); // 5
You can download all the code that we have written for all this time. It can be run using NodeJS. Just pass the code to the standard stream:
echo 'sum = lambda(x, y) x + y; println(sum(2, 3));' | node lambda-eval1.js
Our programming language, though simple, can (theoretically) solve any problem that can be solved by a computer at all. This is because some dudes, smarter than I ever will be - Alonzo Church and Alan Turing - have proved in their time that λ-calculus (lambda calculus) are equivalent to the Turing machine, and our λ-language implements λ-calculus.
This means that even if there are no opportunities in our language, we will be able to realize them all the same using what we already have. Or, if this is difficult to do, we can write an interpreter for another language in this.
Loops are not a problem if we have a recursion. I already showed an example of a loop implemented on top of recursion. Let's try it again.
print_range = λ(a, b) if a <= b { print(a); if a + 1 <= b { print(", "); print_range(a + 1, b); } else println(""); }; print_range(1, 10);
But here we have a problem: if we increase the number of iterations, say, to 1000. I, for example, get the error “Maximum call stack size exceeded” after 600. This happens because the interpreter is recursive and the recursion reaches the maximum depth.
This is a serious problem, but there is a solution. I'd like to add new constructions for iteration ( for
or while
), but let's try to do without them. Recursion looks beautiful, so let's leave it. We will later see how to get around this limitation.
Our λ language has three data types: numbers, strings, and a boolean type. It may seem impossible to create complex types, such as arrays, or objects. But this is not tat, we have another type: function. It turns out that if we follow λ-calculus, then we can create any data structures, including objects, even with inheritance.
I will show it on the example of lists. Let's imagine that we have a cons
function that creates an object containing two values. Let's call this object a cell or a pair. Let's call one of the values "car" and the other one - "cdr". Just because they are so called in Lisp. Now, if we have a "cell" object, then we can get its values using the car
and cdr
functions:
x = cons(10, 20); print(car(x)); # 10 print(cdr(x)); # 20
Now we can easily define the list:
The list is a pair containing the first item in `car` and the rest items in` cdr`. But `cdr` can contain only one value! This value will be a list. The list is a pair containing the first item in `car` and the rest items in` cdr`. But `cdr` can contain only one value! This value will be a list. [...]
This results in a recursive data type. But one problem remains: when to stop? Logically, we should stop when cdr
is an empty list. But what is an "empty list"? To do this, let's add a new object called "NIL". It can be used as a pair (we can use car
and cdr
on it, but the result will be NIL
itself). Now let's create a list of items 1, 2, 3, 4, 5:
x = cons(1, cons(2, cons(3, cons(4, cons(5, NIL))))); print(car(x)); # 1 print(car(cdr(x))); # 2 Lisp . cadr print(car(cdr(cdr(x)))); # 3 caddr print(car(cdr(cdr(cdr(x))))); # 4 cadddr print(car(cdr(cdr(cdr(cdr(x)))))); # 5 .
It looks terrible when there is no special syntax for it. But I just wanted to show that this can be done using the existing capabilities of λzka. Here is the implementation:
cons = λ(a, b) λ(f) f(a, b); car = λ(cell) cell(λ(a, b) a); cdr = λ(cell) cell(λ(a, b) b); NIL = λ(f) f(NIL, NIL);
When I first saw cons
/ car
/ cdr
made in this way, I was surprised that they didn’t need any if
(but this is strange given the fact that it is not in the original λ-calculus). Of course, no programming language does this in this way, because it is extremely inefficient, but it does not make λ-calculus less beautiful. In plain language, this code does the following:
cons
function takes two values ( a
and b
) and returns the function that holds them. That function is the very object of the pair. It takes an argument and calls it for both values of the pair.car
function calls the passed argument, passing in a function that returns the first argument.cdr
function does the same thing as the car
function, but with the only difference that the passed function returns the second argument.NIL
function works the same as cons
, but returns a pair with two values equal to NIL. cons = λ(a, b) λ(f) f(a, b); car = λ(cell) cell(λ(a, b) a); cdr = λ(cell) cell(λ(a, b) b); NIL = λ(f) f(NIL, NIL); x = cons(1, cons(2, cons(3, cons(4, cons(5, NIL))))); println(car(x)); # 1 println(car(cdr(x))); # 2 println(car(cdr(cdr(x)))); # 3 println(car(cdr(cdr(cdr(x))))); # 4 println(car(cdr(cdr(cdr(cdr(x)))))); # 5
There are many algorithms on lists that can be implemented recursively and look logical. For example, here is the function that calls the passed function for each list item:
foreach = λ(list, f) if list != NIL { f(car(list)); foreach(cdr(list), f); }; foreach(x, println);
And here is another one that creates a list for the range of numbers:
range = λ(a, b) if a <= b then cons(a, range(a + 1, b)) else NIL; # 1 8 foreach(range(1, 8), λ(x) println(x * x));
The lists that we have implemented above are immutable (we cannot change car
or cdr
after the list has been created). Most Lisp's have functions to change a pair. In Scheme, they are called set-car!
/ set-cdr!
. In Common Lisp - rplaca
/ rplacd
. This time we use the names from Scheme:
cons = λ(x, y) λ(a, i, v) if a == "get" then if i == 0 then x else y else if i == 0 then x = v else y = v; car = λ(cell) cell("get", 0); cdr = λ(cell) cell("get", 1); set-car! = λ(cell, val) cell("set", 0, val); set-cdr! = λ(cell, val) cell("set", 1, val); # NIL NIL = cons(0, 0); set-car!(NIL, NIL); set-cdr!(NIL, NIL); ## : x = cons(1, 2); println(car(x)); # 1 println(cdr(x)); # 2 set-car!(x, 10); set-cdr!(x, 20); println(car(x)); # 10 println(cdr(x)); # 20
This shows that we can implement mutable data structures. I will not delve into how it works, it is clear from the code.
We can go further and implement the objects, but without changes in the syntax it will be difficult to do. Another way is to implement a new syntax in the tokenizer / parser and add them to the interpreter. This is done by all major programming languages, and it is necessary to achieve normal performance. We will add a new syntax in the next part of the article.
[From the translator: if you are interested in lambda calculus, on Habré there is a cool article on this topic: Lambda calculus in JavaScript .]
Our λ language has quite a few syntactic constructs. For example, there is no direct way to add new variables. As I said before, we must use IIFE, so it looks like this:
(λ(x, y){ (λ(z){ ## it gets worse when one of the vars depends on another print(x + y + z); })(x + y); })(2, 3);
We will add the let
keyword. This will allow us to write something like this:
let (x = 2, y = 3, z = x + y) print(x + y + z);
For each variable in the let
block, previous variables should be available even from the same block. Therefore, the code above will be equivalent to the following:
(λ(x){ (λ(y){ (λ(z){ print(x + y + z); })(x + y); })(3); })(2);
These changes can be made directly in the parser and then will not require changes in the interpreter. Instead of adding a new let
node, we can turn it into call
and lambda
nodes. This means that we have not made any semantic changes in our language - this is called “syntactic sugar”, and the operation of turning it into AST nodes that existed before is called “saccharification” (the original: “desugaring”).
However, we have to change the parser anyway. Let's add a new "let" node because it can be interpreted more efficiently (no need to create closures and call them right away, we just need to copy and change the context).
Also, we will add support for the "let named" that was in Scheme. It makes it easier to create loops:
print(let loop (n = 10) if n > 0 then n + loop(n - 1) else 0);
This is a "recursive" cycle, which counts the sum of 10 + 9 + ... + 0. Previously, we would have to do it like this:
print((λ(loop){ loop = λ(n) if n > 0 then n + loop(n - 1) else 0; loop(10); })());
Also, to make it easier, we will add the syntax of "functions with a name." It will look like this:
print((λ loop (n) if n > 0 then n + loop(n - 1) else 0) (10));
Modifications that need to be done for this:
lambda
keyword. If it is present, then we must add a variable in the current context that will point to the function itself. This is exactly the same as the function with the name in JavaScript.let
keyword. Next comes an optional name and a list (possibly empty) of variable definitions in the following form: foo = EXPRESSION
, separated by commas. The body of a let
expression is one expression (which, of course, can be a sequence of expressions).First, a small change in the tokenizer, add the let
keyword to the keyword list:
var keywords = " let if then else lambda λ true false ";
Change the parse_lambda
function to read the optional name:
function parse_lambda() { return { type: "lambda", name: input.peek().type == "var" ? input.next().value : null, // vars: delimited("(", ")", ",", parse_varname), body: parse_expression() }; }
Now add a function that parses the let
expression:
function parse_let() { skip_kw("let"); if (input.peek().type == "var") { var name = input.next().value; var defs = delimited("(", ")", ",", parse_vardef); return { type: "call", func: { type: "lambda", name: name, vars: defs.map(function(def){ return def.name }), body: parse_expression(), }, args: defs.map(function(def){ return def.def || FALSE }) }; } return { type: "let", vars: delimited("(", ")", ",", parse_vardef), body: parse_expression(), }; }
This handles two cases. If after let
there is a token of type var
, then this is let
with the name. Next, we read the list of definitions using the delimited
function, since they are in brackets and separated by commas, and use the parse_vardef
function, which is shown below. Next we return the call
node, which immediately calls the function with the name (IIFE). The arguments to the function are the variables defined in let
, and the call
node will pass the values as arguments. And, of course, the function body is read using parse_expression()
.
If this is a simple let
, then we return a let
type node with the vars
and body
fields. The vars
field contains an array of variables in the following format: { name: VARIABLE, def: AST }
, which are parsed with the following function:
function parse_vardef() { var name = parse_varname(), def; if (is_op("=")) { input.next(); def = parse_expression(); } return { name: name, def: def }; }
You also need to add a check for a new type of expression to the parse_atom
function:
// parse_if if (is_kw("let")) return parse_let();
Since we decided to change the structure of the AST instead of “consuming” it in the old types of nodes, we have to add the processing of new logic to the interpreter.
In order to add support for the optional function name, we change the function make_lambda
:
function make_lambda(env, exp) { if (exp.name) { // env = env.extend(); // env.def(exp.name, lambda); // } // function lambda() { var names = exp.vars; var scope = env.extend(); for (var i = 0; i < names.length; ++i) scope.def(names[i], i < arguments.length ? arguments[i] : false); return evaluate(exp.body, scope); } return lambda; }
If the function has a name, then when we create a closure, we make a copy of the context, and add the function to the context. The rest remains the same.
And finally, to handle the let
node, we add the following case to the interpreter:
case "let": exp.vars.forEach(function(v){ var scope = env.extend(); scope.def(v.name, v.def ? evaluate(v.def, env) : false); env = scope; }); return evaluate(exp.body, env);
Please note that for each variable a new context is created in which a new variable is added. After that, we simply execute the body in the latter context.
println(let loop (n = 100) if n > 0 then n + loop(n - 1) else 0); let (x = 2, y = x + 1, z = x + y) println(x + y + z); # .. let # print(x + y + z); let (x = 10) { let (x = x * 2, y = x * x) { println(x); ## 20 println(y); ## 400 }; println(x); ## 10 };
— .
:
globalEnv.def("fibJS", function fibJS(n){ if (n < 2) return n; return fibJS(n - 1) + fibJS(n - 2); }); globalEnv.def("time", function(fn){ var t1 = Date.now(); var ret = fn(); var t2 = Date.now(); println("Time: " + (t2 - t1) + "ms"); return ret; });
time
, , , .
fib = λ(n) if n < 2 then n else fib(n - 1) + fib(n - 2); print("fib(10): "); time( λ() println(fib(10)) ); print("fibJS(10): "); time( λ() println(fibJS(10)) ); println("---");
, Google Chrome, n (27), λ , , JS 4 . , , .
λ JavaScript. , for
/ while
; JS. ? JS , .
, , JavaScript, , JavaScript.
, , . , .
, . , - ; , , ? — JavaScript. , JavaScript — ? , , JavaScript, , , . JavaScript ( , ).
, , Lisp — : //. , , .. Lisp . Lisp let
, , Lisp.
Source: https://habr.com/ru/post/443812/