How to be* a compiler: making a compiler with JavaScript

Hi, Habr! Here is my translation of the article "How to be* a compiler - make a compiler with JavaScript" by Mariko Kosaka.

* That's right! Being a compiler is great!

It was one wonderful Sunday afternoon in Bushwick, Brooklyn. In my local bookstore, I stumbled upon John Maeda's “Design by Numbers”. It is a step-by-step guide to learning DBN, a programming language created in the late 90s at the MIT Media Lab to present computer programming concepts visually.

The three lines of code in the first figure draw a black line on white paper (the example is taken from here). To draw more complex shapes, such as a square, you just need to draw more lines (the second figure).

I immediately thought that in 2016 this could be an interesting project: generating SVG from DBN without having to install Java to run the original DBN code.

I decided that I needed to write a DBN-to-SVG compiler, and so my compiler-writing quest began. Creating a compiler sounds like a fairly complex, scientific undertaking, and I had never even written a graph traversal for an interview... Would I be able to write a compiler?

Let's first try to become a compiler ourselves.

A compiler is a mechanism that takes a piece of code and converts it into something else. Let's compile a simple DBN code into a physical drawing.

Take three DBN commands: Paper sets the paper color, Pen sets the pen color, and Line draws a line. Color 100 means 100% black, which is rgb(0%, 0%, 0%) in CSS. Images created in DBN are always grayscale. In DBN the paper is always 100 × 100, the line width is always 1, and a line is given by the (x, y) coordinates of its endpoints, counted from the bottom left corner of the image.
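These two conventions (an inverted gray scale and a y axis that starts at the bottom) are what a DBN-to-SVG compiler has to translate later on. Below is a minimal sketch of that translation. The helper names `dbnColorToCss` and `dbnYToSvg` are hypothetical and do not come from the article's compiler.

```javascript
// Hypothetical helpers, shown only to illustrate the DBN conventions.

// DBN gray level: 0 is white, 100 is black.
// CSS rgb() percentages run the other way: 0% is black, 100% is white.
function dbnColorToCss (level) {
  var percent = 100 - level
  return 'rgb(' + percent + '%,' + percent + '%,' + percent + '%)'
}

// DBN counts y from the bottom edge, SVG counts y from the top edge,
// so y is mirrored across the 100-unit paper height.
function dbnYToSvg (y) {
  return 100 - y
}

console.log(dbnColorToCss(100)) // rgb(0%,0%,0%)
console.log(dbnColorToCss(0))   // rgb(100%,100%,100%)
console.log(dbnYToSvg(0))       // 100
```

The x coordinate needs no conversion, because both DBN and SVG count x from the left edge.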

Let's stop here and try being a compiler ourselves. Take a sheet of paper and a pen, and try to compile the following code into a drawing.

    Paper 0
    Pen 100
    Line 0 50 100 50

Did you draw a horizontal line from the left edge of the sheet to the right edge, halfway up the page? Congratulations! You have just become a compiler.

How does the compiler work?

Let's see what was going on in our heads while we were being a compiler.

1. Lexical Analysis (Tokenization)

The first thing we did was split the source code into words at the spaces. Along the way, we tentatively assigned each token a primitive type, such as “word” or “number”.

2. Parsing

As soon as we had broken the text into tokens, we went through each of them and tried to find the relationships between them.

In this case, we grouped each command word with the numbers that belong to it. Having done this, we began to recognize certain structures in the code.

3. Transformation

After parsing, we transformed the resulting structures into something more suitable for the final result. In our case, we are going to draw an image, which means we need to transform our structures into step-by-step instructions that are understandable to humans.

4. Code Generation

At this stage, we simply follow the drawing instructions we prepared in the previous step.

This is what the compiler does!

The drawing that we made is the compiled result (similar to the .exe file produced when a C program is compiled). We can hand this drawing to any person or any device (scanner, camera, etc.), and each of them will recognize a black line across the center of the sheet.

Let's write a compiler

Now that we know how the compiler works, let's write another one, but using JavaScript. This compiler will take the DBN code and convert it to SVG.

1. Lexer function

Just as we can split the sentence “I have a pen” into the words [I, have, a, pen], the lexical analyzer breaks code given as a string into meaningful parts (tokens). In DBN, all tokens are separated by whitespace and are classified as either “word” or “number”.

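    function lexer (code) {
      return code.split(/\s+/)                              // split on any run of whitespace
        .filter(function (t) { return t.length > 0 })       // drop empty strings
        .map(function (t) {
          // anything that does not parse as a number is a word
          return isNaN(t)
            ? {type: 'word', value: t}
            : {type: 'number', value: t}
        })
    }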

    input: "Paper 100"
    output: [
      { type: "word", value: "Paper" },
      { type: "number", value: "100" }
    ]

2. Parser function

The parser goes through each token, collects syntactic information, and builds a so-called abstract syntax tree (AST). You can think of the AST as a map of our code: a way to see how it is structured.

In our code, two syntactic types are “NumberLiteral” and “CallExpression”. NumberLiteral means the value is a number. This number is used as an argument for CallExpression.

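    function parser (tokens) {
      var AST = {
        type: 'Drawing',
        body: []
      }
      // take tokens off the front of the list one at a time until none are left
      while (tokens.length > 0) {
        var current_token = tokens.shift()

        // a command always starts with a word token,
        // so only word tokens open a new expression
        if (current_token.type === 'word') {
          switch (current_token.value) {
            case 'Paper':
              var expression = {
                type: 'CallExpression',
                name: 'Paper',
                arguments: []
              }
              // the current token is a CallExpression named Paper,
              // so the next token should be its color argument
              var argument = tokens.shift()
              // `argument` is undefined when Paper is the last token
              if (argument && argument.type === 'number') {
                expression.arguments.push({ // add the argument to the expression
                  type: 'NumberLiteral',
                  value: argument.value
                })
                AST.body.push(expression) // add the expression to the body of the AST
              } else {
                throw new Error('Paper command must be followed by a number.')
              }
              break
            case 'Pen':
              ...
            case 'Line':
              ...
          }
        }
      }
      return AST
    }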

    input: [
      { type: "word", value: "Paper" },
      { type: "number", value: "100" }
    ]
    output: {
      "type": "Drawing",
      "body": [{
        "type": "CallExpression",
        "name": "Paper",
        "arguments": [{ "type": "NumberLiteral", "value": "100" }]
      }]
    }
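The article leaves the Pen and Line branches out. One way they could be handled is with a shared helper that reads a fixed number of number tokens, since Line takes four arguments where Paper takes one. The sketch below is an assumption, not the author's code: `readNumberArguments` is a hypothetical name, and the node shapes simply copy the Paper example above.

```javascript
// Hypothetical helper: take `count` number tokens from the front of the list
// and turn them into NumberLiteral nodes.
function readNumberArguments (tokens, count, commandName) {
  var args = []
  for (var i = 0; i < count; i++) {
    var token = tokens.shift()
    // `token` is undefined when the program ends too early
    if (!token || token.type !== 'number') {
      throw new Error(commandName + ' command must be followed by ' + count + ' number(s).')
    }
    args.push({ type: 'NumberLiteral', value: token.value })
  }
  return args
}

// Tokens in the shape the lexer produces for "Line 0 50 100 50".
var tokens = [
  { type: 'word', value: 'Line' },
  { type: 'number', value: '0' },
  { type: 'number', value: '50' },
  { type: 'number', value: '100' },
  { type: 'number', value: '50' }
]

var command = tokens.shift()
var expression = {
  type: 'CallExpression',
  name: command.value,
  arguments: readNumberArguments(tokens, 4, command.value)
}

console.log(JSON.stringify(expression, null, 2))
```

With a helper like this, each branch of the switch only has to state how many arguments its command expects.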

3. Transformer function

The abstract syntax tree (AST) that we created in the previous step describes well what happens in the code, but we still cannot create SVG from it.

For example, the “Paper” command is meaningful only in DBN. In SVG, we would like to use a <rect> element to represent the paper, so we need a function that converts our AST into another AST that is better suited to SVG.

    function transformer (ast) {
      var svg_ast = {
        tag: 'svg',
        attr: {
          width: 100,
          height: 100,
          viewBox: '0 0 100 100',
          xmlns: 'http://www.w3.org/2000/svg',
          version: '1.1'
        },
        body: []
      }

      // default pen color is black; pen_color is kept on the CSS scale,
      // where 0% is black (the 'Pen' branch below stores 100 - DBN value)
      var pen_color = 0

      // take call expressions off the AST body one at a time
      while (ast.body.length > 0) {
        var node = ast.body.shift()
        switch (node.name) {
          case 'Paper':
            // DBN 100 is black, CSS 0% is black, so invert the value
            var paper_color = 100 - node.arguments[0].value
            svg_ast.body.push({ // add a rect element for the paper to the svg_ast body
              tag: 'rect',
              attr: {
                x: 0,
                y: 0,
                width: 100,
                height: 100,
                fill: 'rgb(' + paper_color + '%,' + paper_color + '%,' + paper_color + '%)'
              }
            })
            break
          case 'Pen':
            pen_color = 100 - node.arguments[0].value // keep the current pen color in `pen_color`
            break
          case 'Line':
            ...
        }
      }
      return svg_ast
    }

    input: {
      "type": "Drawing",
      "body": [{
        "type": "CallExpression",
        "name": "Paper",
        "arguments": [{ "type": "NumberLiteral", "value": "100" }]
      }]
    }
    output: {
      "tag": "svg",
      "attr": {
        "width": 100,
        "height": 100,
        "viewBox": "0 0 100 100",
        "xmlns": "http://www.w3.org/2000/svg",
        "version": "1.1"
      },
      "body": [{
        "tag": "rect",
        "attr": {
          "x": 0,
          "y": 0,
          "width": 100,
          "height": 100,
          "fill": "rgb(0%,0%,0%)"
        }
      }]
    }
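The Line branch is elided in the article. As a rough sketch of what it might do, the function below turns a Line call expression into an SVG <line> node. It assumes that `pen_color` is already on the CSS scale (100 minus the DBN value, as in the Pen branch) and that y has to be mirrored because DBN counts from the bottom of the paper. The name `lineToSvgNode` is hypothetical.

```javascript
// Hypothetical sketch of the elided 'Line' branch. The returned node uses the
// same { tag, attr } shape as the rect node in the transformer.
function lineToSvgNode (node, pen_color) {
  // argument values arrive as strings from the lexer, so convert them first
  var coords = node.arguments.map(function (arg) { return Number(arg.value) })
  return {
    tag: 'line',
    attr: {
      x1: coords[0],
      y1: 100 - coords[1], // DBN y starts at the bottom, SVG y starts at the top
      x2: coords[2],
      y2: 100 - coords[3],
      stroke: 'rgb(' + pen_color + '%,' + pen_color + '%,' + pen_color + '%)'
    }
  }
}

// "Line 0 50 100 50" drawn with "Pen 100" (pen_color = 100 - 100 = 0)
var lineExpression = {
  type: 'CallExpression',
  name: 'Line',
  arguments: [
    { type: 'NumberLiteral', value: '0' },
    { type: 'NumberLiteral', value: '50' },
    { type: 'NumberLiteral', value: '100' },
    { type: 'NumberLiteral', value: '50' }
  ]
}

var lineNode = lineToSvgNode(lineExpression, 0)
console.log(lineNode.attr.stroke) // rgb(0%,0%,0%)
```

Inside the transformer, the result would be pushed onto `svg_ast.body` the same way the rect node is.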

4. Generator function

In the last step of the compiler, a function is called that builds the SVG code based on the new AST that we created in the previous step.

    function generator (svg_ast) {

      // build an attribute string from an attr object:
      // { "width": 100, "height": 100 } becomes 'width="100" height="100"'
      function createAttrString (attr) {
        return Object.keys(attr).map(function (key) {
          return key + '="' + attr[key] + '"'
        }).join(' ')
      }

      // the top node is always <svg>; build the attribute string for the svg tag
      var svg_attr = createAttrString(svg_ast.attr)

      // build an element string for each node in the svg_ast body
      var elements = svg_ast.body.map(function (node) {
        return '<' + node.tag + ' ' + createAttrString(node.attr) + '></' + node.tag + '>'
      }).join('\n\t')

      // wrap the elements in the opening and closing svg tags to complete the SVG code
      return '<svg ' + svg_attr + '>\n' + elements + '\n</svg>'
    }

    input: {
      "tag": "svg",
      "attr": {
        "width": 100,
        "height": 100,
        "viewBox": "0 0 100 100",
        "xmlns": "http://www.w3.org/2000/svg",
        "version": "1.1"
      },
      "body": [{
        "tag": "rect",
        "attr": {
          "x": 0,
          "y": 0,
          "width": 100,
          "height": 100,
          "fill": "rgb(0%,0%,0%)"
        }
      }]
    }
    output:
    <svg width="100" height="100" viewBox="0 0 100 100" xmlns="http://www.w3.org/2000/svg" version="1.1">
    <rect x="0" y="0" width="100" height="100" fill="rgb(0%,0%,0%)"></rect>
    </svg>

5. Putting it all together

Let's call our compiler the “sbn compiler”. Create an sbn object that holds our lexer, parser, transformer, and generator. Add a “compile” method that calls all four in a chain.

Now we can pass a string of code to the compile method and get SVG back.

    var sbn = {}
    sbn.VERSION = '0.0.1'
    sbn.lexer = lexer
    sbn.parser = parser
    sbn.transformer = transformer
    sbn.generator = generator

    sbn.compile = function (code) {
      return this.generator(this.transformer(this.parser(this.lexer(code))))
    }

    // call the sbn compiler
    var code = 'Paper 0 Pen 100 Line 0 50 100 50'
    var svg = sbn.compile(code)
    document.body.innerHTML = svg

I made an interactive demo in which you can see the compiler's result at each step. The code for the sbn compiler is available on GitHub. I am currently working on extending the compiler's functionality. If you only want to see the basic compiler, the one created in this article, you can find it here.

Should the compiler use recursion, traversal, etc.?

Yes, all of these techniques are great for building a compiler, but that does not mean you have to apply them from the start.

I started writing my compiler with only a small subset of DBN commands. I gradually extended its functionality, and now I plan to add variables, blocks, and loops. These constructs are certainly good to have, but you do not need them from the very beginning.

Writing compilers is great

What can you do if you can write a compiler? Maybe you want to write your new version of JavaScript in Spanish ... How about EspañolScript?

    // ES (español script)
    función () {
      si (verdadero) {
        return «¡Hola!»
      }
    }

There are already people who have built their own languages out of emoji (Emojicode) or colored images (Piet). The possibilities are endless!

What I learned while making a compiler

Creating a compiler was fun, and I learned a lot about software development. I will list just a few things that I learned in the process of writing a compiler.

1. It's okay to not know something.

Like our lexer, you don't need to know everything from the very beginning. If you do not fully understand a piece of code or a technology, it is fine to hand it over to the next step as it is. Don't worry, sooner or later you will understand it!

2. Pay attention to the text of your error messages.

The parser's role is to follow the rules strictly and check whether everything is written as those rules require. Yes, mistakes happen often. When they do, try to send an informative, friendly error message. It is easy to say “It doesn't work that way” (like “ILLEGAL Token” or “undefined is not a function” in JavaScript), but try instead to give the user as much useful information as possible.

This also applies to communication within a team. When someone is stuck on a task, instead of saying “No, that doesn't work,” you can say “I would search Google for keywords such as ...” or “I recommend reading such-and-such documentation”. You do not need to do the other person's work for them, but you can definitely help them do it better and faster just by offering a fresh idea.

The Elm programming language uses this approach to display error messages, where the user is offered options for solving his problem (“Maybe you want to try this?”).
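Applied to the parser in this article, a friendlier message could say what was expected, what was actually found, and what to try instead. The sketch below is only an illustration: `paperArgumentError` is a hypothetical helper, and the wording of the message is an assumption.

```javascript
// Hypothetical helper: build an error for a Paper command whose argument is
// missing or is not a number. `token` has the shape produced by the lexer,
// or is undefined when the program ended right after "Paper".
function paperArgumentError (token) {
  var found = token
    ? 'the ' + token.type + ' "' + token.value + '"'
    : 'the end of the program'
  return new Error(
    'Paper command must be followed by a number, but found ' + found + '. ' +
    'Try something like "Paper 100".'
  )
}

console.log(paperArgumentError({ type: 'word', value: 'Pen' }).message)
// Paper command must be followed by a number, but found the word "Pen". Try something like "Paper 100".
```

The parser could throw this error in place of the fixed string it uses now.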

3. Context is everything

Finally, just as our transformer transformed one type of AST into another, more suitable for the final result, everything in programming depends on the context.

There is no single perfect solution. Do not do something just because it is fashionable right now or because you have done it before; think about the context of the task first. Something that works for one user can be completely terrible for another.

So appreciate the work of the “transformers”. You quite possibly know one on your team: a person who finishes off work that someone else started, or who does the refactoring. They do not really create new code, but the result of their work is damn important for the quality of the final product.

I hope you enjoyed this article and that I convinced you how great it is to write compilers and be a compiler yourself!

This is a translation of an article by Mariko Kosaka, which is part of her talk at JSConf Colombia 2016 in Medellin, Colombia. If you want to know more, you can find the slides here and the original article here.

Source: https://habr.com/ru/post/350612/
