Writing your own programming language is almost always a bad idea. So why do we need another Lisp? Moreover, there is already ClojureScript, which at the moment is production ready and has a bunch of nice features. To compete even with ClojureScript is insane, - no longer talking about TypeScript, CoffeeScript, etc. But we need the language and not for that!
First of all, we want to deal with the way they are generally written. Plus, it is not necessary to write a "programming" language, often there are tasks to write your own markup language for text, configuration, and so on. All this can be solved in a similar way.
To begin with, let's agree that a programming language is a kind of formalized language, the syntax of which obeys certain rules. To parse these rules, we need to know what kind of "words" (ie, units of language) we can meet. It may not be words in the usual sense, it may be punctuation, numbers, strings, and so on.
For example, in our language such a construction may occur:
(+ 2 3)
It uses at least four "words". Each word is so-called token. And first we have to break our text into these same tokens. The process of breaking our source code into tokens is called tokenization. Until we get into the details, let's move on. Imagine that we broke all the text into tokens, then we need to parse all these tokens in order to get an abstract syntax tree (AST). To do this, we need to write a parser that would be able to handle the resulting sequence of tokens. Again, without going into details, proceed to the next step: compilation / translation. Now we need to convert our AST to either machine code, or to translate into another language. In our case, we will write a translator from AST to JS.
Writing a tokenizer, parser and translator is not such a trivial task, and the need for such tools historically arose very often, so smart people have developed lexer generators and parsers to simplify this task. One such generator is Bison . But since we will write our Lisp on JS, we will use its JS tax - Jison
The Jison parser generator is quite simple to use, for it is beyond the grunt, gulp and webpack plugins that allow all your .jison files to be translated into JS files that already contain a ready-made parser. I will not describe here how you organize a project based on grunt / gulp / webpack. Let's just use the online parser generator http://zaa.ch/jison/try/ .
Let's first analyze an example of a calculator that is on the jison website.
/* description: Parses end evaluates mathematical expressions. */ /* lexical grammar */ %lex %% \s+ {/* skip whitespace */} [0-9]+("."[0-9]+)?\b {return 'NUMBER';} "*" {return '*';} "/" {return '/';} "-" {return '-';} "+" {return '+';} "^" {return '^';} "(" {return '(';} ")" {return ')';} "PI" {return 'PI';} "E" {return 'E';} <<EOF>> {return 'EOF';} /lex /* operator associations and precedence */ %left '+' '-' %left '*' '/' %left '^' %left UMINUS %start expressions %% /* language grammar */ expressions : e EOF {return $1;} ; e : e '+' e {$$ = $1 + $3;} | e '-' e {$$ = $1 - $3;} | e '*' e {$$ = $1 * $3;} | e '/' e {$$ = $1 / $3;} | e '^' e {$$ = Math.pow($1, $3);} | '-' e %prec UMINUS {$$ = -$2;} | '(' e ')' {$$ = $2;} | NUMBER {$$ = Number(yytext);} | E {$$ = Math.E;} | PI {$$ = Math.PI;} ;
In the first part of this file, after the commentary / lexical grammar /, the description of all our "words" of the language is found. Here we can see how the number is determined (yes, you can define tokens through regular expressions, but this is not always good, sometimes it is better to use a combination of simpler tokens to define a high-level token). Here there are also tokens of arithmetic operations: addition, multiplication, subtraction and division.
If you copy the entire text of the grammar into an online generator, then enter the expression 10 + 10, then we get the result of 20. How is that?
Let's look at what this construct defines, for example:
e : e '+' e {$$ = $1 + $3;} | e '-' e {$$ = $1 - $3;} | e '*' e {$$ = $1 * $3;} | e '/' e {$$ = $1 / $3;} | e '^' e {$$ = Math.pow($1, $3);} | '-' e %prec UMINUS {$$ = -$2;} | '(' e ')' {$$ = $2;} | NUMBER {$$ = Number(yytext);} | E {$$ = Math.E;} | PI {$$ = Math.PI;} ;
In fact, everything is very simple, here we say that "e" (expression) can be the sum of two expressions:
: e '+' e {$$ = $1 + $3;} ...
From this example, it is already clear that every our definition of any new syntactic construct can be recursive. In this case, we say that expression can be written like this: expression + expression. Upon encountering such a construction, the parser will token it and then execute the code that is in curly brackets: {$$ = $ 1 + $ 3}. This is the usual JS code. Unusual here only variables with which we operate: $$, $ 1, $ 3.
Let's figure it out. $$ is the result of an expression, that is, what will be transmitted further along the parse tree. $ 1 is the first structural member, for example, the entry "10 + 10", $ 1 is the number 10. $ 3 is the third structural element. The second element would be '+', but since we don’t need it by ourselves, we simply skip it and perform the addition of two variables and then save the result to $$.
This is followed by other patterns of the "e" design. The first pattern is always written through ":", the subsequent ones - through "|", at the end is put ";".
%start expressions %% /* language grammar */ expressions : e EOF {return $1;} ;
Here we say that all parsing will start with pattern'a expressions, which in turn represents any one expression "e" and the end of the file. In curly brackets here we need to return the result of the entire parse. In this case, it will be the result of a single expression.
We figured out the calculator, now let's try to make the expression work:
(+ 1 2)
To do this, let's define our grammar. As I said earlier, there are at least 4 tokens in our expression: "(", "+", "NUMBER" and ")".
Let's describe this in the appropriate part of the grammar file:
/* description: Parses end executes mathematical expressions. */ /* lexical grammar */ %lex %% \s+ /* skip whitespace */ [0-9]+("."[0-9]+)?\b return 'NUMBER' "+" return '+' "(" return '(' ")" return ')' <<EOF>> return 'EOF' /lex %start program %% /* language grammar */ expr : '(' '+' NUMBER NUMBER ')' { $$ = +$3 + +$4 } ; program : expr EOF {return $1;} ;
This is the whole grammar that will allow us to add two numbers, but the addition of two numbers ... This is somehow not enough. And what if we need to add N numbers like in a lisp:
(+ 1 2 3 4 5)
Let's fix our grammar:
/* description: Parses end executes mathematical expressions. */ /* lexical grammar */ %lex %% \s+ /* skip whitespace */ [0-9]+("."[0-9]+)?\b return 'NUMBER' "+" return '+' "(" return '(' ")" return ')' <<EOF>> return 'EOF' /lex %start program %% /* language grammar */ value : NUMBER { $$ = +$1}} ; values : value { $$ = [$1] } | values value { $$ = $1.concat($2)} ; expr : '(' '+' values ')' { $$ = $3.reduce(function(a, b){return a +b }) } ; program : expr EOF {return $1;} ;
Please note that here I first list what we can act as value, thereby defining value, and then recursively defining values:
value : NUMBER { $$ = +$1}} ; values : value { $$ = [$1] } | values value { $$ = $1.concat($2)} ;
That is, values can be one or N consecutively reaching value. At the same time, I say that value in values is an array of one value, and already values are the union of the previous values of the array with each subsequent one.
As a result of parsing values, we get an array of numbers, for which we simply perform a reduce to get the value of the sum of all numbers. Now the operation:
( + 1 2 3 4 5)
will return 15.
Congratulations! We wrote the first and simplest construction of the lisp language. I think that to realize subtraction, multiplication and division is also not difficult for you. Naturally, within the framework of one article it is impossible to describe the development process of all languages, even all basic constructions, but! If you are interested in this topic, you can follow this link: https://github.com/daynin/tiny-lisp
This is a project repository in which Lisp on JS is developed using Jison. A lot of things have already been implemented there, but nevertheless, this is still a very small project. If you want to try yourself in the development of the language, then this project is the best for you!
PS
If you are interested in this topic, then I can turn it into a small cycle of articles, where I would tell and show how it is still possible to bring the language at least to the state in which it is represented by the link above.
Source: https://habr.com/ru/post/281298/
All Articles