📜 ⬆️ ⬇️

Writing a simple Lisp translator - I

Let's try to write in Lisp ... a translator of a simple imperative language. No, no, I was not mistaken - it was the translator. It will be broadcast in Lisp code. And then this code can be executed by a Lisp system.


Here, an invaluable service will be provided to us by the fact that in Lisp there is no barrier between code and data (this rare property of some programming languages ​​is called “homoiconnost”). But the visual capabilities of Lisp will also play a significant role.


As an implementation, I will use HomeLisp . Those interested will be able to adapt this project under Common Lisp. I have to say right away - with reference to the task in question, the essential difference between Common Lisp and HomeLisp is only the processing of strings and files.


Download the portable version of HomeLisp by the link . Documentation is located on the same site. Those interested can copy the code from the article and immediately check the performance.


The topic offered to your attention served as the basis for my workshop at the famous Novosibirsk LSUP-2018 . The results of the workshop can be found here . And then I set out my approach. I assume that the reader is familiar with the language of Lisp.


Getting started


Let's start with “simple imperative language”, which we will broadcast in Lisp.
The language will only process numeric data. The code in this language consists of functions (procedures that return values). Among these functions, one should be called main. It is from this function that code execution begins. But why bind yourself like that? We write functions in an imperative language; they are translated to Lisp and can be used together with Lisp functions. But let's not get ahead of ourselves ...


The set of language operators is common: assignment, branching, arithmetic cycle, early exit from the cycle, input, output, and function call. However, syntactically, the function call is issued as an assignment (result of the call). Comments let contain an asterisk in the first position of the line. Language, of course, should provide the ability to create recursive functions. To make it clearer, I will give examples of the code - printing consecutive odd numbers and calculating their sum:


proc main() local s,n,k input n for i=1 to n k=2*i-1 print k s=s+k end_for print s end_proc 

In its spirit, it is a basic-like language. I will call it “mini-basic”. Our translator should convert the given code into the following Lisp function:


 (defun main nil (let ((s 0) (n 0) (k 0)) (setq n (read)) (iter (for i from 1 to n) (setq k (- (* 2 i) 1)) (printline k) (setq s (+ sk))) (printline s))) 

I really like the iterate package, which is implemented as a macro in Common Lisp professional packages. In HomeLisp, the iter function (which implements a significant part of the macro iterate capability) is included in the core language. My addiction to iter was the reason that our “mini-basic” cycles would translate into calls to iter.


How to start the implementation? Let's start by selecting the file to be translated and reading it line by line and printing it. We will have to launch the translator many times, so let this launch be convenient from the very beginning. Here’s what this function might look like:


 (defun start (&optional (fname nil)) (setq *numline* 0) (setq *flagerr* nil) (when (null fname) (setq fname (sysGetOpenName (sysHome) "-|*.mbs"))) (let ((fi (gensym 'fi))) (when fname (filOpen fi fname _INPUT) (loop (getLine fi) (when (or *flagerr* (filEOF fi)) (return t))) (filClose fi) (when *flagerr* (printsline "****   ")))) (unset '*numline*) (unset '*flagerr*)) 

The function has an optional parameter fname - the name of the file, the contents of which will be translated. When entering the function, two global variables numLine are created, the line number of the source file and flagerr the error status flag. Before the end of the function, these variables are destroyed (the HomeLisp function unset destroys global variables).


If the input file name is omitted, then the standard windows file selection dialog (sysGetOpenName) is called. The current directory (sysHome) is used as the starting directory. Next, a unique character for the file handler is created and the file is opened for text reading. Then, in an infinite loop, the file is read line by line ( getLine function). After each operation, it will be checked whether an error has occurred and whether the end of the file has not been reached. If an error occurs or the end of the file is fixed, the loop is broken, the file is closed, and if there were errors, the final message is displayed.
Actually reading from the file is performed by the function getLine :


 (defun getLine (fil) (let ((stri "")) (loop (when (filEof fil) (return "")) (setq *numline* (add1 *numline*)) (setq stri (filGetline fil)) (printsline (strCat (format *numline* "0000") " " (strRTrim stri))) (setq stri (strATrim stri)) (unless (or (eq "" stri) (eq "*" (strLeft stri 1))) (return stri))))) 

This function accepts the identifier of the open file and in the infinite loop performs the following actions:



Thus, in the output listing all the lines of the file are in their original form.


We break into procedures


And now let's teach our code to break the input stream into separate procedures. First, the entered string will need to be divided into lexemes (indivisible input lexical units). This process is called parsing; we have to create a parser. Writing parsers is a classic theme, there are libraries of ready-made parsers and special tools that allow you to generate the necessary parser ... We will go our own way.


Before describing the parser algorithm, note that all the characters in the input string can be divided into two classes:



Thus, in the assignment operator “x = 15 + y ^ 2”, the symbols x, 1.5, y, and 2 are the usual characters, and the characters “space” , + , ^ are the separators. How does a regular character differ from a separator? Separator - always separates one token from another. Our assignment operator, being broken into lexemes, looks like this: “x”, “=”, “15”, “y”, “^”, “2” .


As you can see, not all separators fall into the result of parsing (spaces, in particular, do not fall). We will call separators that do not fall into the result, separators of the first type. The remaining delimiters will be called second-type delimiters.


The input of the parser is a string, the output is a list of string tokens. As the drive will be used local variable - the battery. Initially, the battery contains an empty string.


The parsing algorithm can be as follows: read character by character input string. If a normal character is encountered, concatenate it with a battery. If there is a delimiter, then:



Here is the parser code:


 (defun parser (txt &optional (d1 " ,") (d2 "()+-*/\^=<>%")) (let ((res nil) (lex "") ) (iter (for s in-string (strCat txt (strLeft d1 1))) (cond ((plusp (strInd d1 s)) (when (> (strLen lex) 0) (collecting lex into res)) (setq lex "")) ((plusp (strInd d2 s)) (when (> (strLen lex) 0) (collecting lex into res)) (collecting s into res) (setq lex "")) (t (setq lex (strCat lex s))))) res)) 

In addition to the required parameter, the function has two optional parameters: d1 contains a string, each character of which has a separator of the first type, and the string d2 contains separators of the second type.


The program logic of the parser function is described above. It should only be noted that before starting work, a separator is added to the end of the input line. This is done to ensure that the last lexeme to be processed is “stuck” in the battery (the local variable lex plays the role of the battery).


Let's check our parser “in action”:


 (parser "x = 15 + y^2") ==> ("x" "=" "15" "+" "y" "^" "2") 

That's right, isn't it? But working with lists of strings is not quite lispovsky. Let's move from the lists of lines to the list of atoms. To do this, we need a function that ... glues all the lexemes again into a long line (but inserts a space between the lexemes), then attaches the opening bracket to the beginning of this line, closes the end bracket to the end ... and then forces Lisp to read the list:


 (defun mk-intf (txt) (let ((lex (parser txt " ," "()+-*/\^=<>%")) (intf "")) (iter (for a in lex) (setq intf (strCat intf a " "))) (input (strCat "(" intf ")")))) 

Now, if we give the assignment operator to the function mk-intf, we get:


 (mk-intf "x = 15 + y^2") ==> (X = 15 + Y ^ 2) 

That, you see, is much nicer.


Now let's change the start function a little: this function will have to read and process entire procedures:


 (defun start (&optional (fname nil)) (setq *numline* 0) (setq *flagerr* nil) (when (null fname) (setq fname (sysGetOpenName (sysHome) "-|*.mbs"))) (when fname (let ((fi (gensym 'fi))) (filOpen fi fname _INPUT) (loop (let ((curr-proc (action-proc fi))) (when (or *flagerr* (filEOF fi)) (return t)) (eval curr-proc))) (filClose fi)) (when *flagerr* (printsline "****   "))) (unset '*numline*) (unset '*flagerr*)) 

In the loop body, the action-proc function is called (to process the procedure), which will form the body of the received procedure already on Lisp. The body of the procedure, stored as an S-expression in the variable curr-proc , is then passed to the input of eval . And the adopted function is “reincarnated” in Lisp environment!


What should action-proc do? This function gets the identifier of the open file as a parameter. The function reads from file line by line, skips blank lines and comments, parses the remaining lines, translates into a list form, and generates a procedure body.


We will gradually “learn” the generation of action-proc . And let's start by teaching our function to recognize the beginning and end of the procedure. In the mini-BASIC, the beginning of the procedure looks like:


 proc name(p1,p2,p3) 

Try parsing this line:


 (mk-intf "proc name(p1,p2,p3)") ==> (PROC NAME (P1 P2 P3)) 

How should the action-proc function respond to this input? It is quite natural: after making sure that the head of the list is an atom of PROC , you need to take the second element of the list as the name of the function, and the third element as the list of parameters. The name and parameter list should be stored in local variables. When the end_proc operator is read , it is necessary to form the form defun with an empty (for now) body from the name of the function and the list of parameters, and return this form as a result. Here's what it looks like:


 (defun action-proc (fi) (let ((stmt nil) (proc-name nil) (proc-parm nil)) (loop (setq stmt (mk-intf (getLine fi))) (when (null stmt) (return t)) (cond ((eq (car stmt) 'proc) (setq proc-name (nth 1 stmt)) (setq proc-parm (nth 2 stmt))) ((eq (car stmt) 'end_proc) (return t)) (t (printsline (strCat "****  " (output stmt) "  ")) (setq *flagerr* t)))) `(defun ,proc-name ,proc-parm (quote OK)))) 

For the final generation of the defun clause, reverse blocking is used. Notice that the generated procedure will issue an OK atom as the result.


Now we can check our code in action. Fill in the file 0000.mbs the following code:


 proc f1(x,y) end_proc proc f2(x) end_proc 

Run the start procedure, select 0000.mbs and see on the console:


 0001 proc f1(x,y) 0002 end_proc 0003 proc f2(x) 0004 end_proc 

If you wish, you can make sure that the Lisp machine now has two (so far useless) functions f1 and f2 :


 (getd 'f1) ==> (EXPR (XY) (QUOTE OK)) (getd 'f2) ==> (EXPR (X) (QUOTE OK)) 

Moreover! They can already be run:


 (f1 1 2) ==> OK (f2 2) ==> OK 

Our translator took the first breath ...


Input, output and local variables


Now is the time to teach our newborn translator to handle input , print and local operators.


The easiest way to handle input and print. Both operators have the same syntactic structure: a keyword and a variable. The input x statement should turn into the Lisp form (setq x (read)) . Accordingly, the print x operator turns into a form (printline x) . To store these forms, it is necessary to provide a local variable body in the action-proc function. In this variable, the forms that calculate the future function will be accumulated. Then everything is pretty simple:


 (defun action-proc (fi) (let ((stmt nil) (proc-name nil) (proc-parm nil) (loc-var nil) (body nil)) (loop (setq stmt (mk-intf (getLine fi))) (when (null stmt) (return t)) (cond ((eq (car stmt) 'proc) (setq proc-name (nth 1 stmt)) (setq proc-parm (nth 2 stmt))) ((eq (car stmt) 'end_proc) (return t)) ((eq (car stmt) 'print) (setq body (append body (list (cons 'printline (cdr stmt)))))) ((eq (car stmt) 'input) (setq body (append body (list (list 'setq (cadr stmt) (list 'read) ))))) (t (printsline (strCat "****  " (output stmt) "  ")) (setq *flagerr* t)))) `(defun ,proc-name ,proc-parm ,@body))) 

Let's now prepare this source on a mini-basic:


 proc f1(x,y) print x print y end_proc proc f2(x) input x print x end_proc 

and try to translate it ... We will have two Lisp functions f1 and f2 . Let's look at their defining expressions and see that they are generated correctly:


 (getd 'f1) ==> (EXPR (XY) (PRINTLINE X) (PRINTLINE Y)) (getd 'f2) ==> (EXPR (X) (SETQ X (READ)) (PRINTLINE X)) 

You can call these functions and make sure that they work exactly as expected. Don't be confused by the fact that we enter a value into a variable-parameter — we just don’t have local variables yet ... Let's add them.


The local operator can be anywhere in the procedure and occur more than once. If the local operator is encountered during the processing of the procedure, then it is necessary to take a list of variables and save it in a local variable. After the end_proc statement is encountered, it is necessary to generate the form let and “enclose in it” all executed statements (for now only input and print ). Here is what action-proc will now look like:


 (defun action-proc (fi) (let ((stmt nil) (proc-name nil) (proc-parm nil) (loc-var nil) (lv nil) (body nil)) (loop (setq stmt (mk-intf (getLine fi))) (when (null stmt) (return t)) (cond ((eq (car stmt) 'proc) (setq proc-name (nth 1 stmt)) (setq proc-parm (nth 2 stmt))) ((eq (car stmt) 'end_proc) (return t)) ((eq (car stmt) 'print) (setq body (append body (list (cons 'printline (cdr stmt)))))) ((eq (car stmt) 'input) (setq body (append body (list (list 'setq (cadr stmt) (list 'read) ))))) ((eq (car stmt) 'local) (setq loc-var (append loc-var (cdr stmt)))) (t (printsline (strCat "****  " (output stmt) "  ")) (setq *flagerr* t)))) (iter (for a in (setof loc-var)) (collecting (list a 0) into lv)) `(defun ,proc-name ,proc-parm (let ,lv ,@body)))) 

The list of local variables is accumulated in the variable loc-var . After the processing of the procedure from this list is completed, a list of pairs of the form (name 0) is built . In this case, the repetition of identical names is undesirable ... How to prevent it? Of course, it is possible to check on each processing of the local operator whether there is a duplication of names (if there is, to give an error message). But, it seems to me, it is better to just eliminate the repetition, which makes the setof call. Now let's broadcast and run this program:


 proc f1(x,y) local a,b,c print x print y input a print a end_proc 

We are convinced that it works exactly as the algorithm suggests. But all the fun lies ahead!


From here you can download the final version of what we are with you on sh code ...


To be continued!


')

Source: https://habr.com/ru/post/419103/


All Articles