
Writing a simple Lisp translator - III

Previous article

Errors, Errors, Errors ...


A good program must be protected from user errors; that much is indisputable. Errors have to be handled and, better still, prevented (prevention is always better than cure). The highest skill is to design the dialogue with the user so that a mistake simply cannot be made.

For example, if the user must enter a positive integer in an input field, you can of course inspect the answer and, on finding non-numeric characters, issue a warning and ask for the input again. But it is much better simply to forbid non-numeric characters from being typed at all.

Unfortunately, this technique cannot be applied everywhere. In particular, the variety of constructions arriving at the input of a translator is far too large to simply “cut off the wrong ones” with an input mask.

People have the privilege of making mistakes, and when incorrect language constructs are entered the translator should produce a clear diagnosis and, if possible, continue analyzing the source text in order to find all the errors. The user will hardly be pleased if the translator reports errors one at a time. And a program that “crashes” with a system error message is completely unacceptable.
In this article we will critically review the code written earlier and try to prevent (or handle) the possible errors.

Let's start with the start function. What does it do? It takes the name of the input file, opens it, and processes it line by line. For programs of this kind the user interaction scenario is well established and can be considered canonical:


Our version of the start procedure does not follow this scenario. Look again at the code we had:

(defun start (&optional (fname ""))
  (setq *numline* 0)
  (setq *flagerr* nil)
  (setq *oplist* …)
  (when (zerop (strLen fname))
    (setq fname (sysGetOpenName (sysHome) "-|*.mbs")))
  (let ((fi (gensym 'fi)))
    (filOpen fi fname _INPUT)
    (loop
      (let ((curr-proc (action-proc fi)))
        (when *flagerr* (return t))
        (when (filEOF fi) (return t))
        (eval curr-proc)))
    (filClose fi))
  (when *flagerr* (printsline "****   ")))

A negative answer from the user is not checked, so if the “Cancel” button is pressed in the file dialog, the program will crash. Nor is the existence of the file checked. Unfortunately, these are not the only shortcomings. If a mini-BASIC procedure is the last thing in the input file, the end-of-file check breaks the loop before the generated function has been loaded into the Lisp environment.

Let's fix these flaws:

(defun start (&optional (fname ""))
  (setq *numline* 0)
  (setq *flagerr* nil)
  (setq *oplist* … )
  (when (zerop (strLen fname))
    (setq fname (sysGetOpenName (sysHome) "-|*.mbs")))
  (if (and fname (filExistp fname))
      (let ((fi (gensym 'fi)))
        (filOpen fi fname _INPUT)
        (loop
          (let ((curr-proc (action-proc fi)))
            (when *flagerr* (return t))
            (when curr-proc (eval curr-proc))
            (when (filEOF fi) (return t))))
        (filClose fi)
        (when *flagerr* (printsline "****   ")))
      (printsline (if fname (strCat "****  " fname "  ") "****   ")))
  (unset '*numline*)
  (unset '*flagerr*)
  (unset '*oplist*))

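If a file name is given and the file exists, processing is performed. Otherwise one of two messages is printed: “File does not exist” or “File name omitted”.
In the body of the main loop the following actions are performed in sequence: the next procedure is read and translated by action-proc; if the error flag is set, the loop exits; if a non-empty result was produced, it is evaluated (that is, loaded into the Lisp environment); finally, if the end of the file has been reached, the loop exits.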


The code has improved... But one more serious flaw remains: once a procedure containing one or more errors has been processed, the main loop is interrupted and the program finishes without looking at the part of the source text that follows the faulty procedure. This is bad: we would like the translator to report all the errors it can detect on every run.

To correct this, let's introduce a global “error counter” variable and increment it whenever a procedure with errors has been processed. The error flag itself will be reset after each procedure:

(defun start (&optional (fname ""))
  (setq *numline* 0)
  (setq *flagerr* nil)
  (setq *errcount* 0)
  (setq *oplist* …)
  (when (zerop (strLen fname))
    (setq fname (sysGetOpenName (sysHome) "-|*.mbs")))
  (if (and fname (filExistp fname))
      (let ((fi (gensym 'fi)))
        (filCloseAll)
        (filOpen fi fname _INPUT)
        (loop
          (let ((curr-proc (action-proc fi)))
            (when *flagerr* (setq *errcount* (add1 *errcount*)))
            (when (and curr-proc (not *flagerr*)) (eval curr-proc))
            (setq *flagerr* nil)
            (when (filEOF fi) (return t))))
        (filClose fi)
        (when (> *errcount* 0) (printsline "****   ")))
      (printsline (if fname (strCat "****  " fname "  ") "****   ")))
  (unset '*numline*)
  (unset '*flagerr*)
  (unset '*oplist*)
  (unset '*errcount*))
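The idea of “count the failures, reset the flag, keep going” does not depend on HomeLisp. Below is a minimal Python sketch of the same control flow. The names translate_all and toy_translate, and the rule that any line containing "bla" is an error, are assumptions made up for the illustration and are not part of the translator.

```python
def translate_all(procedures, translate):
    """Translate every procedure; count the failures instead of stopping."""
    error_count = 0
    loaded = []
    for source in procedures:
        code, had_error = translate(source)  # hypothetical translator callback
        if had_error:
            error_count += 1                 # remember the failure, keep going
        elif code is not None:
            loaded.append(code)              # stands in for (eval curr-proc)
    return loaded, error_count


def toy_translate(source):
    # Hypothetical rule for the demo: a line containing "bla" is unknown.
    bad = any("bla" in line for line in source)
    return (None, True) if bad else ("\n".join(source), False)


procs = [
    ["proc test1(x)", "bla-bla", "end_proc"],
    ["proc test2()", "print y", "end_proc"],
    ["proc test3(x)", "bla-bla-bla", "end_proc"],
]
loaded, errors = translate_all(procs, toy_translate)
print(len(loaded), errors)
```

Because the error state is local to each iteration, a faulty procedure never prevents the ones after it from being examined.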

Now the start function works acceptably. Let's check. Create the following source file:

*
*
*
proc test1(x)
local y
y=x^2
bla-bla
end_proc
*
*
*
proc test2()
local x,y
input x
y=test1(x)
print y
end_proc
*
*
*
proc test3(x)
bla-bla-bla
print x
end_proc

And run it through our translator. We get:

0001 *
0002 *
0003 *
0004 proc test1(x)
0005 local y
0006 y=x^2
0007 bla-bla
****  (BLA - BLA)
0008 end_proc
0009 *
0010 *
0011 *
0012 proc test2()
0013 local x,y
0014 input x
0015 y=test1(x)
0016 print y
0017 end_proc
0018 *
0019 *
0020 *
0021 proc test3(x)
0022 bla-bla-bla
****  (BLA - BLA - BLA)
0023 print x
0024 end_proc
0025
****

Let's assume we are done with the start function. But the “work on the bugs” has only just begun. Let us take a closer look at the syntax of the part of the language we have already implemented.

Probably the most common syntax mistake is a wrong bracket structure (unbalanced parentheses, or parentheses in the wrong order). Recall what happens to a line of mini-BASIC source code after it is read. The line is parsed (split into lexemes), and then the list of lexemes is converted into an internal list form. In the list of lexemes, parentheses are separate lexemes, and we do not check their balance. This could be done in a separate function, but the lexemes are in any case joined into a string and passed to the input function, which converts a string into a Lisp list. If a malformed string expression is passed to input, the function signals an error.

Let's handle this error.
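For readers who would like to see the “separate function” variant, here is a minimal Python sketch of a balance check over a list of lexemes. It is not part of the translator (which relies on the error raised by input instead); the name check_parens and the wording of the messages are assumptions.

```python
def check_parens(tokens):
    """Return None if the parentheses balance, else a short description."""
    depth = 0
    for pos, tok in enumerate(tokens):
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
            if depth < 0:               # a ")" with no matching "(" before it
                return f"unexpected ')' at token {pos}"
    if depth > 0:                       # some "(" were never closed
        return f"{depth} unclosed '('"
    return None


# One character per lexeme is enough for these small examples.
print(check_parens(list("y=(x^2))")))
print(check_parens(list("x=3+)x^2")))
print(check_parens(list("y=(x^2)")))
```

A single counter is enough because there is only one kind of bracket; the early return catches the “wrong order” case, which a simple count of opening and closing parentheses would miss.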

In HomeLisp, errors are handled with the construct (try Expression-1 except Expression-2). It works as follows:


With that in mind, the conversion to list form can be arranged as follows:

(defun mk-intf (txt)
  (let ((lex (parser txt " ," "()+-*/\^=<>%"))
        (intf ""))
    (iter (for a in lex) (setq intf (strCat intf a " ")))
    (try (input (strCat "(" intf ")"))
     except (progn
              (printsline (strCat "**** " (errormessage)))
              `(,txt)))))

If the conversion fails, the system message is printed and a one-element list containing the original line of code is returned. This list then reaches the action-proc procedure as an ordinary statement and, of course, is not recognized. That produces a second error message, and the translator carries on. Let's prepare the following source code and try to translate it:

*
*
*
proc test1(x)
local y
y=(x^2))
end_proc
*
*
*
proc test2()
local x,y
input x
y=test1(x)
print y
end_proc
*
*
*
proc test3(x)
x=3+)x^2
print x
end_proc

We get the expected result:

0001 *
0002 *
0003 *
0004 proc test1(x)
0005 local y
0006 y=(x^2))
****
****  ("y=(x^2))")
0007 end_proc
0008 *
0009 *
0010 *
0011 proc test2()
0012 local x,y
0013 input x
0014 y=test1(x)
0015 print y
0016 end_proc
0017 *
0018 *
0019 *
0020 proc test3(x)
0021 x=3+)x^2
****
****  ("x=3+)x^2")
0022 print x
0023 end_proc
****

Now let's take a critical look at the code that converts arithmetic expressions to prefix notation. It has no means of catching user errors, and there can be quite a few of them. Let's fix this. To begin with, let's try to translate some seemingly innocent code:

proc test()
local x,y
x=6
y=-x
print y
end_proc

The translation ends with the translator crashing! The crash is caused by the statement y=-x. What is the matter? The unary minus! When transforming a formula from infix to prefix form we did not consider that minus is “two-faced”: there is a binary minus (the sign of an operation) and a unary minus (the sign of a number). Our parser does not know the difference and treats every minus as binary... What can we do? So as not to tear apart code that already works, let's turn all unary minuses into binary ones. How? Very simply. It is fairly clear that a unary minus “lives” only in constructions like these:

“(-something”
“>-something”
“<-something”
“=-something”
and it can also occur at the very beginning of a formula. Therefore, if before splitting into lexemes we make the following substitutions:

“(-something” => “(0-something”
“>-something” => “>0-something”
“<-something” => “<0-something”
“=-something” => “=0-something”

and, if the formula starts with a minus, prepend a zero to it, then every minus becomes binary and the error is eliminated at the root. Let's call the function that performs this conversion “prepro”. Here is what it might look like:

(defun prepro (s)
  (let* ((s0 (if (eq "-" (strLeft s 1)) (strCat "0" s) s))
         (s1 (strRep s0 "(-" "(0-"))
         (s2 (strRep s1 "=-" "=0-"))
         (s3 (strRep s2 ">-" ">0-"))
         (s4 (strRep s3 "<-" "<0-")))
    s4))
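The same substitutions are easy to try out in any language. Here is a Python sketch of prepro, written with str.replace in place of HomeLisp's strRep; it is an illustration and assumes, as the Lisp version does, that there are no spaces between the preceding character and the minus.

```python
def prepro(s):
    """Turn every unary minus into a binary one by inserting a zero."""
    if s.startswith("-"):
        s = "0" + s                       # minus at the very start of the formula
    for before in ("(", "=", ">", "<"):
        # "(-x" -> "(0-x", "=-x" -> "=0-x", and so on
        s = s.replace(before + "-", before + "0-")
    return s


print(prepro("y=-x"))        # y=0-x
print(prepro("z=(-a)*b"))    # z=(0-a)*b
print(prepro("a-b"))         # a-b  (binary minus is left alone)
```

Note the limits of the approach: a minus that directly follows another operation sign (for example “a*-b”) is not covered by these four patterns, in the sketch or in the original.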

No special comments are needed here. But our simple parser has another, less obvious problem: two-character operation signs. In formulas, the characters “>” and “=” standing next to each other mean the single operation “>=” (and must form one lexeme!). The parser knows nothing of this and makes each character a separate lexeme. We can cope with this by scanning the resulting list of lexemes and merging the corresponding characters when they are adjacent. Let's call the function that performs the merge “postpro”. Here is one possible implementation:

(defun postpro (lex-list)
  (cond ((null (cdr lex-list)) lex-list)
        (t (let ((c1 (car lex-list))
                 (c2 (cadr lex-list)))
             (cond ((and (eq c1 ">") (eq c2 "=")) (cons ">=" (postpro (cddr lex-list))))
                   ((and (eq c1 "<") (eq c2 "=")) (cons "<=" (postpro (cddr lex-list))))
                   ((and (eq c1 "=") (eq c2 "=")) (cons "==" (postpro (cddr lex-list))))
                   ((and (eq c1 "<") (eq c2 ">")) (cons "<>" (postpro (cddr lex-list))))
                   ((and (eq c1 ">") (eq c2 "<")) (cons "<>" (postpro (cddr lex-list))))
                   ((and (eq c1 "!") (eq c2 "=")) (cons "/=" (postpro (cddr lex-list))))
                   ((and (eq c1 "/") (eq c2 "=")) (cons "/=" (postpro (cddr lex-list))))
                   (t (cons c1 (postpro (cdr lex-list)))))))))
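The same merge can be written as a table lookup instead of a chain of conditions. The Python sketch below uses the same pairs as the Lisp code; the dictionary name PAIRS and the iterative (rather than recursive) form are choices made for the illustration.

```python
# Adjacent single-character lexemes and the operation they form together.
PAIRS = {
    (">", "="): ">=", ("<", "="): "<=", ("=", "="): "==",
    ("<", ">"): "<>", (">", "<"): "<>",
    ("!", "="): "/=", ("/", "="): "/=",
}


def postpro(tokens):
    """Merge neighbouring lexemes that form a two-character operation."""
    out = []
    i = 0
    while i < len(tokens):
        pair = tuple(tokens[i:i + 2])     # a 1-tuple at the end never matches
        if pair in PAIRS:
            out.append(PAIRS[pair])
            i += 2                        # both characters are consumed
        else:
            out.append(tokens[i])
            i += 1
    return out


print(postpro(["x", ">", "=", "y"]))      # ['x', '>=', 'y']
print(postpro(["a", "!", "=", "b"]))      # ['a', '/=', 'b']
```

With a table, supporting another two-character sign means adding one entry rather than one more branch.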

Again, as we can see, nothing special. But now the final function that converts a statement into the internal list form looks like this:

(defun mk-intf (txt)
  (let ((lex (postpro (parser (prepro txt) " ," "()+-*/\^=<>%")))
        (intf ""))
    (iter (for a in lex) (setq intf (strCat intf a " ")))
    (try (input (strCat "(" intf ")"))
     except (progn
              (printsline (strCat "**** " (errormessage)))
              `(,txt)))))

Now let's take a critical look at the inf2ipn function. Which user errors can make it crash? Unbalanced brackets have already been cut off above. What else? Two operation signs or two operands in a row. This could be analyzed inside inf2ipn (those who wish can do so themselves). We will “catch” these errors at the stage where the formula is transformed from reverse Polish notation to prefix form. And, just in case, let's intercept every error that may occur while converting a formula from infix to prefix form. The best place for this is the i2p wrapper function, which may now look like this:

(defun i2p (f)
  (try (ipn2pref (inf2ipn f))
   except (progn
            (printsline "****    ")
            (printsline (strCat "**** " (errormessage)))
            (setq *flagerr* t)
            nil)))

Now let's rule out two operation signs or two operands in a row. The previous article describes the algorithm for converting a formula from reverse Polish notation to prefix form. The algorithm has completed correctly if, after the last step, the stack holds exactly one value. If it does not, a mistake was made. One more error situation arises when a function is called with the wrong number of parameters (too many or too few). These situations should be “caught”:

(defun ipn2pref (f &optional (s nil))
  (cond ((null f)
         (if (null (cdr s))
             (car s)
             (progn (printsline "****    ")
                    (setq *flagerr* t)
                    nil)))
        ((numberp (car f))
         (ipn2pref (cdr f) (cons (car f) s)))
        ((is-op (car f))
         (let ((ar (arity (car f))))
           (if (< (length s) ar)
               (progn (setq *flagerr* t)
                      (printsline "****    ")
                      nil)
               (ipn2pref (cdr f)
                         (cons (cons (car f) (reverse (subseq s 0 ar)))
                               (subseq s ar))))))
        ((atom (car f))
         (ipn2pref (cdr f) (cons (car f) s)))
        (t (ipn2pref (cdr f) (cons (list (car f) (car s)) (cdr s))))))
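The two checks (enough operands for each operation, exactly one value left at the end) can be seen in isolation in the following Python sketch. It is a simplified stand-in for ipn2pref: the operator table ARITY is hypothetical, function calls are not handled, and errors are reported by raising ValueError instead of setting *flagerr*.

```python
ARITY = {"+": 2, "-": 2, "*": 2, "/": 2, "^": 2}   # hypothetical operator table


def rpn_to_prefix(rpn):
    """Convert a reverse-Polish token list into a nested prefix list."""
    stack = []
    for tok in rpn:
        if tok in ARITY:
            n = ARITY[tok]
            if len(stack) < n:            # e.g. two operation signs in a row
                raise ValueError(f"not enough operands for '{tok}'")
            args = stack[-n:]             # operands in their original order
            del stack[-n:]
            stack.append([tok] + args)
        else:
            stack.append(tok)             # a number or a variable name
    if len(stack) != 1:                   # e.g. two operands in a row
        raise ValueError("malformed expression")
    return stack[0]


print(rpn_to_prefix(["a", "b", "+", "c", "*"]))   # ['*', ['+', 'a', 'b'], 'c']
```

Feeding it ["a", "+"] triggers the first check, and ["a", "b"] triggers the second.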

Now let's take a critical look at the handler of the proc statement. We clearly missed two points. First, when processing a procedure we must compute its arity (the number of arguments) and update the global variable *oplist* accordingly. Second, the functions we generate do not return the right value! More precisely, a function generated by our translator returns the value of the last form evaluated before it exits. To guarantee that the desired value is returned, I propose borrowing the result variable from Pascal. To return a value, the user simply assigns it to this variable before leaving the function, and when generating the function body we append the name result as its last expression. All this brings the action-proc function to the following form:

(defun action-proc (fi)
  (let ((stmt nil)
        (proc-name nil)
        (proc-parm nil)
        (loc-var nil)
        (lv '((result 0)))
        (body nil))
    (loop
      (setq stmt (mk-intf (getLine fi)))
      (when (null stmt) (return t))
      (cond ((eq (car stmt) 'proc)
             (setq proc-name (nth 1 stmt))
             (setq proc-parm (nth 2 stmt))
             (setq *oplist* (cons (list proc-name (length proc-parm)) *oplist*)))
            ((eq (car stmt) 'end_proc) (return t))
            ((eq (car stmt) 'print)
             (setq body (append body (list (cons 'printline (cdr stmt))))))
            ((eq (car stmt) 'input)
             (setq body (append body (list (list 'setq (cadr stmt) (list 'read))))))
            ((eq (car stmt) 'local)
             (setq loc-var (append loc-var (cdr stmt))))
            ((eq (cadr stmt) '=)
             (setq body (append body (list (action-set stmt)))))
            (t (printsline (strCat "****  " (output stmt) "  "))
               (setq *flagerr* t))))
    (iter (for a in (setof loc-var)) (collecting (list a 0) into lv))
    (if proc-name
        `(defun ,proc-name ,proc-parm (let ,lv ,@body result))
        nil)))

We'll stop here for now (although we will run into more problems and the code will have to be refined; such is the programmer's lot...). Now let's consider two improvements to our language that are appropriate at this point.

Minor improvements ...


In the previous article I wrote that it is inconvenient for the programmer when one statement must occupy exactly one line. We need a way to write bulky statements on several lines. Let's implement it; it is quite easy. In the getLine procedure we create a local variable in which the text read so far is accumulated, as long as the line is not a comment and ends with the two characters “ _” (a space followed by an underscore). As soon as a significant line with a different ending is read, the accumulated text is returned. Here is the code:

(defun getLine (fil)
  (let ((stri "")
        (res ""))
    (loop
      (when (filEof fil) (return ""))
      (setq *numline* (add1 *numline*))
      (setq stri (filGetline fil))
      (printsline (strCat (format *numline* "0000") " " (strRTrim stri)))
      (unless (or (eq "" stri) (eq "*" (strLeft stri 1)))
        (setq stri (strATrim stri))
        (if (eq " _" (strRight stri 2))
            (setq res (strCat res (strLeft stri (- (strLen stri) 2))))
            (setq res (strCat res stri)))
        (unless (eq " _" (strRight stri 2)) (return res))))))
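The accumulation logic can be tried out separately from file handling. The Python sketch below works on a list of strings; the generator name logical_lines is an assumption, and unlike getLine it returns a dangling continuation at the end of the input instead of dropping it.

```python
def logical_lines(lines):
    """Yield statements, gluing each line ending in ' _' to the next one."""
    pending = ""
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("*"):
            continue                      # blank lines and comments are skipped
        if line.endswith(" _"):
            pending += line[:-2]          # drop the marker, keep accumulating
        else:
            yield pending + line          # statement complete
            pending = ""
    if pending:                           # input ended in the middle of a statement
        yield pending


source = ["* a comment", "y = a + _", "    b", "", "print y"]
print(list(logical_lines(source)))        # ['y = a +b', 'print y']
```

As in the Lisp version, the pieces are concatenated without inserting a space, so the continuation marker cannot be used to split a name in the middle safely only if the author remembers that nothing is added at the seam.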

And the last improvement. Many programming languages allow logical expressions to be used inside arithmetic expressions (where they evaluate to zero or one). This makes the language more expressive and, by the way, is quite in the spirit of BASIC. In our mini-BASIC an attempt to evaluate, for example, the following expression:

 z=(x>y)*5+(x<=y)*10 

will cause a runtime error. This is understandable: in Lisp the expression (> x y) evaluates to Nil or T, and Nil/T cannot be multiplied by 5... However, this is easy to remedy. Let's write a few simple macros that replace the result of a comparison with 0/1 (instead of Nil/T):

(defmacro $=  (x y) `(if (= ,x ,y) 1 0))
(defmacro $== (x y) `(if (= ,x ,y) 1 0))
(defmacro $>  (x y) `(if (> ,x ,y) 1 0))
(defmacro $<  (x y) `(if (< ,x ,y) 1 0))
(defmacro $/= (x y) `(if (/= ,x ,y) 1 0))
(defmacro $<> (x y) `(if (/= ,x ,y) 1 0))
(defmacro $<= (x y) `(if (<= ,x ,y) 1 0))
(defmacro $>= (x y) `(if (>= ,x ,y) 1 0))

Now let's look at the line in the ipn2pref function that processes an operation. Here it is:

 (ipn2pref (cdr f) (cons (cons (car f) (reverse (subseq s 0 ar))) (subseq s ar))) 

Here (car f) is the name of the operation. Let's write a tiny function that replaces the names of comparison operations:

 (defun chng-comp (op) (if (member op '(= == /= <> > < >= <=)) (implode (cons '$ (explode op))) op)) 

The function checks whether its argument is a comparison operation, and, if necessary, adds the “$” character to the beginning. Now let's call it in the right place of the ipn2pref function:

 (ipn2pref (cdr f) (cons (cons (chng-comp (car f)) (reverse (subseq s 0 ar))) (subseq s ar))) 

What is the result? Comparison operations are replaced by calls to the corresponding macros, and all other operations stay unchanged. If we translate a function like this:

proc test()
local x,y
x=1
y=2
result=(x>y)*5+(x<=y)*10
end_proc

and then call it, we get the expected result.
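To see what value to expect, the 0/1 semantics of the $-macros can be reproduced in a few lines of Python. The helper compare01 and the table COMPARE are names invented for this sketch; they mirror the macro definitions but are not part of the translator.

```python
import operator

# Comparison signs of the mini-BASIC mapped to Python's comparison functions.
COMPARE = {
    "=": operator.eq, "==": operator.eq,
    "/=": operator.ne, "<>": operator.ne,
    ">": operator.gt, "<": operator.lt,
    ">=": operator.ge, "<=": operator.le,
}


def compare01(op, x, y):
    """Return 1 if the comparison holds and 0 otherwise, like the $-macros."""
    return 1 if COMPARE[op](x, y) else 0


x, y = 1, 2
result = compare01(">", x, y) * 5 + compare01("<=", x, y) * 10
print(result)   # 0*5 + 1*10 = 10
```

With x=1 and y=2 the first comparison contributes 0 and the second contributes 10, so a correct translation of the test procedure should return 10.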

That's all for today.

The code for this article is located here.
To be continued.

Source: https://habr.com/ru/post/423663/

