Writing a Lisp Interpreter in Java

Some time ago I wanted to write my own small interpreted scripting language, just for fun, without setting myself any serious tasks. I was then actively reading the famous SICP magic book (Structure and Interpretation of Computer Programs), and I wanted to implement the abstractions described in it — functions as first-class objects, closures, macros, and so on. In parallel, I was interested in Haskell, and decided to combine the pleasant with the pleasant, and write an interpreter of the language on it. In my language there should have been strict semantics with call by value and mutable variables. I connected the resulting Lisp-family language in my local environment with the name Liscript, the full implementation of which fit in about 250 lines, including the parser, the kernel and console / file I / O. This is more than enough for the sake of interest to solve any problems, which usually torment students who have managed to learn Lisp on the curriculum.

Over time, I wanted to make a cross-platform GUI interface to the interpreter with the perspective of porting to Android, so I implemented the second version of the Java interpreter, the appearance of which you can see in the image above. Yes, it supports graphical output and in general interoperability with Java, and this Tetris is written in Liscript, part of its code is visible. Who are interested in the details - I ask under the cat.

Within one article, it is difficult to fit all the features of the language and its implementation, and to write a series of many articles, starting with a lengthy description of the implementation of the parser using the standard library, my modesty will not allow. (By the way, in both implementations — I didn’t use any parsing libraries in Haskell and Java, but wrote everything manually — it took Haskell as much as 10 lines of code). Therefore, I will write some things in the style of “came, saw, won”. In general, I wrote text parsers in AST types. The only thing I want to mention is a couple of points - I am not a fan of parentheses, but the ease of parsing, the code homoiconnost (code as data) and the lack of infix binary operators with the need to take into account priority and associativity and parsing through Poland or sorting stations outweighs all the claims to this syntax. As a result of the parsing, I get a list that is passed to the interpretation function. In the Java implementation, I did not use library list types, but my override was simple, simply connected, not safe, and so on:

')

Spoiler header

/**   Liscript -   */ public static class ConsList { /**  -     */ public Object car; /**  -     */ public ConsList cdr; /**      . * @param h  -   * @param t  -   */ ConsList(Object h, ConsList t) { car = h; cdr = t; } /** ,     * @return / */ public boolean isEmpty() { return this.car == null && this.cdr == null; } /**    * @return  */ public int size() { int r = 0; ConsList p = this; while (!p.isEmpty()) {r += 1; p = p.cdr;} return r; } /** @return     */ @Override public String toString() { return showVal(this); } }

In a similar way, class types for a function, a macro, are implemented. Example code of a function type class:

Spoiler header

  /**   Liscript -  */ public static class Func { /**      */ public ConsList pars; /**   */ public Object body; /** ,     */ public Env clojure; /**  * @param p      * @param b   * @param c ,     */ Func(ConsList p, Object b, Env c) { pars = p; body = b; clojure = c; } /** @return    */ @Override public String toString() { return showVal(this); } }

Environment

Implemented in full compliance with SICP - hierarchical structure of dictionaries, where each dictionary is HashMap <String, Object>, the class contains methods for adding, receiving and changing values by a string name. And here you can already take a creative approach: for example, what to do if you try to get a value by the missing key (variable name)? Interrupt execution with an error? Or return a string representation of the key? The same thing, if we try to add a variable whose name is already contained in the dictionary of the current frame of the environment - give an error or override the variable? Such trifles as a result determine the peculiarities of the language, and for example, I like that I can define them myself. We get auto-quote and deep binding, with all the pros and cons of this approach.

I also wanted to implement the variable arity of special forms right in the core of the language, and not later in the standard library. Many of them allow the transmission of a number of parameters and / or do not work quite exactly as their like-named counterparts in other Lisp dialects — this does not complicate the implementation of the interpreter. An example of working with the environment (after => there is an interpreter's answer):

 ++ "a1 = " a1 ", b1 = " b1 ", c1 = " c1 => a1 = a1, b1 = b1, c1 = c1 def a1 1 b1 (+ a1 1) (++ "c" a1) (+ a1 b1) => OK ++ "a1 = " a1 ", b1 = " b1 ", c1 = " c1 => a1 = 1, b1 = 2, c1 = 3 set! (++ "a" 1) 5 c1 10 => OK ++ "a1 = " a1 ", b1 = " (get (++ "b" 1)) ", c1 = " c1 => a1 = 5, b1 = 2, c1 = 10

Functions

They are first class objects. When created, the function captures the current context. When called, the arguments are calculated strictly sequentially. Implemented automatic tail call optimization - TCO:

 defn is-even (n) (cond (= n 0) true (is-odd (- n 1)) ) => OK defn is-odd (n) (cond (= n 0) false (is-even (- n 1)) ) => OK is-even 10000 => true is-even 10001 => false

The special form of the tray allows you to print a call stack when applying a function. So, for example, factorial calculation from 3 occurs:

Spoiler header

 defn f (i) (cond (< i 2) 1 (* i (f (- i 1)))) => OK tray f 3 => 1 ← (f 3) 2 ← f 2 → (lambda (i) (cond (< i 2) 1 (* i (f (- i 1))))) 2 ← (cond (< i 2) 1 (* i (f (- i 1)))) 3 ← (< i 2) 4 ← i 4 → 3 3 → false 3 ← (* i (f (- i 1))) 4 ← i 4 → 3 4 ← (f (- i 1)) 5 ← f 5 → (lambda (i) (cond (< i 2) 1 (* i (f (- i 1))))) 5 ← (- i 1) 6 ← i 6 → 3 5 → 2 5 ← (cond (< i 2) 1 (* i (f (- i 1)))) 6 ← (< i 2) 7 ← i 7 → 2 6 → false 6 ← (* i (f (- i 1))) 7 ← i 7 → 2 7 ← (f (- i 1)) 8 ← f 8 → (lambda (i) (cond (< i 2) 1 (* i (f (- i 1))))) 8 ← (- i 1) 9 ← i 9 → 2 8 → 1 8 ← (cond (< i 2) 1 (* i (f (- i 1)))) 9 ← (< i 2) 10 ← i 10 → 1 9 → true 8 → 1 7 → 1 6 → 2 5 → 2 4 → 2 3 → 6 2 → 6 1 → 6 6

If, when calling the function of arguments, more than the number of formal parameters is passed, then the last formal parameter will be associated with the list of calculated remaining passed arguments. If it is less, then these formal parameters will not be connected in the environment of the function calculation and in the calculation of its body they will be searched in the environments of the upper level of the hierarchy. It would be possible to choose the currying option — return a partially applied function, or give an error of inconsistency in the number of arguments.

Macros

Liscript supports so-called runtime macros, which are objects of the first class of the language. They can be created, associated with names, returned as a result of functions, and executed during the work (interpretation of the code). The expression obtained from the source code immediately begins to be interpreted, without the preliminary macros opening stage, so the macros remain full language types and are expanded and calculated during the interpretation according to all macro calculation rules — first, the uncalculated passed arguments are substituted into the macro body and then this macro body is calculated in the current environment (as opposed to the function body, which is always calculated in a separate own environment, in which there are already aritelno calculated values of the actual parameters):

  def m (macro (ir) (cond (<= i 0) 'r (m (- i 1) (cons ir)))) => OK m 5 nil => (cons (- (- (- (- 5 1) 1) 1) 1) (cons (- (- (- 5 1) 1) 1) (cons (- (- 5 1) 1) (cons (- 5 1) (cons 5 nil))))) eval (m 5 nil) => (1 2 3 4 5)

Java Interoperability

Implemented through Java Reflection. Only 2 special forms: class - defines the class by its full name and java - calls the class method with the passed parameters. The search for the required method among the overloaded ones of the same name is carried out taking into account the number and types of the passed parameters. To speed up the work, the class method once found and called in the current session of the interpreter is stored in a table of called methods, and when calling any method, a search is first performed in the table — memoization.

 def m (java (class "java.util.HashMap") "new") => OK java m "put" 1 "a" => OK java m "put" "b" 2 => OK java m "get" 1 => a m => {1=a, b=2}

Thus, we can access many of the resources of the implementation language, including the graph. This code opens a separate window with a red line drawn on a white background:

 (def image (java (class "java.awt.image.BufferedImage") "new" 100 100 1)) (def imageGraphics (java image "createGraphics")) (java imageGraphics "setBackground" (java (class "java.awt.Color") "new" 255 255 255)) (java imageGraphics "clearRect" 0 0 100 100) (java imageGraphics "setColor" (java (class "java.awt.Color") "new" 255 0 0)) (java imageGraphics "drawLine" 10 10 90 90) (def icon (java (class "javax.swing.ImageIcon") "new")) (java icon "setImage" image) (def label (java (class "javax.swing.JLabel") "new")) (java label "setIcon" icon) (def window (java (class "javax.swing.JFrame") "new")) (java window "setLayout" (java (class "java.awt.FlowLayout") "new")) (java window "add" label) (java window "setVisible" true) (java window "pack")

Of course, you can select typical blocks into separate functions or macros and register them once in the standard library, which is loaded when the interpreter starts. And since the interpreter implements multi-threaded execution of tasks in separate tabs with a common mutable environment (yes, I know that the HashMap selected as the dictionary repository is not thread-safe and this can create problems when changing the environment from parallel threads, and it was better to use HashTable), so, this allows in one tab to start a procedure that calls itself after a certain waiting time on a timer, and in another window (thread) - a procedure that requests user input and performs certain actions. This was how Tetris was implemented (with the feature of blocking user input - each command must be confirmed with Ctrl + Enter).

This project is available on Github at github.com/Ivana-/Liscript-GUI-Java-Swing . It contains the source files of the interpreter and its GUI in Swing (I know that it is already 21st century in the yard, but I don’t rush to rewrite JavaFX), as well as script files of the standard library, language, interpreter descriptions and a sufficient number of demos. If you wish to criticize, please note that this is my first project in Java and in OOP language in general, so much can be realized nonoptimally. But I will be interested in opinions and feedback on this project.

Source: https://habr.com/ru/post/281859/

All Articles

Writing a Lisp Interpreter in Java

Environment

Functions

Macros

Java Interoperability

More articles: