📜 ⬆️ ⬇️

We write (unedited) Haskell interpreter with alex and happy

Hi, Habr! In this article, we will look at how to do-it-yourself a (unedited) Haskell interpreter. Interested please under the cat!

Once I got the idea to write my own interpreter, moreover on Haskell. Writing it from scratch is not a job for the faint of heart, and why, if everything is already written for this by other, (possibly) more experienced people!

Step 1: the wording of the TOR to themselves


Our (un) interpreter will work like this:

let a = 2 in a*2
4
let a = 8 in (let b = a - 1 in a*b)
56

Only let-in, only Int, from operations: addition, subtraction, multiplication, division (integer). Go!
')

Step 2: alex


The first thing we need is a lexer (a program that will divide the code into parts — tokens). Those who wrote their translators on the great and mighty C probably remembered flex - the great and mighty lexer generator. We will use an equally great and mighty alex - also a lexer generator, but for Haskell. Download alex from here or

$ sudo apt-get install alex

then open your favorite text editor and write:

 { module Lex where } %wrapper "basic" $digit = 0-9 $alpha = [a-zA-Z] tokens :- $white ; let { \s -> TLet } in { \s -> TIn } $digit+ { \s -> TNum (read s)} [\=\+\-\*\/\(\)] { \s -> TSym (head s)} $alpha [$alpha $digit \_ \']* { \s -> TVar s} { data Token = TLet | TIn | TNum Int | TSym Char | TVar String deriving (Eq, Show) } 

Scary.

In fact, everything is simple. First we declare the Lex module (the code in curly brackets is pure Haskell), then we say that we want to use the basic wrapper, that is, without any frills - cheap and angry, then the definitions of character sets (charsets) go - the name of the charset should begin with $, the value is (almost) a regular expression. $ digit is all numbers, $ alpha is all letters. Now the most important thing is tokens. after: - their definitions should go, the regular expression on the left, the token on the right. We ignore spaces (on the left is a charset for whitespace, a semicolon on the right), let is TLet token, in - TIn, one or more digits - TNum, all sorts of pluses and minuses - TSym, letters, underscore and '- TVar. Further we see that all the scary words on the letter T are values ​​of the type Token - nothing complicated.

Now is the time of magic.

Save the file as Lex.x, then

$ alex Lex.x

Done! Unas is a module of our lexer - Lex.hs!

Step 3: happy


Now we need a parser generator - happy. Its counterparts for C are bison and yacc. Download it from here or

$ sudo apt-get install happy

Create the file Synt.y

 { module Synt where import Lex } %name synt %tokentype { Token } %error { parseError } %token let { TLet } in { TIn } num { TNum $$ } var { TVar $$ } '=' { TSym '=' } '+' { TSym '+' } '-' { TSym '-' } '*' { TSym '*' } '/' { TSym '/' } '(' { TSym '(' } ')' { TSym ')' } %% Exp: let var '=' Exp in Exp { Let $2 $4 $6 } | Exp1 { Exp1 $1 } Exp1: Exp1 '+' Term { Plus $1 $3 } | Exp1 '-' Term { Minus $1 $3 } | Term { Term $1 } Term: Term '*' Factor { Mul $1 $3 } | Term '/' Factor { Div $1 $3 } | Factor { Factor $1 } Factor: num { Num $1 } | var { Var $1 } | '(' Exp ')' { Brack $2 } { parseError :: [Token] -> a parseError _ = error "Parse error" data Exp = Let String Exp Exp | Exp1 Exp1 deriving (Show) data Exp1 = Plus Exp1 Term | Minus Exp1 Term | Term Term deriving (Show) data Term = Mul Term Factor | Div Term Factor | Factor Factor deriving (Show) data Factor = Num Int | Var String | Brack Exp deriving (Show) } 

More scary.

Here, too, the Haskell code is taken in curly brackets, so first we declare the Synt module, then import Lex (we need the Token type from there). "% name synt" means that our parser function will be called synt, "% tokentype {Token}" - that we use the type Token as the type of tokens would never have guessed , "% error {parseError}" sets the function that will be handle errors.

Next come the matching tokens for their pseudonyms. But now it will be scary. We say that expression (Exp) is let variable = expression in expression or subexpression (Exp1). In the first case, you need to create a Let construct (its type is Exp, see the declaration below), and take the second, fourth, and sixth words (i.e. var, Exp and Exp) as arguments, and in the second case, create the Exp1 construct with the first word as an argument. Exp1 can be an addition or subtraction operation with the first and third words as arguments or Term. Term is similar, but for addition and multiplication operations. The reader will ask: “Why break into two types, is it really impossible to stuff multiplication and division in Exp1?” The fact is that the work of the parser is to build a so-called “parsing tree”. The most deeply nested operations are performed first. Term will be deeper than Exp1, so multiplication operations will be performed earlier than additions! Factor, in turn, can be a number, a variable or an expression in parentheses - again for the order of actions. Next, we declare the parseError function to "handle" errors and all data types like Exp or Factor.

That's it, the parser is ready!

$ happy Synt.y

Now we have a file Synt.hs

Last spurt: the interpreter


The interpreter code is presented below:

 module Main where import qualified Data.Map as M import Lex import Synt newtype Context = Context {getContext :: M.Map String Int} deriving (Show) pull :: Maybe a -> a pull (Just m) = m pull Nothing = error "Undefined variable" createContext :: Context createContext = Context {getContext = M.empty} getValue :: Context -> String -> Maybe Int getValue ctx name = M.lookup name $ getContext ctx solveExp :: Context -> Exp -> Maybe Int solveExp ctx exp = case exp of (Let name expl rexp) -> solveExp newCtx rexp where newCtx = Context {getContext = M.insert name (pull (solveExp ctx expl)) (getContext ctx)} (Exp1 exp1) -> solveExp1 ctx exp1 solveExp1 :: Context -> Exp1 -> Maybe Int solveExp1 ctx exp1 = case exp1 of (Plus lexp1 rterm) -> (+) <$> (solveExp1 ctx lexp1) <*> (solveTerm ctx rterm) (Minus lexp1 rterm) -> (-) <$> (solveExp1 ctx lexp1) <*> (solveTerm ctx rterm) (Term term) -> solveTerm ctx term solveTerm :: Context -> Term -> Maybe Int solveTerm ctx term = case term of (Mul lterm rfactor) -> (*) <$> (solveTerm ctx lterm) <*> (solveFactor ctx rfactor) (Div lterm rfactor) -> (div) <$> (solveTerm ctx lterm) <*> (solveFactor ctx rfactor) (Factor factor) -> solveFactor ctx factor solveFactor :: Context -> Factor -> Maybe Int solveFactor ctx factor = case factor of (Num n) -> (Just n) (Var s) -> getValue ctx s (Brack exp) -> solveExp ctx exp main = do s <- getContents mapM putStrLn $ (map (show . pull . (solveExp createContext) . synt . alexScanTokens) . lines) s 

Here we declare a Context type that contains an associative array (Map) with variable names and their values, a pull function that returns the value of a variable, if it exists, otherwise it raises an error. The createContext function simply creates an empty context, getValue looks for the value of a variable in the context. Now the fun part! To understand what is happening here, imagine that the line of our code is:

8

Then the parsing tree will be like this:

let res = Exp (Exp1 (Term (Num 8)))

and the result (i.e. 8) will be after

((solveFactor ctx) <- (solveTerm ctx) <- (solveExp1 ctx) <- (solveExp ctx)) res

This is not Haskell code, but a scheme: solveExp will pass res solveExp1, etc.
ctx here is just a context.

That is, the work of the solveExp function is that if the input let-in construct adds a variable to the context and evaluates Exp after in, if it is otherwise just calculate Exp1.
The function solveExp1 adds or subtracts, solveTerm multiplies and divides. solveFactor gets the values ​​of variables, returns numbers, and if Exp goes in brackets to it, then solveExp passes it (not again, but again) .

The main function takes stdin strings to EOF, splits them into a list of strings, allocates tokens (alexScanTokens) into each, passes through the parser (synt), calculates the value of the expression (solveExp) with an empty context (createContext), and makes from Maybe Int Int, and then String, and then displays the result.

Everything! Now everything is exactly! We compile everything, and our interpreter is ready! Feedback, comments, suggestions - please in the comments.

Source: https://habr.com/ru/post/335366/


All Articles