Not so long ago on Habré there was an excellent and inspiring article about compilers and stack machines. It shows the path from a simple implementation of the bytecode artist to more and more efficient versions. I wanted to show, using the example of the development of a stack machine, how to do this in Haskell-way.
Using the example of language interpretation for a stack machine, we will see how the mathematical concept of semigroups and monoids helps to develop and expand the program architecture, how one can use mono algebra, and how one can build programs in the form of a set of homomorphisms between algebraic systems. As working examples, we first build an interpreter that is inseparable from the code in the form of an EDSL, and then we will teach it different things: record arbitrary debugging information, separate the program code from the program itself, carry out a simple static analysis and calculate with different effects.
The article is intended for those who speak Haskell at an average level and above, for those who already use it in work or research, and for all curious people who have looked at what it is that the functionals have yet realized. Well, for those, of course, who did not scare the previous paragraph.
There was a lot of material, with a lot of examples in the code, and in order to make it easier for the reader to understand whether he needs to dive into it, I’ll give an annotated content.
The tasks of translation and interpretation present many interesting and useful examples to demonstrate various aspects of programming. They allow you to move to different levels of complexity and abstraction, while remaining quite practical. In this article, we will focus on demonstrating the capabilities of two important mathematical structures - a semigroup and a monoid . They are not so often discussed as monads or lenses, and they are not afraid of little programmers, these structures are much easier to understand, but with all that, they are the basis of functional programming. Masterly mastery of monoidal types, which is demonstrated by professionals, is admired by the simplicity and elegance of solutions.
The search for the word "monoid" for articles on Habré produces no more than four dozen articles (about the same monads, for example, there are three hundred of them). All of them conceptually start with something like: a monoid is such a lot ... and then, with quite understandable delight, list what is a monoid - from lines to finger trees, from regular expression parsers to God knows what else ! But in practice we think in reverse order: we have an object that needs to be modeled, we analyze its properties and find that it possesses the characteristics of an abstract structure, decide whether we need consequences from this circumstance and how we use it. We will go this way. And at the same time we will add a couple of interesting examples to the collection of useful monoids.
Stack machines, when studying functional programming, usually appear at the moment when they approach the concept of convolution. In this case, an extremely laconic implementation of the performer of the simplest stack calculator is given, for example, this:
calc :: String -> [Int] calc = interpretor . lexer where lexer = words interpretor = foldl (flip interprete) [] interprete c = case c of "add" -> binary $ \(x:y:s) -> x + y:s "mul" -> binary $ \(x:y:s) -> x * y:s "sub" -> binary $ \(x:y:s) -> y - x:s "div" -> binary $ \(x:y:s) -> y `div` x:s "pop" -> unary $ \(x:s) -> s "dup" -> unary $ \(x:s) -> x:x:s x -> case readMaybe x of Just n -> \s -> n:s Nothing -> error $ "Error: unknown command " ++ c where unary fs = case s of x:_ -> fs _ -> error $ "Error: " ++ c ++ " expected an argument." binary fs = case s of x:y:_ -> fs _ -> error $ "Error: " ++ c ++ " expected two arguments."
This uses the total readMaybe
parser from the Text.Read
module. It would be possible to bring the program two times shorter, but already without informative error messages, and this is ugly.
Great start to talk! Then, as a rule, they begin to foldM
effects: change the foldl
fold to foldM
, provide a totality through the Either String
monad, then add logging, wrapping everything with a StateT
transformer, embed a StateT
dictionary with the help of StateT
, and so on. Sometimes, to demonstrate the coolness of monadic calculations, an ambiguous calculator is implemented, which returns all possible values of the expression ( 2 p m 3 ) ∗ ( ( 4 p m 8 ) p m 5 ) . This is a long, good and interesting conversation. However, we will immediately lead our story differently, although we will end it with the same result.
Why, in general, is it about convolution? Because convolution (cathamorphism) is an abstraction of sequential processing of inductive data . A stack machine runs linearly through the code, following a sequence of instructions and generating one value — the state of the stack. I like to imagine the work of a convolutional stack machine as a translation of messenger RNA in a living cell. The ribosome, step by step, goes through the entire RNA chain, compares the nucleotide triplets with amino acids and creates the primary structure of the protein.
The convolutional machine has a number of limitations, the main one is that the program is always read from beginning to end and once. Branching, loops, and subroutine calls require a conceptual change to the interpreter. Nothing complicated, of course, but such a machine can no longer be described by a simple convolution.
According to the hypothesis of linguistic relativity, the properties of the language we use directly affect the properties of our thinking. Let's pay attention not to the machine, but to the languages and programs by which it is controlled.
All stack-oriented languages, both relatively low-level (bytecodes of Java and Python or .NET virtual machines), and languages at a higher level (PostScript, Forth or Joy), have one fundamental common feature: if you write two successively correct programs, then get the correct program. True, correct does not mean "correct", this program may crash on any data or fall into infinite loops and generally does not make sense, but the main thing is that such a program can be executed by a machine. At the same time, breaking the correct program into parts, we can easily reuse these parts, precisely because of their correctness. Finally, in any stack language, you can select a subset of commands that operate only on the internal state of the machine (stack or registers) that do not use any external memory. This subset will form a language with the property of concatenatability . In such a language, any program has the meaning of a state machine transducer, and the sequential execution of programs is equivalent to their composition, which means it is also a state transformer.
The general pattern is viewed: the combination (concatenation) of the correct programs generates the correct program, the combination of transducers generates the transducer. It turns out that the stack language programs are closed with respect to the operation of concatenation or form a structure called groupoid or magma . This means that, writing a program on a tape, it is possible to cut it almost haphazardly and then form new programs from the obtained segments. And you can cut up to segments with a single instruction.
When gluing important order. For example, these two programs are undoubtedly different:
t e x t t t 5 d u p p o p n e q t e x t t t 5 p o p d u p .
( texttt5dup)+ textttpop= texttt5+( textttduppop).
And what does this give us, as programmers? Associativity allows you to precompile, optimize and even parallelize arbitrary suitable segments of the program, and then combine them into an equivalent program. We can afford to carry out a static analysis of any segment of the program and use it in the analysis of the entire program precisely because we do not care where to put the brackets. These are very important and serious opportunities for a low-level language or an intermediate language in which not a person writes, but a translator. And from the point of view of a mathematician and an experienced functionary, this makes the state-transformer programs of the state of the machine complete endomorphisms . Endomorphisms also form a semigroup with the operation of composition. In algebra, such endomorphisms are called transformation semigroups with respect to some set. For example, finite automata form a transformation semigroup of a set of states.
"Semigroup" sounds half-heartedly, somehow defective. Maybe stack programs form a group ? E ... no, most programs are irreversible, that is, it will not work out by the result of the execution to unambiguously restore the original data. But we have a neutral element. In assembly languages, it is denoted textttnop and does nothing. If in the stack language such an operator is not explicitly defined, then it can be easily obtained by combining some commands, for example: textttincdec , textttduppop or textttswapswap . Such pairs can be painlessly cut from programs or, on the contrary, inserted anywhere in any number. Since there is a unit, our programs form a semigroup with a unit or a monoid . So, you can programmatically implement them in the form of monoids - endomorphisms over the state of the stack machine. This will allow you to define a small set of basic operations for the machine, and then create programs using their composition, getting a stack language in the form of an embedded domain-oriented language (EDSL).
In Haskell, semigroups and monoids are described using the Semigroup
and Monoid
. Their definitions are simple and reflect only the basic structure, the requirements of associativity and neutrality have to be checked by the programmer:
class Semigroup a where (<>) :: a -> a -> a class Semigroup a => Monoid a where mempty :: a
{-# LANGUAGE LambdaCase, GeneralizedNewtypeDeriving #-} import Data.Semigroup (Max(..),stimes) import Data.Monoid import Data.Vector ((//),(!),Vector) import qualified Data.Vector as V (replicate)
We will immediately build a machine that has a stack, a finite memory, and can crash in an amicable, clean way. We implement all this without the use of monads, encapsulating the necessary data in the type describing the machine. Thus, all the basic programs, and therefore all their combinations, will be pure converters of its state.
We start by defining the type for the virtual machine and the trivial setter functions.
type Stack = [Int] type Memory = Vector Int type Processor = VM -> VM memSize = 4 data VM = VM { stack :: Stack , status :: Maybe String , memory :: Memory } deriving Show emptyVM = VM mempty mempty (V.replicate memSize 0) setStack :: Stack -> Processor setStack x (VM _ sm) = VM xsm setStatus :: Maybe String -> Processor setStatus x (VM s _ m) = VM sxm setMemory :: Memory -> Processor setMemory x (VM s st _) = VM s st x
Setters are needed to make the semantics of the program explicit. Under the processor (type Processor
) we will understand the converter VM -> VM
.
Now we define the wrapper types for the transformation monoid and for the program:
instance Semigroup (Action a) where Action f <> Action g = Action (g . f) instance Monoid (Action a) where mempty = Action id newtype Program = Program { getProgram :: Action VM } deriving (Semigroup, Monoid)
Wrapping types define the principle of combining programs: these are endomorphisms with the reverse order of the composition (from left to right). Using wrappers allows the compiler to independently determine how the type Program
implements the requirements of the Semigroup
and Monoid
.
Performer programs are trivial:
run :: Program -> Processor run = runAction . getProgram exec :: Program -> VM exec prog = run prog emptyVM
The error message will form the err
function:
err :: String -> Processor err = setStatus . Just $ "Error! " ++ m
We use the Maybe
type differently than it is used normally: empty Nothing
in the status means that nothing dangerous happens, and the calculations can be continued, in turn, the string value marks problems. For convenience, we define two smart constructors: one for programs that work only with the stack, the other for those who need memory.
program :: (Stack -> Processor) -> Program program f = Program . Action $ \vm -> case status vm of Nothing -> f (stack vm) vm _ -> vm programM :: ((Memory, Stack) -> Processor) -> Program programM f = Program . Action $ \vm -> case status vm of Nothing -> f (memory vm, stack vm) vm _ -> vm
Now you can define basic language commands for working with the stack and memory, integer arithmetic, as well as equivalence and order relations.
pop = program $ \case x:s -> setStack s _ -> err "pop expected an argument." push x = program $ \s -> setStack (x:s) dup = program $ \case x:s -> setStack (x:x:s) _ -> err "dup expected an argument." swap = program $ \case x:y:s -> setStack (y:x:s) _ -> err "swap expected two arguments." exch = program $ \case x:y:s -> setStack (y:x:y:s) _ -> err "exch expected two arguments."
-- indexed if = programM $ if (i < 0 || i >= memSize) then const $ err $ "expected index in within 0 and " ++ show memSize else f put i = indexed i $ \case (m, x:s) -> setStack s . setMemory (m // [(i,x)]) _ -> err "put expected an argument" get i = indexed i $ \(m, s) -> setStack ((m ! i) : s)
unary nf = program $ \case x:s -> setStack (fx:s) _ -> err $ "operation " ++ show n ++ " expected an argument" binary nf = program $ \case x:y:s -> setStack (fxy:s) _ -> err $ "operation " ++ show n ++ " expected two arguments" add = binary "add" (+) sub = binary "sub" (flip (-)) mul = binary "mul" (*) frac = binary "frac" (flip div) modulo = binary "modulo" (flip mod) neg = unary "neg" (\x -> -x) inc = unary "inc" (\x -> x+1) dec = unary "dec" (\x -> x-1) eq = binary "eq" (\x -> \y -> if (x == y) then 1 else 0) neq = binary "neq" (\x -> \y -> if (x /= y) then 1 else 0) lt = binary "lt" (\x -> \y -> if (x > y) then 1 else 0) gt = binary "gt" (\x -> \y -> if (x < y) then 1 else 0)
To complete the work is not enough branching and cycles. In fact, for a built-in language, only branching is sufficient, cycles can be organized using recursion in the host language (in Haskell), but we will make our language self-sufficient. In addition, we use the fact that the programs form a semigroup and define a combinator of program repetition a specified number of times. The number of repetitions he will take from the stack.
branch :: Program -> Program -> Program branch br1 br2 = program go where go (x:s) = proceed (if (x /= 0) then br1 else br2) s go _ = err "branch expected an argument." while :: Program -> Program -> Program while test body = program (const go) where go vm = let res = proceed test (stack vm) vm in case (stack res) of 0:s -> proceed mempty s res _:s -> go $ proceed body s res _ -> err "while expected an argument." vm rep :: Program -> Program rep body = program go where go (n:s) = proceed (stimes n body) s go _ = err "rep expected an argument." proceed :: Program -> Stack -> Processor proceed prog s = run prog . setStack s
The types of functions branch
and while
say that these are not stand-alone programs, but program combinators: a typical approach when creating an EDSL in Haskell. The stimes
function stimes
defined for all semigroups, it returns the composition of the specified number of elements.
Finally, we will write several programs for experiments.
-- fact = dup <> push 2 <> lt <> branch (push 1) (dup <> dec <> fact) <> mul -- fact1 = push 1 <> swap <> while (dup <> push 1 <> gt) ( swap <> exch <> mul <> swap <> dec ) <> pop -- -- range = exch <> sub <> rep (dup <> inc) -- , -- fact2 = mconcat [ dec, push 2, swap, range, push 3, sub, rep mul] -- fact3 = dup <> put 0 <> dup <> dec <> rep (dec <> dup <> get 0 <> mul <> put 0) <> get 0 <> swap <> pop -- copy2 = exch <> exch -- -- gcd1 = while (copy2 <> neq) ( copy2 <> lt <> branch mempty (swap) <> exch <> sub ) <> pop -- pow = swap <> put 0 <> push 1 <> put 1 <> while (dup <> push 0 <> gt) ( dup <> push 2 <> modulo <> branch (dec <> get 0 <> dup <> get 1 <> mul <> put 1) (get 0) <> dup <> mul <> put 0 <> push 2 <> frac ) <> pop <> get 1
It turned out 120 lines of code with comments and annotations of types that define a machine operating with 18 teams with three combinators. This is how our car works.
λ> exec (push 6 <> fact) VM {stack = [720], status = Nothing, memory = [0,0,0,0]} λ> exec (push 6 <> fact3) VM {stack = [720], status = Nothing, memory = [720,0,0,0]} λ> exec (push 2 <> push 6 <> range) VM {stack = [6,5,4,3,2], status = Nothing, memory = [0,0,0,0]} λ> exec (push 6 <> push 9 <> gcd1) VM {stack = [3], status = Nothing, memory = [0,0,0,0]} λ> exec (push 3 <> push 15 <> pow) VM {stack = [14348907], status = Nothing, memory = [43046721,14348907,0,0]} λ> exec (push 9 <> add) VM {stack = [9], status = Just "Error! add expected two arguments", memory = [0,0,0,0]}
In fact, we have not done anything new - by combining transformers-endomorphisms, we essentially returned to the convolution, but it became implicit. Recall that a convolution gives an abstraction of sequential processing of inductive data. Data, in our case, is formed in an inductive way when pasting programs by the operator diamond , and they are “stored” in endomorphism in the form of a chain of compositions of functions of machine transformers until this chain is applied to the initial state. In the case of the use of combinators branch
and while
chain begins to turn into a tree or into a cycle. In the general case, we get a graph that reflects the operation of the automaton with the store memory, that is, the stack machine. It is this structure that we "turn off" during the execution of the program.
How effective is this implementation? The composition of functions is the best that the Haskell compiler can do. He is literally born for this! When it comes to the benefits of using knowledge of monoids, often give an example of diffList
difference lists - the implementation of a linked list in the form of a composition of endomorphisms. Difference lists fundamentally accelerate the formation of lists of many pieces due to the associativity of the composition of functions. Fussing with wrapper types does not lead to an increase in overhead costs, they "dissolve" at the compilation stage. Of the extra work there is only a state check at each step of the program execution.
I think by this point skeptics and casual readers have already left us, you can afford to relax and go to the next level of abstraction.
The concept of semigroups and monoids would not be so useful and universal, if not for a number of properties inherent in all semigroups and monoids without exception, which allow us to construct complex structures from simple structures in exactly the same way as we build complex programs from simple ones. These properties are no longer related to objects, but to types and are best written not in mathematical notation, but in the form of Haskell programs, which, due to the Curry-Howard isomorphism, are their proofs.
1) Monoids and semigroups can be “multiplied”. What is meant here is a product of types whose abstraction in Haskell is a tuple or a pair.
instance (Semigroup a, Semigroup b) => Semigroup (a,b) where (a1, b1) <> (a2, b2) = (a1 <> a2, b1 <> b2) instance (Monoid a, Monoid b) => Monoid (a,b) where mempty = (mempty, mempty )
2) There is a single monoid, it is represented by a single type ()
:
instance Semigroup () where () <> () = () instance Monoid () where mempty = ()
With the multiplication operation, the semigroups themselves form a semigroup, and taking into account the unit type, we can say that monoids form a monoid! The associativity and neutrality of the unit is fulfilled up to isomorphism, but this is not critical.
3) Mappings into a semigroup or monoid form, respectively, a semigroup or monoid. And here, too, it is easier to write this statement in Haskell:
instance Semigroup a => Semigroup (r -> a) where f <> g = \r -> fr <> gr instance Monoid a => Monoid (r -> a) where mempty = const mempty
We use these combinators to expand the capabilities of the stack language we have constructed. Let's make a major change and make our basic commands functions that return programs . This will not deprive them of their monoidal properties, but it will allow them to input arbitrary information from the outside into the work of all machine commands. This is what is meant:
(command1 <> command2) r == command1 r <> command2 r
Information can be any, for example, an external dictionary with some definitions, or a way to keep a journal of calculations, necessary for debugging. This is very similar to the action of the Reader
monad, which is just a function.
We will enter a log into the structure of the machine, but we will not bind it to any particular type, but will output it into a type parameter. We will write to the journal with the help of a generalized monoidal operation.
data VM a = VM { stack :: Stack , status :: Maybe String , memory :: Memory , journal :: a } deriving Show mkVM = VM mempty mempty (V.replicate memSize 0) setStack x (VM _ st ml) = VM x st ml setStatus st (VM s _ ml) = VM s st ml setMemory m (VM s st _ l) = VM s st ml addRecord x (VM s st mj) = VM s st m (x<>j) newtype Program a = Program { getProgram :: Action (VM a) } deriving (Semigroup, Monoid) type Program' a = (VM a -> VM a) -> Program a
From this point on, we allow ourselves not to specify the type annotations for all definitions, leaving the compiler to deal with them on their own, they are not complicated, although they become cumbersome. The teams themselves will not have to change, thanks to smart designers who will take over all the changes. Very small.
program fp = Program . Action $ \vm -> case status vm of Nothing -> p . (f (stack vm)) $ vm m -> vm programM fp = Program . Action $ \vm -> case status vm of Nothing -> p . (f (memory vm, stack vm)) $ vm m -> vm proceed p prog s = run (prog p) . setStack s rep body p = program go id where go (n:s) = proceed p (stimes n body) s go _ = err "rep expected an argument." branch br1 br2 p = program go id where go (x:s) = proceed p (if (x /= 0) then br1 else br2) s go _ = err "branch expected an argument." while test body p = program (const go) id where go vm = let res = proceed p test (stack vm) vm in case (stack res) of 0:s -> proceed p mempty s res _:s -> go $ proceed p body s res _ -> err "while expected an argument." vm
It remains to learn to enter external information in the programmer. This is very easy to do by creating different artists with a different journaling strategy. The first performer will be the most simple, silent, not wasting his time on keeping a journal:
exec prog = run (prog id) (mkVM ())
Here a single monoid ()
is useful to us - a neutral element in the algebra of monoids. Further, you can define a function for the executor, who is ready to record some information about the state of the machine in the log.
execLog p prog = run (prog $ \vm -> addRecord (p vm) vm) (mkVM mempty)
Information may be, for example, such:
logStack vm = [stack vm] logStackUsed = Max . length . stack logSteps = const (Sum 1) logMemoryUsed = Max . getSum . count . memory where count = foldMap (\x -> if x == 0 then 0 else 1)
:
λ> exec (push 4 <> fact2) VM {stack = [24], status = Nothing, memory = [0,0,0,0], journal = ()} λ> journal $ execLog logSteps (push 4 <> fact2) Sum {getSum = 14} λ> mapM_ print $ reverse $ journal $ execLog logStack (push 4 <> fact2) [4] [3] [2,3] [3,2] [2,2] [3,2] [3,3,2] [4,3,2] [4,4,3,2] [5,4,3,2] [3,5,4,3,2] [2,4,3,2] [12,2] [24]
, , . :
f &&& g = \r -> (fr, gr)
λ> let report p = journal $ execLog (logSteps &&& logStackUsed) p λ> report (push 8 <> fact) (Sum {getSum = 48},Max {getMax = 10}) λ> report (push 8 <> fact1) (Sum {getSum = 63},Max {getMax = 4}) λ> report (push 8 <> fact2) (Sum {getSum = 26},Max {getMax = 9}) λ> report (push 8 <> fact3) (Sum {getSum = 43},Max {getMax = 3})
&&&
, . , Haskell . , .
. — , Haskell. .
, , — . , : . ( ) , ( ) . , , . - .
! :
data Code = IF [Code] [Code] | REP [Code] | WHILE [Code] [Code] | PUT Int | GET Int | PUSH Int | POP | DUP | SWAP | EXCH | INC | DEC | NEG | ADD | MUL | SUB | DIV | EQL | LTH | GTH | NEQ deriving (Read, Show)
→ :
fromCode :: [Code] -> Program' a fromCode = hom where hom = foldMap $ \case IF b1 b2 -> branch (hom b1) (hom b2) REP p -> rep (hom p) WHILE tb -> while (hom t) (hom b) PUT i -> put i GET i -> get i PUSH i -> push i POP -> pop DUP -> dup SWAP -> swap EXCH -> exch INC -> inc DEC -> dec ADD -> add MUL -> mul SUB -> sub DIV -> frac EQL -> eq LTH -> lt GTH -> gt NEQ -> neq NEG -> neg
, . foldMap
, . fromCode
, , , c:
λ> stack $ exec (fromCode [PUSH 2, PUSH 5, EXCH, SUB, REP [DUP, INC]]) [5,4,3,2] λ> stack $ exec (fromCode $ read "[PUSH 2, PUSH 5, EXCH, SUB, REP [DUP, INC]]") [5,4,3,2]
→ , case
. : ! Program
:
newtype Program a = Program { getProgram :: ([Code], Action (VM a)) } deriving (Semigroup, Monoid) run = runAction . snd . getProgram
run
, fromCode
:
toCode :: Program' a -> [Code] toCode prog = fst . getProgram $ prog id
, . , :
type Program' a = (Code -> VM a -> VM a) -> Program a program cfp = Program . ([c],) . Action $ \vm -> case status vm of Nothing -> pc . f (stack vm) $ vm _ -> vm programM cfp = Program . ([c],) . Action $ \vm -> case status vm of Nothing -> pc . f (memory vm, stack vm) $ vm _ -> vm
, , , . , -:
none = const id exec prog = run (prog none) (mkVM ()) execLog p prog = run (prog $ \c -> \vm -> addRecord (pc vm) vm) (mkVM mempty) logStack _ vm = [stack vm] logStackUsed _ = Max . length . stack logSteps _ = const (Sum 1) -- logCode c _ = [c] logRun com vm = [pad 10 c ++ "| " ++ pad 20 s ++ "| " ++ m] where c = show com m = unwords $ show <$> toList (memory vm) s = unwords $ show <$> stack vm pad nx = take n (x ++ repeat ' ') debug :: Program' [String] -> String debug = unlines . reverse . journal . execLog logRun
pop = program POP $ \case x:s -> setStack s _ -> err "POP expected an argument." push x = program (PUSH x) $ \s -> setStack (x:s) dup = program DUP $ \case x:s -> setStack (x:x:s) _ -> err "DUP expected an argument." swap = program SWAP $ \case x:y:s -> setStack (y:x:s) _ -> err "SWAP expected two arguments." exch = program EXCH $ \case x:y:s -> setStack (y:x:y:s) _ -> err "EXCH expected two arguments." app1 cf = program c $ \case x:s -> setStack (fx:s) _ -> err $ "operation " ++ show c ++ " expected an argument" app2 cf = program c $ \case x:y:s -> setStack (fxy:s) _ -> err $ "operation " ++ show c ++ " expected two arguments" add = app2 ADD (+) sub = app2 SUB (flip (-)) mul = app2 MUL (*) frac = app2 DIV (flip div) neg = app1 NEG (\x -> -x) inc = app1 INC (\x -> x+1) dec = app1 DEC (\x -> x-1) eq = app2 EQL (\x -> \y -> if (x == y) then 1 else 0) neq = app2 NEQ (\x -> \y -> if (x /= y) then 1 else 0) lt = app2 LTH (\x -> \y -> if (x > y) then 1 else 0) gt = app2 GTH (\x -> \y -> if (x < y) then 1 else 0) proceed p prog s = run (prog p) . setStack s rep body p = program (REP (toCode body)) go none where go (n:s) = if n >= 0 then proceed p (stimes n body) s else err "REP expected positive argument." go _ = err "REP expected an argument." branch br1 br2 p = program (IF (toCode br1) (toCode br2)) go none where go (x:s) = proceed p (if (x /= 0) then br1 else br2) s go _ = err "IF expected an argument." while test body p = program (WHILE (toCode test) (toCode body)) (const go) none where go vm = let res = proceed p test (stack vm) vm in case (stack res) of 0:s -> proceed p mempty s res _:s -> go $ proceed p body s res _ -> err "WHILE expected an argument." vm put i = indexed (PUT i) i $ \case (m, x:s) -> setStack s . setMemory (m // [(i,x)]) _ -> err "PUT expected an argument" get i = indexed (GET i) i $ \(m, s) -> setStack ((m ! i) : s) indexed cif = programM c $ if (i < 0 || i >= memSize) then const $ err "index in [0,16]" else f
, ! , .
-, :
λ> toCode fact1 [PUSH 1,SWAP,WHILE [DUP,PUSH 1,GTH] [SWAP,EXCH,MUL,SWAP,DEC],POP]
EDSL, .
-, , toCode
fromCode
-.
λ> toCode $ fromCode [PUSH 5, PUSH 6, ADD] [PUSH 5, PUSH 6, ADD] λ> exec (fromCode $ toCode (push 5 <> push 6 <> add)) VM {stack = [11], status = Nothing, memory = [0,0,0,0], journal = ()}
, : , . ghci
fact
, , Ctrl+C
. , toCode
, .
, , , - :
λ> putStrLn $ debug (push 3 <> fact) PUSH 3 | 3 | 0 0 0 0 DUP | 3 3 | 0 0 0 0 PUSH 2 | 2 3 3 | 0 0 0 0 LTH | 0 3 | 0 0 0 0 DUP | 3 3 | 0 0 0 0 DEC | 2 3 | 0 0 0 0 DUP | 2 2 3 | 0 0 0 0 PUSH 2 | 2 2 2 3 | 0 0 0 0 LTH | 0 2 3 | 0 0 0 0 DUP | 2 2 3 | 0 0 0 0 DEC | 1 2 3 | 0 0 0 0 DUP | 1 1 2 3 | 0 0 0 0 PUSH 2 | 2 1 1 2 3 | 0 0 0 0 LTH | 1 1 2 3 | 0 0 0 0 PUSH 1 | 1 1 2 3 | 0 0 0 0 MUL | 1 2 3 | 0 0 0 0 MUL | 2 3 | 0 0 0 0 MUL | 6 | 0 0 0 0
. . , , !
, . — . , , .
, : , . , , . !
, , :
listing :: Program' a -> String listing = unlines . hom 0 . toCode where hom n = foldMap f where f = \case IF b1 b2 -> ouput "IF" <> indent b1 <> ouput ":" <> indent b2 REP p -> ouput "REP" <> indent p WHILE tb -> ouput "WHILE" <> indent t <> indent b c -> ouput $ show c ouput x = [stimes n " " ++ x] indent = hom (n+1)
: , , , .
λ> putStrLn . listing $ fact2 INC PUSH 1 SWAP EXCH SUB DUP PUSH 0 GTH IF REP DUP INC : NEG REP DUP DEC DEC DEC REP MUL λ> putStrLn . listing $ gcd1 WHILE EXCH EXCH NEQ EXCH EXCH LTH IF : SWAP EXCH SUB POP
. , , . .
, — , . , . :
a r i t y ( add ) = 2 ▹ 1
a r i t y ( push ) = 0 ▹ 1a r i t y ( pop ) = 1 ▹ 0a r i t y ( exch ) = 2 ▹ 3
When sequentially executing commands, valencies are combined in the following nontrivial way:
(i1▹o1)⋄(i2▹o2)=(a+i1)▹(a+o1+o2−i2),a=max(0,i2−o1).
infix 7 :> data Arity = Int :> Int deriving (Show,Eq) instance Semigroup Arity where (i1 :> o1) <> (i2 :> o2) = let a = 0 `max` (i2 - o1) in (a + i1) :> (a + o1 + o2 - i2) instance Monoid Arity where mempty = 0:>0
And then you can build a homomorphism:
arity :: Program' a -> Arity arity = hom . toCode where hom = foldMap $ \case IF b1 b2 -> let i1 :> o1 = hom b1 i2 :> o2 = hom b2 in 1:>0 <> (i1 `max` i2):>(o1 `min` o2) REP p -> 1:>0 WHILE tb -> hom t <> 1:>0 PUT _ -> 1:>0 GET _ -> 0:>1 PUSH _ -> 0:>1 POP -> 1:>0 DUP -> 1:>2 SWAP -> 2:>2 EXCH -> 2:>3 INC -> 1:>1 DEC -> 1:>1 NEG -> 1:>1 _ -> 2:>1
, , . , , .
( ):
λ> arity (exch <> exch) 2 :> 4 λ> arity fact1 1 :> 1 λ> arity range 2 :> 1 λ> arity (push 3 <> dup <> pow) 0 :> 1
? , "" . Program' a -> Max Int
, . , , :
memoryUse :: Program' a -> Max Int memoryUse = hom . toCode where hom = foldMap $ \case IF b1 b2 -> hom b1 <> hom b2 REP p -> hom p WHILE tb -> hom t <> hom b PUT i -> Max (i+1) GET i -> Max (i+1) _ -> 0
λ> memoryUse fact1 Max {getMax = 0} λ> memoryUse fact3 Max {getMax = 1} λ> memoryUse pow Max {getMax = 2}
. , .
, : , , , 0:>_
. . , .
isReducible p = let p' = fromCode p in case arity p' of 0:>_ -> memoryUse p' == 0 _ -> False reducible = go [] . toCode where go res [] = reverse res go res (p:ps) = if isReducible [p] then let (a,b) = spanBy isReducible (p:ps) in go (a:res) b else go res ps -- Last, , -- spanBy test l = case foldMap tst $ zip (inits l) (tails l) of Last Nothing -> ([],l) Last (Just x) -> x where tst x = Last $ if test (fst x) then Just x else Nothing -- Endo -- intercalate splitOn -- Data.List Data.List.Split reduce p = fromCode . process (reducible p) . toCode $ p where process = appEndo . foldMap (\x -> Endo $ x `replaceBy` shrink x) shrink = toCode . foldMap push . reverse . stack . exec . fromCode replaceBy xy = intercalate y . splitOn x
:
λ> let p = push 6 <> fact1 <> swap <> push 5 <> dup <> push 14 <> gcd1 <> put 1 λ> toCode $ p [PUSH 6,PUSH 1,SWAP,WHILE [DUP,PUSH 1,GTH] [SWAP,EXCH,MUL,SWAP,DEC],POP,SWAP,PUSH 5,DUP,PUSH 14,WHILE [EXCH,EXCH,NEQ] [EXCH,EXCH,LTH,IF [] [SWAP],EXCH,SUB],POP,PUT 1] λ> toCode $ reduce p [PUSH 720,SWAP,PUSH 5,PUSH 1,PUT 1] λ> execLog logSteps (push 8 <> p) VM {stack = [5,8,720], status = Nothing, memory = [0,1,0,0], journal = Sum {getSum = 107}} λ> execLog logSteps (push 8 <> reduce p) VM {stack = [5,8,720], status = Nothing, memory = [0,1,0,0], journal = Sum {getSum = 6}}
107 6.
, , , , - ( ).
: , , , ..? ? , , !
m
VM -> VM
VM -> m VM
, . : " — , ?!" , VM -> m VM
, , , . Haskell >=>
" ". , Action
ActionM
, :
newtype ActionM ma = ActionM { runActionM :: a -> ma } instance Monad m => Semigroup (ActionM ma) where ActionM f <> ActionM g = ActionM (f >=> g) instance Monad m => Monoid (ActionM ma) where mempty = ActionM return
, , >=>
. .
{-# LANGUAGE LambdaCase, GeneralizedNewtypeDeriving, TupleSections #-} import Data.Monoid hiding ((<>)) import Data.Semigroup (Semigroup(..),stimes,Max(..)) import Data.Vector ((//),(!),Vector,toList) import qualified Data.Vector as V (replicate) import Control.Monad import Control.Monad.Identity type Stack = [Int] type Memory = Vector Int memSize = 4 data VM a = VM { stack :: Stack , status :: Maybe String , memory :: Memory , journal :: a } deriving Show mkVM = VM mempty mempty (V.replicate memSize 0) setStack x (VM _ st ml) = return $ VM x st ml setStatus st (VM s _ ml) = return $ VM s st ml setMemory m (VM s st _ l) = return $ VM s st ml addRecord x (VM s st ml) = VM s st m (x<>l) ------------------------------------------------------------ data Code = IF [Code] [Code] | REP [Code] | WHILE [Code] [Code] | PUT Int | GET Int | PUSH Int | POP | DUP | SWAP | EXCH | INC | DEC | NEG | ADD | MUL | SUB | DIV | MOD | EQL | LTH | GTH | NEQ | ASK | PRT | PRTS String | FORK [Code] [Code] deriving (Read, Show) newtype ActionM ma = ActionM {runActionM :: a -> ma} instance Monad m => Semigroup (ActionM ma) where ActionM f <> ActionM g = ActionM (f >=> g) instance Monad m => Monoid (ActionM ma) where ActionM f `mappend` ActionM g = ActionM (f >=> g) mempty = ActionM return newtype Program ma = Program { getProgram :: ([Code], ActionM m (VM a)) } deriving (Semigroup, Monoid) type Program' ma = (Code -> VM a -> m (VM a)) -> Program ma program cfp = Program . ([c],) . ActionM $ \vm -> case status vm of Nothing -> pc =<< f (stack vm) vm m -> return vm programM cfp = Program . ([c],) . ActionM $ \vm -> case status vm of Nothing -> pc =<< f (memory vm, stack vm) vm m -> return vm run :: Monad m => Program ma -> VM a -> m (VM a) run = runActionM . snd . getProgram toCode :: Monad m => Program' ma -> [Code] toCode prog = fst . getProgram $ prog none none :: Monad m => Code -> VM a -> m (VM a) none = const return -- exec :: Program' Identity () -> VM () exec = runIdentity . execM execM :: Monad m => Program' m () -> m (VM ()) execM prog = run (prog none) (mkVM ()) execLog p prog = run (prog $ \c -> \vm -> return $ addRecord (pc vm) vm) (mkVM mempty) f &&& g = \c -> \r -> (fcr, gcr) logStack _ vm = [stack vm] logStackUsed _ = Max . length . stack logSteps _ = const (Sum 1) logCode c _ = [c] logRun com vm = [pad 10 c ++ "| " ++ pad 20 s ++ "| " ++ m] where c = show com m = unwords $ show <$> toList (memory vm) s = unwords $ show <$> stack vm pad nx = take n (x ++ repeat ' ') debug p = unlines . reverse . journal <$> execLog logRun p ------------------------------------------------------------ pop,dup,swap,exch :: Monad m => Program' ma put,get,push :: Monad m => Int -> Program' ma add,mul,sub,frac,modulo,inc,dec,neg :: Monad m => Program' ma eq,neq,lt,gt :: Monad m => Program' ma err m = setStatus . Just $ "Error : " ++ m pop = program POP $ \case x:s -> setStack s _ -> err "pop expected an argument." push x = program (PUSH x) $ \s -> setStack (x:s) dup = program DUP $ \case x:s -> setStack (x:x:s) _ -> err "dup expected an argument." swap = program SWAP $ \case x:y:s -> setStack (y:x:s) _ -> err "swap expected two arguments." exch = program EXCH $ \case x:y:s -> setStack (y:x:y:s) _ -> err "expected two arguments." put i = indexed (PUT i) i $ \case (m, x:s) -> setStack s <=< setMemory (m // [(i,x)]) _ -> err "put expected an argument" get i = indexed (GET i) i $ \(m, s) -> setStack ((m ! i) : s) indexed cif = programM c $ if (i < 0 || i >= memSize) then const $ err "index in [0,16]" else f app1 cf = program c $ \case x:s -> setStack (fx:s) _ -> err $ "operation " ++ show c ++ " expected an argument" app2 cf = program c $ \case x:y:s -> setStack (fxy:s) _ -> err $ "operation " ++ show c ++ " expected two arguments" add = app2 ADD (+) sub = app2 SUB (flip (-)) mul = app2 MUL (*) frac = app2 DIV (flip div) modulo = app2 MOD (flip mod) neg = app1 NEG (\x -> -x) inc = app1 INC (\x -> x+1) dec = app1 DEC (\x -> x-1) eq = app2 EQL (\x -> \y -> if (x == y) then 1 else 0) neq = app2 NEQ (\x -> \y -> if (x /= y) then 1 else 0) lt = app2 LTH (\x -> \y -> if (x > y) then 1 else 0) gt = app2 GTH (\x -> \y -> if (x < y) then 1 else 0) proceed p prog s = run (prog p) <=< setStack s rep body p = program (REP (toCode body)) go none where go (n:s) = if n >= 0 then proceed p (stimes n body) s else err "rep expected positive argument." go _ = err "rep expected an argument." branch br1 br2 p = program (IF (toCode br1) (toCode br2)) go none where go (x:s) = proceed p (if (x /= 0) then br1 else br2) s go _ = err "branch expected an argument." while test body p = program (WHILE (toCode test) (toCode body)) (const go) none where go vm = do res <- proceed p test (stack vm) vm case (stack res) of 0:s -> proceed p mempty s res _:s -> go =<< proceed p body s res _ -> err "while expected an argument." vm ask :: Program' IO a ask = program ASK $ \case s -> \vm -> do x <- getLine setStack (read x:s) vm prt :: Program' IO a prt = program PRT $ \case x:s -> \vm -> print x >> return vm _ -> err "PRT expected an argument" prtS :: String -> Program' IO a prtS s = program (PRTS s) $ const $ \vm -> print s >> return vm fork :: Program' [] a -> Program' [] a -> Program' [] a fork br1 br2 p = program (FORK (toCode br1) (toCode br2)) (const go) none where go = run (br1 p) <> run (br2 p) ------------------------------------------------------------ fromCode :: Monad m => [Code] -> Program' ma fromCode = hom where hom = foldMap $ \case IF b1 b2 -> branch (hom b1) (hom b2) REP p -> rep (hom p) WHILE tb -> while (hom t) (hom b) PUT i -> put i GET i -> get i PUSH i -> push i POP -> pop DUP -> dup SWAP -> swap EXCH -> exch INC -> inc DEC -> dec ADD -> add MUL -> mul SUB -> sub DIV -> frac MOD -> modulo EQL -> eq LTH -> lt GTH -> gt NEQ -> neq NEG -> neg _ -> mempty fromCodeIO :: [Code] -> Program' IO a fromCodeIO = hom where hom = foldMap $ \case IF b1 b2 -> branch (hom b1) (hom b2) REP p -> rep (hom p) WHILE tb -> while (hom t) (hom b) ASK -> ask PRT -> ask PRTS s -> prtS s c -> fromCode [c] fromCodeList :: [Code] -> Program' [] a fromCodeList = hom where hom = foldMap $ \case IF b1 b2 -> branch (hom b1) (hom b2) REP p -> rep (hom p) WHILE tb -> while (hom t) (hom b) FORK b1 b2 -> fork (hom b1) (hom b2) c -> fromCode [c]
: stdin
.
ask, prt :: Program' IO a ask = program ASK $ \case s -> \vm -> do x <- getLine setStack (read x:s) vm prt = program PRT $ \case x:s -> \vm -> print x >> return vm _ -> err "PRT expected an argument" prtS :: String -> Program' IO a prtS s = program (PRTS s) $ const $ \vm -> print s >> return vm
- , :
ioprog = prtS "input first number" <> ask <> prtS "input second number" <> ask <> rep (prt <> dup <> inc) <> prt
λ> exec ioprog input first number 3 input second number 5 3 4 5 6 7 8 VM {stack = [8,7,6,5,4,3], status = Nothing, memory = [0,0,0,0], journal = ()}
, :
fork :: Program' [] a -> Program' [] a -> Program' [] a fork br1 br2 p = program (FORK (toCode br1) (toCode br2)) (const go) pure where go = run (br1 p) <> run (br2 p)
: run
VM -> m VM
, — , , []
, — .
:
λ> stack <$> exec (push 5 <> push 3 <> add `fork` sub) [[8],[2]] λ> stack <$> exec (push 5 <> push 3 `fork` dup <> push 2) [[2,3,5],[2,5,5]]
: (2±3)∗((4±8)±5) :
λ> let pm = add `fork` sub λ> stack <$> exec (push 2 <> push 3 <> push 4 <> push 8 <> pm <> push 5 <> pm <> pm <> mul) [[40],[-28],[20],[-8],[8],[4],[-12],[24]]
:
λ> journal <$> execLog logSteps (push 8 <> fact `fork` fact1 `fork` fact2 `fork` fact3) [Sum {getSum = 48},Sum {getSum = 63},Sum {getSum = 34},Sum {getSum = 43}]
, fork
, , fork
.
. . , /, , .
∗∗∗
- μάγμα . , , , . , , , Lego: , - . , , , .
Lego , , — , , . , , . — ! , . , - . ( -) , . — ! "" , , , , . , , .
Source: https://habr.com/ru/post/429530/
All Articles