
Recently discovered a wonderful article
Monadic Parser Combinator about creating parsers. The basic idea is to describe the parser as an object of the first class, so that you can decompose the original problem and present the desired parser as a synthesis of simpler parsers. To facilitate the combination of parsers, the replace error with list of success approach is used. In the haskell language it is very convenient to work with the creation of parsers, since this task naturally falls on the concept of monads, unfortunately in nemerle monads are not supported, but the language is still powerful enough to cope with this task.
A parser can be presented in more detail as a mapping from a string to a list of tuples, where the first element of the tuple is the disassembled part of the line, and the second element is the rest of the line. In case of unsuccessful parsing, an empty list is returned. In different cases, what I called the parsed part of the string can have a different type, so it is natural to consider the type of the parser as polymorphic, for example,
string ->
list ['a *
string ]. Below this structure will be denoted as a parser ['a].
Next you need to consider a combination of parsers. Suppose you need to parse and calculate “sin42”, and we already have a parser that can parse mathematical functions — math: parser [
double ->
double ], and a parser that can parse numbers — number: parser [
double ] (through: the type of parsers is described ).
The desired parser can be written as follows:
def eval = text => math (text) .Map ((f, rest) => number (rest) .Map ((a, b) => (f (a), b))). Fold ([], (e, acc) => e + acc)
It turned out quite capacious, but only a red-eyed fan of FP can agree that this is a beautiful code. Below I will define an operator on parsers and this code can be rewritten as follows:
def eval = math * (f => number * (x => result (f (x))))
In simple terms, the action of the '*' operator can be described as the creation of a new parser, which acts as follows: the first parser is applied, then the second parser parametrizes the disassembled result of the first and continues parsing. At the input, the '*' operator has a parser ['a] and a function that performs parameterization, such as the' a-> parser ['b].
')
Below, for an example of the simplicity and power of this approach, I have written a parser that checks the correctness of arithmetic expressions consisting of positive numbers, parentheses and the "+" operation. The code is written in nemerle, but I will try to comment on the syntax, try to understand the semantics themselves, but be careful - do not break the brain! And do not be afraid of the size of the code, since half of it can be moved to a separate library to create parsers, in the case of haskell the library based on this approach is parsec. Go!
using System;
using System. Console ;
I connect the necessary namespaces and open the System.Console class - from now on all its static methods become global functions / procedures. About the module, functional types instead of delegates, the lack of new before calling the constructor and the description of the constructor through the word this, I probably wrote before. I note now only that generic parameters in nemerle are set through square brackets, and not through angular ones, that the type name may contain apostrophes and that in the nonmert it is considered good form to give names to generic parameters starting with an apostrophe.
module Program
{
class Parser ['a]
{
public this (core : string -> list ['a* string ]) { this .core = core }
core : string -> list ['a* string ];
The type of the tuple, in which the first element is of type int, and the second string, is written in nemerle as int * string, the tuple itself, if it has not written about it yet, is written, for example:
def tuple = (45, “qwerty”). Now, I hope, the entry list ['a * string] is clear.
public Parse(text : string ) : list ['a* string ] { core(text) }
public static @*['b](self : Parser ['a], parser : 'a-> Parser ['b]) : Parser ['b]
{
Parser ( fun (text : string ) : list['b* string ]
{
def data = self.Parse(text);
data.Map((ast,rest) => parser(ast).Parse(rest)).FoldRight([], (e,acc) => e+acc)
})
}
@ * is not a mate, as my friend thought, but a synonym for operator * in C #. Another syntactic feature that you have not yet met on this blog is fun. This is nothing more than a lambda expression, another possible syntax for lambda is the same as the C # syntax, for example, the following code examples are equivalent: fun (x) {x + 1} and x => x + 1; Optionally, you can specify the type, for example, fun (x: int): int {x + 1} or x: int => x + 1. In the case where the lambda can be written in one line, I use the second option, otherwise the first one.
public static @+(a : Parser ['a], b : Parser ['a]) : Parser ['a]
{
Parser (text => a.Parse(text) + b.Parse(text) )
}
}
In nemerle, operation + for lists is defined, in the code above it was used inside the lambda body.
class zero ['a] : Parser ['a] { public this () { base (x=>[]) } }
class result ['a] : Parser ['a]
{
public this (value : 'a) { base (text => [(value,text)]) }
}
In Nemer, as in the Java language, the call to the base class constructor occurs inside the subclass constructor. The two classes that I defined are fundamental parsers, and many other parsers and parser generators are built on their basis. The result class can be compared with the monad constructor in haskell. I called the parser generator a function that returns a parser, for example, the symbol (x: char) generator is defined below: Parser [char], which creates a parser waiting for the specified symbol at the beginning of the parse line.
Below I define three parser combinators. Since all this code is written inside a module, which is a static class, I can easily declare static methods of this class (the static keyword is not needed in this case, since it is a module) and use them as global functions. I called the parser combinator a function that accepts a parser or parsers at the input, and returns a parser at the output. The trivial combinator is the identical function, but it is not interesting, so I defined the combinators many, many1 and oneOf. The first repeats a parser that has an input, as long as the line it parses it allows (analog * in regular expressions), the second is an analog of + in a regular expression, and the third tries to apply successively the parsers passed to it until one of them won't parse part of the string successfully.
many['a] (parser : Parser ['a]) : Parser [ list ['a]]
{
parser * (x => many(parser) * (xs => result (x::xs))) + result ([])
}
many1['a] (parser : Parser ['a]) : Parser [ list ['a]]
{
parser * (x => many(parser) * (xs => result (x::xs)))
}
oneOf['a] ( params parsers : array [ Parser ['a]]) : Parser ['a]
{
Parser ( fun (text){
parsers.FoldLeft([], fun (e,acc)
{
| (_, []) => e.Parse(text)
| (_, _) => acc
})
})
}
In the last generator, the abbreviated syntax for pattern matching is used in the body of the lambda expression. It can be used in the case when the function arguments are involved in pattern matching and the function body consists of a single pattern matching expression. I will give the equivalent code examples:
def foo (x) {
match (x) {| [] => “Empty” | _ => "! empty"}} and
def foo (x) {| [] => “Empty” | _ => "! empty"}.
Attention!
Below is the already significant code that creates the checking parser. All the code before this can be transferred to the library, since it is the most generalized and not tied to a specific parser.
Main() : void
{
def item = Parser(x => if (x.Length==0) [] else [(x[0],x.Substring(1))] );
def sat(predicate) { item * (x => if (predicate(x)) result (x) else zero ()) }
def symbol(x : char) { sat(y => x==y) }
def digit = sat(x => '0' <= x && x <= '9' );
def number = many1(digit);
def expr()
{
def parentheses =
symbol( '(' ) * (_ =>
expr() * ( _=>
symbol( ')' ) * (_ => result ([]))
)
);
def trivial = oneOf(number, parentheses);
def mul =
trivial * ( _ =>
many1(symbol( '*' ) * ( _ => trivial)) * (_ => result ([]))
);
def sarg = oneOf(mul, trivial);
def sum =
sarg * ( _ =>
many1(symbol( '+' ) * ( _ => sarg)) * (_ => result([]))
);
oneOf(sum,mul,trivial)
}
WriteLine(expr().Parse( "5+42*(8+1)" ));
_ = ReadLine();
}
}
This code should be clear (syntactically), the only thing I missed, it did not explain what the word def does. This word comes from the English define what it means to declare. This is exactly what it does: it allows you to give a name to a value and declare local functions (relative to the class method and other local functions). The first is similar to the behavior of the word var in the C # language, only in the German it is impossible to change the meaning of the name, it is easier to understand it by example:
def x = 1; // right
x = x + 1; // compile error
def x = x + 1; // compilation was successful
mutable y = 1; // mutable - full analogue of var in the C # language
y = y +1; // all is well.
Another note to the parser code: expr is declared as a local function, since its description refers to itself.
PS I hope I was able to adequately retell the ideas of the article and better acquaint the readers with the nemerle language.
