42
x
x + 42
. They are formed from other arithmetic expressions.(x+2)*3
). This is not just a different type of expression, but a different way of parsing an expression.__repr__
method allows to print AST
during debugging. All AST
classes will inherit Equality
, so we will be able to verify the similarity of two AST
objects, which is also useful in testing. from equality import * class Aexp(Equality): pass class IntAexp(Aexp): def __init__(self, i): self.i = i def __repr__(self): return 'IntAexp(%d)' % self.i class VarAexp(Aexp): def __init__(self, name): self.name = name def __repr__(self): return 'VarAexp(%s)' % self.name class BinopAexp(Aexp): def __init__(self, op, left, right): self.op = op self.left = left self.right = right def __repr__(self): return 'BinopAexp(%s, %s, %s)' % (self.op, self.left, self.right)
x < 10
)And
expressions (such as x < 10 and y > 20
)Or
expressionsNot
expressionsand
, or
, or not
expressions are logical expressions. This type distinction helps us avoid meaningless expressions like x < 10 and 30
. class Bexp(Equality): pass class RelopBexp(Bexp): def __init__(self, op, left, right): ... class AndBexp(Bexp): def __init__(self, left, right): ... class OrBexp(Bexp): def __init__(self, left, right): ... class NotBexp(Bexp): def __init__(self, exp): ...
class Statement(Equality): pass class AssignStatement(Statement): def __init__(self, name, aexp): ... class CompoundStatement(Statement): def __init__(self, first, second): ... class IfStatement(Statement): def __init__(self, condition, true_stmt, false_stmt): ... class WhileStatement(Statement): def __init__(self, condition, body): ...
AST
classes and a suitable set of combinators - it's time to write our parser. When writing a parser, it is easier to start with the basic structures of the language, gradually moving to more complex things.keyword
parser. It is just a special version of the Reserved
Combinator using the RESERVED
tag, which is labeled with all keywords. Remember, Reserved
corresponds to a single token, for which the value and tag are the same as for the transmitted ones. def keyword(kw): return Reserved(kw, RESERVED)
keyword
is an actual combinator, because it is a function that returns a parser. We will use it directly in other parsers.id
parser is used for matching variable names. It uses the Tag
combinator, which matches the token with the specified tag. id = Tag(ID)
num
parser is used to match integers. It works almost like id
, except that we use the Process
combinator (more precisely, the operator that calls Process
) to convert the token to a specific integer. num = Tag(INT) ^ (lambda i: int(i))
aexp_value
parser, which will convert the values returned by num
and id
into actual expressions. def aexp_value(): return (num ^ (lambda i: IntAexp(i))) | \ (id ^ (lambda v: VarAexp(v)))
|
which is short for the Alternate
combinator. This allows the first expressions to be parsed. If they fail, an attempt will be made to parse the expression with the variable.aexp_value
as a function with no arguments instead of a global value, just as we did with id
and num
. We will do the same with all remaining parsers. And we did it because we do not want the code of each parser to be calculated immediately. If we defined each parser as global, each parser would not be able to refer to the other parsers that follow in the same source file, because they would not have been declared yet. This would make it impossible to define recursive parsers, and the source code would be less readable.AST
class, they need another parser to process them. def process_group(parsed): ((_, p), _) = parsed return p def aexp_group(): return keyword('(') + Lazy(aexp) + keyword(')') ^ process_group
process_group
function is used with the Process
combinator ( ^
operator). It simply discards the tokens of the parentheses and returns the expression inside. In fact, aexp_group
also a parser. Remember, operator +
is short for Concat
combinator. So she parses '(', following the arithmetic expression (exploded by aexp
, which we will soon define), followed by ')'. You must avoid calling aexp
, because, as aexp
calls aexp_group
, which will lead to infinite recursion. We use the Lazy
combinator, which postpones the aexp call until the parser is applied to any input data.aexp_value
and aexp_group
with aexp_term
. The expression aexp_term
is any simple independent expression in which we do not have to care about the precedence of operators with respect to other expressions. def aexp_term(): return aexp_value() | aexp_group()
aexp
and throw it together with aexp_term
. This will result in the expression: 1 + 2 * 3
(1 + 2) * 3
def process_binop(op): return lambda l, r: BinopAexp(op, l, r)
process_binop
function is what creates the BinopAexp
object. This function accepts any arithmetic operator and returns a function that combines pairs of expressions using this operator ...process_binop
function should be used with the Exp
( *
operator) combinator. Combinator Exp
parses a list of expressions with a separator between each pair of expressions. The left operator Exp
is a parser that matches the individual elements of the list (in our case, arithmetic expressions). The right operator is a parser that maps separators (operators). No matter which separator is matched, the right parser returns a function that, given the correspondence of the operators, returns a union function. The union function accepts parsed expressions to the left and right of the separator, and returns a single, unified expression. Still not confused? We will quickly go through the use of Exp
. The process_binop
function is what the right parser will return. def any_operator_in_list(ops): op_parsers = [keyword(op) for op in ops] parser = reduce(lambda l, r: l | r, op_parsers) return parser aexp_precedence_levels = [ ['*', '/'], ['+', '-'], ]
any_operator_in_list
function takes a list of keyword strings and returns the corresponding parser. We define aexp_precedence_levels
containing a list of operators for each level of precedence (starting with a higher one). def precedence(value_parser, precedence_levels, combine): def op_parser(precedence_level): return any_operator_in_list(precedence_level) ^ combine parser = value_parser * op_parser(precedence_levels[0]) for precedence_level in precedence_levels[1:]: parser = parser * op_parser(precedence_level) return parser
precedence
is the actual content of the operation. Its first argument, value_parser
, is a parser that can read the simple parts of an expression: numbers, variables, and groups. This will be aexp_term
. The precedence_levels
list contains a list of statements, one list for each level. For this we use aexp_precedence_levels
. combine
will accept a function that, passed by the operator, will return a function for constructing one large expression from two small ones. This will be the process_binop
precedence
, we first define op_parser
, which, for a given level of precedence, reads only operators with the same level and returns a function that combines two expressions. op_parser
can be used as the right argument to Exp
. We start by calling Exp
with op_parser
for the highest level of precedence, for these operations must be grouped first. Next we use the resulting parser as an element of the parser (the left argument Exp
) at the next level. After the end of the cycle, the resulting parser is able to parse the arithmetic expression correctly. E<sub>0</sub> = value_parser E<sub>1</sub> = E<sub>0</sub> * op_parser(precedence_levels[0]) E<sub>2</sub> = E<sub>1</sub> * op_parser(precedence_levels[1])
E 0
is the same as value_parser
. It can parse numbers, variables and groups, but not operators. E 1
parsits expressions containing everything that can coincide with E 0
, separated by operators from the first level of precedence. So E 1
can match a * b / c
, but should cause an error as soon as it encounters the +
operator. E 2
can match expressions separated by operators of the next level of precedence. Since we only have 2 levels of precedence, E 2
can match any arithmetic expressions that we support. 4 * a + b / 2 - (6 + c) E<sub>0(4)</sub> * E<sub>0</sub>(a) + E<sub>0</sub>(b) / E<sub>0</sub>(2) - E<sub>0</sub>(6+c) E<sub>1</sub>(4*a) + E<sub>1</sub>(b/2) - E<sub>1</sub>(6+c) E<sub>2</sub>((4*a)+(b/2)-(6+c))
aexp
: def aexp(): return precedence(aexp_term(), aexp_precedence_levels, process_binop)
def process_relop(parsed): ((left, op), right) = parsed return RelopBexp(op, left, right) def bexp_relop(): relops = ['<', '<=', '>', '>=', '=', '!='] return aexp() + any_operator_in_list(relops) + aexp() ^ process_relop
process_relop
function with the Process
combinator. It takes three connected values and creates RelopBexp
from them. In bexp_relop
we parse two arithmetic expressions ( aexp
) separated by a relational operator. And we use our old man - the function any_operator_in_list
, so we don’t need to write a case for each operator. There is also no need to use combinators like Exp
or precedence
, since relationship expressions cannot be connected to each other in IMP as they are done in other languages.not
. The expression is not
a unary operator with high seniority. This makes it easier to parse than and
and or
expressions. def bexp_not(): return keyword('not') + Lazy(bexp_term) ^ (lambda parsed: NotBexp(parsed[1]))
not
keyword with the name of the logical expression (which we will define later). Since bexp_not
will be used to define bexp_term
, we need to use the Lazy
combinator to avoid endless recursion. def bexp_group(): return keyword('(') + Lazy(bexp) + keyword(')') ^ process_group def bexp_term(): return bexp_not() | \ bexp_relop() | \ bexp_group()
bexp_group
and bexp_term
in the same way as for arithmetic equivalents. This is nothing new.and
and or
operators. These operators actually work like arithmetic operators; both are performed from left to right, and and
has the highest level of seniority. bexp_precedence_levels = [ ['and'], ['or'], ] def process_logic(op): if op == 'and': return lambda l, r: AndBexp(l, r) elif op == 'or': return lambda l, r: OrBexp(l, r) else: raise RuntimeError('unknown logic operator: ' + op) def bexp(): return precedence(bexp_term(), bexp_precedence_levels, process_logic)
process_binop
, process_logic
intended for use with the combinator Exp
. It takes an operator and returns a function that combines two subexpressions into one expression using this operator. We substitute this in accordance with the precedence as in aexp
. Writing a common code here pays off, since we don’t have to repeat complex expression expression processing code.aexp
and bexp
, we can begin to parse the IMP statements. Let's start with a modest assignment statement: def assign_stmt(): def process(parsed): ((name, _), exp) = parsed return AssignStatement(name, exp) return id + keyword(':=') + aexp() ^ process
stmt_list
. It will parse a series of statements separated by a semicolon. def stmt_list(): separator = keyword(';') ^ (lambda x: lambda l, r: CompoundStatement(l, r)) return Exp(stmt(), separator)
EXP
combinator here instead of something simpler like stmt() + keyword(';') + stmt()
to avoid left-side recursion.if
: def if_stmt(): def process(parsed): (((((_, condition), _), true_stmt), false_parsed), _) = parsed if false_parsed: (_, false_stmt) = false_parsed else: false_stmt = None return IfStatement(condition, true_stmt, false_stmt) return keyword('if') + bexp() + \ keyword('then') + Lazy(stmt_list) + \ Opt(keyword('else') + Lazy(stmt_list)) + \ keyword('end') ^ process
else
clause is optional. This makes executing a function a little more difficult.while
: def while_stmt(): def process(parsed): ((((_, condition), _), body), _) = parsed return WhileStatement(condition, body) return keyword('while') + bexp() + \ keyword('do') + Lazy(stmt_list) + \ keyword('end') ^ process
stmt
: def stmt(): return assign_stmt() | \ if_stmt() | \ while_stmt()
if
and while
if
, using stmt_list
in the body is better than stmt
. stmt_list
is actually our top level definition. We cannot have stmt
directly dependent on stmt_list
, since this will lead to left-side recursion of the parser, and since we want to have optional statements in the if
and while
bodies, we directly use stmt_list
. def parser(): return Phrase(stmt_list())
parser
will parse the entire program. A program is just a list of statements, but the Phrase
combinator ensures that we use every token in the file before a premature termination due to garbage (invalid) tokens at the end. def imp_parse(tokens): ast = parser()(tokens, 0) return ast
imp_parse
function will be called by the client to parse the program. It accepts the entire list of tokens, calls parser, starts with the first token, and returns the resulting Abstract Syntax Tree (AST). import sys from imp_parser import * if __name__ == '__main__': if len(sys.argv) != 3: sys.stderr.write('usage: %s filename parsername' % sys.argv[0]) sys.exit(1) filename = sys.argv[1] file = open(filename) characters = file.read() file.close() tokens = imp_lex(characters) parser = globals()[sys.argv[2]]() result = parser(tokens, 0) print result
$ echo '1 + 2 * 3' >foo.imp $ python imp_parser_driver.py foo.imp aexp Result(BinopAexp(+, IntAexp(1), BinopAexp(*, IntAexp(2), IntAexp(3))), 5)
Source: https://habr.com/ru/post/208872/
All Articles