📜 ⬆️ ⬇️

Prett Parsing - Vaughan Pratt method for parsing expressions

In the topic of compilations and expression calculations.

Back in 1973, Vaughan Pratt proposed a simple and effective method for parsing expressions that does not use automata or grammar as such.

The idea is that each character (token) is endowed with properties:
lbp = character binding priority to the left,
nud = a function that determines the result of applying the operator at the beginning of an expression,
led = a function that determines the result of applying in the middle of an expression.
')
The main analysis is carried out according to the scheme:
 parsing (priority of continuation):
     push character out of input
     result = call nud this character
     while the lbp priority of the next in the continuation priority symbol stream:
         push character out of input
         result = apply the led of this symbol to the current result

Constants and variables take precedence for binding 0, and the nud function returns their value (or reference). Therefore, applying parsing to constants will immediately return their value.
For binary operators, the led function recursively causes a continuation of parsing (on the right) down to a lower priority, and does something with the result already accumulated (on the left), and obtained recursively.
The result of using the operator is aggregated for an external call.
Multi-ary operators get arguments by an additional call to the parse function.
Prefix operators are made using the definition of the nud function for them.
For right-sided binding, the priority of continuing recursive parsing changes.

The site effbot.org provides detailed implementation on python.
There are also links to javascript and schema.

 operators = { #    led '+': [1, lambda left: "(%s + %s)" % (left, parse(1))], '*': [2, lambda left: "(%s * %s)" % (left, parse(2))], '^': [3, lambda left: "(%s ^ %s)" % (left, parse(3))], '$': [0] } def lbp(t): try: return operators[t][0] except KeyError: return 0 def nud(t): return t def led(t,left): return operators[t][1](left) #      ,      . def parse(rbp=0): global tokens tok = tokens.pop(0) result = nud(tok) while lbp(tokens[0]) > rbp: tok = tokens.pop(0) result = led(tok,result) return result def evaluate(expr): global tokens tokens = expr.split(" ") + ['$'] parse() 

Examples of work:
>>> evaluate("a + b * c ^ d * e + f")
a|+,b,*,c,^,d,*,e,+,f,$
= a
+|b,*,c,^,d,*,e,+,f,$
b|*,c,^,d,*,e,+,f,$
= b
*|c,^,d,*,e,+,f,$
c|^,d,*,e,+,f,$
= c
^|d,*,e,+,f,$
d|*,e,+,f,$
= d
= (c ^ d)
= (b * (c ^ d))
*|e,+,f,$
e|+,f,$
= e
= ((b * (c ^ d)) * e)
= (a + ((b * (c ^ d)) * e))
+|f,$
f|$
= f
= ((a + ((b * (c ^ d)) * e)) + f)
>>>
>>> evaluate("a * b + c ^ d + e * f")
a|*,b,+,c,^,d,+,e,*,f,$
= a
*|b,+,c,^,d,+,e,*,f,$
b|+,c,^,d,+,e,*,f,$
= b
= (a * b)
+|c,^,d,+,e,*,f,$
c|^,d,+,e,*,f,$
= c
^|d,+,e,*,f,$
d|+,e,*,f,$
= d
= (c ^ d)
= ((a * b) + (c ^ d))
+|e,*,f,$
e|*,f,$
= e
*|f,$
f|$
= f
= (e * f)
= (((a * b) + (c ^ d)) + (e * f))

Source: https://habr.com/ru/post/50349/


All Articles