📜 ⬆️ ⬇️

Parsing formulas in 50 lines in Python

Inspiration - the task of the interview with Yandex and the article "Parsing formulas in 40 lines . "

My goal was to see what the “pythonic” solution to this task would look like. It would be desirable, that the decision was simple, the code readable and divided. As a result, there was also an example of the use of a chain of generators (generators pipeline).

Yandex pointed to the classical algorithm for solving the problem in his article - the Sorting Station Algorithm .

The algorithm converts expressions in infix notation, which is familiar to us, into reverse Polish .
The calculation of the expression in reverse Polish notation (ARF) for us is attractive because it has a simple algorithm.
')
The whole algorithm for calculating the expression is divided into three parts:

  1. parsing source string to numbers and operators
  2. use of the marshalling station algorithm to get an expression in arrester
  3. expression evaluation in arrester

At the output of stages 1 and 2 we get arrays of numbers and operators. It is tempting to implement a function as a chain of generators. We will reduce memory consumption by using lazy data processing, where the expression is calculated as numbers and operators arrive.

So let's get started


To begin with, we define the operators in the form of a dictionary, for each character we define the priority and the calculation function.
This dictionary is useful to us also to determine whether a character is an operator:

OPERATORS = {'+': (1, lambda x, y: x + y), '-': (1, lambda x, y: x - y), '*': (2, lambda x, y: x * y), '/': (2, lambda x, y: x / y)} 

We define our function eval_ , to the input of which a string with a calculated expression is supplied:

 def eval_(formula_string): 

Inside the function we define a function and two generators, each of which will perform its part of the work.

1. Source line parser

The generator receives a string as input, returns numbers in the float format, operators and brackets in the character format.

  def parse(formula_string): number = '' for s in formula_string: if s in '1234567890.': #   - ,    number += s elif number: #    ,         yield float(number) number = '' if s in OPERATORS or s in "()": #   -   ,     yield s if number: #      ,   yield float(number) 

2. Sorting Station Algorithm

The generator receives an iterated object from numbers and operators in the infix notation as input, returns the numbers and operators in the reverse Polish record.

  def shunting_yard(parsed_formula): stack = [] #      for token in parsed_formula: #   - ,       , #      , #      . #    ,    - if token in OPERATORS: while stack and stack[-1] != "(" and OPERATORS[token][0] <= OPERATORS[stack[-1]][0]: yield stack.pop() stack.append(token) elif token == ")": #   -  ,     ,   , #      . while stack: x = stack.pop() if x == "(": break yield x elif token == "(": #   -  ,      stack.append(token) else: #   - ,      yield token while stack: yield stack.pop() 

3. Calculator

The function, takes as input an iterated object of numbers and operators in the reverse Polish notation, returns the result of the calculation:

  def calc(polish): stack = [] for token in polish: if token in OPERATORS: #    - , y, x = stack.pop(), stack.pop() #  2    stack.append(OPERATORS[token][1](x, y)) #  ,    else: stack.append(token) return stack[0] #   -     

Finally, we compose a chain of generators to calculate the result of the eval_ function:

  return calc(shunting_yard(parse(formula_string))) 

Speed ​​performance


The main question: “How fast does the program work?” Let's compare our function with the built-in eval function.

On the simplest cases, our function is even faster!

 %timeit eval("2+2") 100000 loops, best of 3: 12.8 µs per loop %timeit eval_("2+2") 100000 loops, best of 3: 7.61 µs per loop 

Expressions are more complicated - 22% longer:

 %timeit eval("15/(7-(1+1))*3-(2+(1+1))") 10000 loops, best of 3: 29.7 µs per loop %timeit eval_("15/(7-(1+1))*3-(2+(1+1))") 10000 loops, best of 3: 36.3 µs per loop 

On expressions it is even more difficult - the gap increases, but still the speed of our function is comparable to the built-in one:

 %timeit eval("15/(7-(1+1))*3-(2+(1+1))*15/(7-(1+1))*3-(2+(1+1))*(15/(7-(1+1))*3-(2+(1+1))+15/(7-(1+1))*3-(2+(1+1)))") 10000 loops, best of 3: 86.3 µs per loop %timeit eval_("15/(7-(1+1))*3-(2+(1+1))*15/(7-(1+1))*3-(2+(1+1))*(15/(7-(1+1))*3-(2+(1+1))+15/(7-(1+1))*3-(2+(1+1)))") 10000 loops, best of 3: 147 µs per loop 

Yes, remembering the name of the article, there are only 50 lines, not forgetting the readability and PEP8!

Function code entirely
 OPERATORS = {'+': (1, lambda x, y: x + y), '-': (1, lambda x, y: x - y), '*': (2, lambda x, y: x * y), '/': (2, lambda x, y: x / y)} def eval_(formula): def parse(formula_string): number = '' for s in formula_string: if s in '1234567890.': number += s elif number: yield float(number) number = '' if s in OPERATORS or s in "()": yield s if number: yield float(number) def shunting_yard(parsed_formula): stack = [] for token in parsed_formula: if token in OPERATORS: while stack and stack[-1] != "(" and OPERATORS[token][0] <= OPERATORS[stack[-1]][0]: yield stack.pop() stack.append(token) elif token == ")": while stack: x = stack.pop() if x == "(": break yield x elif token == "(": stack.append(token) else: yield token while stack: yield stack.pop() def calc(polish): stack = [] for token in polish: if token in OPERATORS: y, x = stack.pop(), stack.pop() stack.append(OPERATORS[token][1](x, y)) else: stack.append(token) return stack[0] return calc(shunting_yard(parse(formula))) 

Source: https://habr.com/ru/post/273253/


All Articles