Parsing formulas with functions

Good day!

It took a little project. In the project analysis and calculation of mathematical formulas.
Requirements: nested functions, unlimited nesting depth and external variables.

There are a lot of solutions on the Internet, but it's all wrong, or not. Either without formulas, or without variables, or the simplest possibilities like "1+ (2-3) / 4". But most of the answers were in the direction of lexical analysis and the reverse Polish notation. Here I also applied them, taking examples from different sources.
')
First, we analyze the lexical analysis. Because a simple analysis of a formula by symbols with the search for functions, operators, variables, etc., would be extremely unreadable.

The implementation of algorithms can be taken on the Internet and edited to fit your needs.

For lexical analysis made small changes:

load variable list. In the constructor, variables are replaced with their values;
replacing separators of the whole fractional part of the number with the one used in the system;
added unary minus;
deleted unnecessary lexemes for me.

That's what happened. Below is a link to the source.

List of available tokens

enum LexemType { Plus, Minus, Multiply, Divide, UnarMinus, Equals, NotEquals, Greater, Lower, GreaterEquals, LowerEquals, And, Or, LBracket, RBracket, Semicolon, Identifier, Number }

Definition of tokens

 static class LexemDefinitions { public static StaticLexemDefinition[] Statics = new[] { new StaticLexemDefinition("+", LexemType.Plus), new StaticLexemDefinition("-", LexemType.Minus), new StaticLexemDefinition("*", LexemType.Multiply), new StaticLexemDefinition("/", LexemType.Divide), new StaticLexemDefinition("%", LexemType.UnarMinus), new StaticLexemDefinition("==", LexemType.Equals), new StaticLexemDefinition("!=", LexemType.NotEquals), new StaticLexemDefinition(">=", LexemType.GreaterEquals), new StaticLexemDefinition("<=", LexemType.LowerEquals), new StaticLexemDefinition(">", LexemType.Greater), new StaticLexemDefinition("<", LexemType.Lower), new StaticLexemDefinition("&&", LexemType.And), new StaticLexemDefinition("||", LexemType.Or), new StaticLexemDefinition("(", LexemType.LBracket), new StaticLexemDefinition(")", LexemType.RBracket), new StaticLexemDefinition(";", LexemType.Semicolon), }; public static DynamicLexemDefinition[] Dynamics = new[] { new DynamicLexemDefinition("[a-zA-Z_][a-zA-Z0-9_]*", LexemType.Identifier), new DynamicLexemDefinition(@"([0-9]*\.?[0-9]+)", LexemType.Number), }; }

Search and replace unary minus and plus

 private string ReplaceUnarOperator(String src) { return Regex.Replace(Regex.Replace(Regex.Replace(Regex.Replace(src, @"(\(\+)", "("), @"(\A[\+]{1})", ""), @"(\(-)", "(%"), @"(\A[-]{1})", "%"); }

Replacing the unary plus and minus could have been more beautifully implemented, but the writing of regular expressions is not my talent. I am pleased to use your option.

Then he rewrote the implementation of the reverse Polish record taken from the wiki. It was necessary to transfer to the input no longer a string but a list of tokens. This fact slightly simplified the algorithm.

Conversion to HMO

 private static Lexem[] ConvertToPostfixNotation(List<Lexem> _lexems) { List<Lexem> outputSeparated = new List<Lexem>(); Stack<Lexem> stack = new Stack<Lexem>(); foreach (Lexem c in _lexems) { if (operators.Contains(c.Type)) { if ((stack.Count > 0) && (c.Type != LexemType.LBracket)) { if (c.Type == LexemType.RBracket) { Lexem s = stack.Pop(); while (s.Type != LexemType.LBracket) { outputSeparated.Add(s); s = stack.Pop(); } } else { if (GetPriority(c.Type) > GetPriority(stack.Peek().Type)) { stack.Push(c); } else { while (stack.Count > 0 && GetPriority(c.Type) <= GetPriority(stack.Peek().Type)) outputSeparated.Add(stack.Pop()); stack.Push(c); } } } else stack.Push(c); } else outputSeparated.Add(c); } if (stack.Count > 0) foreach (Lexem c in stack) outputSeparated.Add(c); return outputSeparated.ToArray(); }

Calculation of HMO

 private static String getResult(List<Lexem> _lexems) { Stack<Lexem> stack = new Stack<Lexem>(); Lexem[] postfix = ConvertToPostfixNotation(_lexems); Queue<Lexem> queue = new Queue<Lexem>(postfix); Lexem str = queue.Dequeue(); while (queue.Count >= 0) { if (operators.Contains(str.Type)) { Lexem result = new Lexem(); result.Type = LexemType.Number; try { switch (str.Type) { case LexemType.UnarMinus: { Double a = Convert.ToDouble(stack.Pop().Value); result.Value = (-a).ToString(); break; } case LexemType.Plus: { Double b = Convert.ToDouble(stack.Pop().Value); Double a = Convert.ToDouble(stack.Pop().Value); result.Value = (a + b).ToString(); break; } case LexemType.Minus: { Double b = Convert.ToDouble(stack.Pop().Value); Double a = Convert.ToDouble(stack.Pop().Value); result.Value = (a - b).ToString(); break; } case LexemType.Multiply: { Double b = Convert.ToDouble(stack.Pop().Value); Double a = Convert.ToDouble(stack.Pop().Value); result.Value = (a * b).ToString(); break; } case LexemType.Divide: { Double b = Convert.ToDouble(stack.Pop().Value); Double a = Convert.ToDouble(stack.Pop().Value); result.Value = (a / b).ToString(); break; } case LexemType.Equals: { Double b = Convert.ToDouble(stack.Pop().Value); Double a = Convert.ToDouble(stack.Pop().Value); Boolean r = (a == b); result.Value = (r ? 1 : 0).ToString(); break; } case LexemType.NotEquals: { Double b = Convert.ToDouble(stack.Pop().Value); Double a = Convert.ToDouble(stack.Pop().Value); Boolean r = (a != b); result.Value = (r ? 1 : 0).ToString(); break; } case LexemType.Greater: { Double b = Convert.ToDouble(stack.Pop().Value); Double a = Convert.ToDouble(stack.Pop().Value); Boolean r = (a > b); result.Value = (r ? 1 : 0).ToString(); break; } case LexemType.GreaterEquals: { Double b = Convert.ToDouble(stack.Pop().Value); Double a = Convert.ToDouble(stack.Pop().Value); Boolean r = (a >= b); result.Value = (r ? 1 : 0).ToString(); break; } case LexemType.Lower: { Double b = Convert.ToDouble(stack.Pop().Value); Double a = Convert.ToDouble(stack.Pop().Value); Boolean r = (a < b); result.Value = (r ? 1 : 0).ToString(); break; } case LexemType.LowerEquals: { Double b = Convert.ToDouble(stack.Pop().Value); Double a = Convert.ToDouble(stack.Pop().Value); Boolean r = (a <= b); result.Value = (r ? 1 : 0).ToString(); break; } case LexemType.Or: { Boolean b = Convert.ToBoolean(Convert.ToDouble(stack.Pop().Value) > 0 ? 1 : 0); Boolean a = Convert.ToBoolean(Convert.ToDouble(stack.Pop().Value) > 0 ? 1 : 0); Boolean r = (a || b); result.Value = (r ? 1 : 0).ToString(); break; } case LexemType.And: { Boolean b = Convert.ToBoolean(Convert.ToDouble(stack.Pop().Value) > 0 ? 1 : 0); Boolean a = Convert.ToBoolean(Convert.ToDouble(stack.Pop().Value) > 0 ? 1 : 0); Boolean r = (a && b); result.Value = (r ? 1 : 0).ToString(); break; } } } catch (Exception ex) { new InvalidOperationException(ex.Message); } stack.Push(result); if (queue.Count > 0) str = queue.Dequeue(); else break; } else { stack.Push(str); if (queue.Count > 0) { str = queue.Dequeue(); } else break; } } String rValue = stack.Pop().Value; return rValue; }

The peculiarity of logical operations is that if the number is greater than zero, then it becomes True , otherwise False .
Actually, at this stage, the formulas are quite considered. Support functions only not.

At first, I wanted to implement functions through a separate token. But something went wrong ... I took the path of least resistance. In a lexical analysis of a formula, all variables that have been replaced by numeric values become numbers, and everything else that has not been replaced becomes functions. Not logical, but it works.

Further, according to the algorithm, all our lexemes with the new functions are found and placed in the list of functions.

Function class implementation

 internal class Functions { public List<Functions> FunctionList = new List<Functions>(); public List<List<Lexem>> operands = new List<List<Lexem>>(); public Functions(Queue<Lexem> queue) { Queue<Lexem> lexems = new Queue<Lexem>(); Lexem currLextm = queue.Dequeue(); int brackets = 1; while (queue.Count > 0 && brackets > 0) { currLextm = queue.Dequeue(); if (currLextm.Type == LexemType.Identifier) { currLextm.Value = currLextm.Value + "{" + FunctionList.Count.ToString() + "}"; lexems.Enqueue(currLextm); FunctionList.Add(new Functions(queue)); } else { if (currLextm.Type == LexemType.LBracket) { brackets++; } if (currLextm.Type == LexemType.RBracket) { brackets--; } lexems.Enqueue(currLextm); } } List<Lexem> operand = new List<Lexem>(); while (lexems.Count > 0) { currLextm = lexems.Dequeue(); if (currLextm.Type == LexemType.LBracket) { brackets++; } if (currLextm.Type == LexemType.RBracket) { brackets--; } if ((currLextm.Type == LexemType.Semicolon)&&(brackets==0)) { operands.Add(operand); operand = new List<Lexem>(); } else { operand.Add(currLextm); } } operand.Remove(operand.Last()); operands.Add(operand); } }

The function may contain other functions. Everything is entered into the operands of the function, and if a function is there, then a new function is recursively added to its list. Of course, the depth of investment is not limited.

Actually, the recursive algorithm for calculating the function values and replacing it with the result:

Constructor and recursive replacement of a function by calculation results

 public static Dictionary<Int32, NTDClass> NTDs = new Dictionary<Int32, NTDClass>(); public static Dictionary<String, Double> variables = new Dictionary<string,double>(); private List<Functions> FunctionList = new List<Functions>(); private List<List<Lexem>> operands = new List<List<Lexem>>(); private static List<LexemType> operators = new List<LexemType>() { LexemType.Multiply, LexemType.Divide, LexemType.Plus, LexemType.Minus, LexemType.UnarMinus, LexemType.And, LexemType.Or, LexemType.Equals, LexemType.NotEquals, LexemType.Greater, LexemType.Lower, LexemType.GreaterEquals, LexemType.LowerEquals, LexemType.LBracket, LexemType.RBracket }; public PostfixNotationExpression(String formula) { Queue<Lexem> queue = new Queue<Lexem>(new Lexer(formula, variables).Lexems); Lexem currLextm = null; Queue<Lexem> lexems = new Queue<Lexem>(); while (queue.Count > 0) { currLextm = queue.Dequeue(); if (currLextm.Type == LexemType.Identifier) { currLextm.Value = currLextm.Value + "{" + FunctionList.Count.ToString() + "}"; lexems.Enqueue(currLextm); FunctionList.Add(new Functions(queue)); } else { lexems.Enqueue(currLextm); } } List<Lexem> operand = new List<Lexem>(); Int32 brackets = 0; while (lexems.Count > 0) { currLextm = lexems.Dequeue(); if (currLextm.Type == LexemType.LBracket) { brackets++; } if (currLextm.Type == LexemType.RBracket) { brackets--; } if ((currLextm.Type == LexemType.Semicolon) && (brackets == 0)) { operands.Add(operand); operand = new List<Lexem>(); } else { operand.Add(currLextm); } } operands.Add(operand); } public String Calc() { String sep = NumberFormatInfo.CurrentInfo.NumberDecimalSeparator; String res = Calc(FunctionList, operands).First(); res = res == null ? "0.0" : res; res = res.Replace(".", sep).Replace(",", sep); return res; } private static List<String> Calc(List<Functions> fList, List<List<Lexem>> lOps) { List<String> res = new List<String>(); if (fList.Count == 0) { foreach (List<Lexem> op in lOps) { String resOp = getResult(op); res.Add(resOp); } } else { foreach (List<Lexem> op in lOps) { for (int i = 0; i < op.Count; i++) { if (op[i].Type == LexemType.Identifier) { String fName = op[i].Value.Remove(op[i].Value.IndexOf("{")); String fNumStr = op[i].Value.Remove(0, op[i].Value.IndexOf("{") + 1); fNumStr = fNumStr.Remove(fNumStr.IndexOf("}")); Int32 fNum = Int32.Parse(fNumStr); Functions func = fList[fNum]; List<String> funcRes = Calc(func.FunctionList, func.operands); List<Double> rList = new List<double>(); for (int k = 0; k < funcRes.Count; k++) { rList.Add(Double.Parse(funcRes[k])); } switch (fName) { case "NTD": { String val = "0.0"; Int32 key = (Int32)rList[0]; if (NTDs.ContainsKey(key)) { val = NTDs[key].getValue(rList[1]).ToString(); } op[i] = new Lexem() { Type = LexemType.Number, Value = val }; break; } case "If": { op[i] = new Lexem() { Type = LexemType.Number, Value = (rList[0] == 1 ? rList[1] : rList[2]).ToString() }; break; } case "Sqr": { op[i] = new Lexem() { Type = LexemType.Number, Value = (rList[0] * rList[0]).ToString() }; break; } case "Sqrt": { op[i] = new Lexem() { Type = LexemType.Number, Value = (Math.Sqrt(rList[0])).ToString() }; break; } case "Min": { Double Min = Double.MaxValue; for (int k = 0; k < rList.Count; k++) { if (rList[k] < Min) { Min = rList[k]; } } op[i] = new Lexem() { Type = LexemType.Number, Value = Min.ToString() }; break; } case "Max": { Double Max = Double.MinValue; for (int k = 0; k < rList.Count; k++) { if (rList[k] > Max) { Max = rList[k]; } } op[i] = new Lexem() { Type = LexemType.Number, Value = Max.ToString() }; break; } } } } String resOp = getResult(op); res.Add(resOp); } } return res; }

All implemented functions are implemented here. Add something of their own is not something difficult.

For example, I will give a solution to the quadratic equation:

Calculation example

 Dictionary<String, Double> variables = new Dictionary<string, double>(); variables.Add("a", 1); variables.Add("b", 2); variables.Add("c", -27); foreach (var ivar in variables) { textBox1.Text += ivar.Key + " = " + ivar.Value.ToString() + "\r\n"; } textBox1.Text += "--------------\r\n"; // a*x2 + bx + c = 0 String q1 = "(-b+Sqrt(Sqr(b)-4*a*c))/(2*a)"; String q2 = "(-b-Sqrt(Sqr(b)-4*a*c))/(2*a)"; String Formula = "If((b*b-4-ac)>0;" + q1 + "; If((b*b-4-ac)==0; " + q2 + "; 0))"; textBox1.Text += Formula + "\r\n"; textBox1.Text += new PostfixNotationExpression(Formula, variables).Calc() + "\r\n"; Formula = "If((b*b-4-ac)>0;" + q2 + "; If((b*b-4-ac)==0; " + q1 + "; 0))"; textBox1.Text += Formula + "\r\n"; textBox1.Text += new PostfixNotationExpression(Formula, variables).Calc() + "\r\n";

Output

 a = 1 b = 2 c = -27 -------------- If((b*b-4-ac)>0;(-b+Sqrt(Sqr(b)-4*a*c))/(2*a); If((b*b-4-ac)==0; (-b-Sqrt(Sqr(b)-4*a*c))/(2*a); 0)) 4.2915026221292 If((b*b-4-ac)>0;(-b-Sqrt(Sqr(b)-4*a*c))/(2*a); If((b*b-4-ac)==0; (-b+Sqrt(Sqr(b)-4*a*c))/(2*a); 0)) -6.2915026221292

Of course, solving quadratic equations is not a good example. There is no possibility to get several answers in the results, and then if there are no roots, this formula will return zero. But to demonstrate the possibilities will go.

[ Link to project ]

The project has enough space for optimization. For example, there is a duplicate code and the results of all branches of the condition are calculated. And you never know what can be optimized. The main thing is to stop in time.

PS: While writing, added a function to get values from the graph.

Graph function

 public class NTDClass { public String Name; public Dictionary<Double, Double> Graph; public NTDClass(String name, DataTable table) { Graph = new Dictionary<double, double>(); foreach(DataRow row in table.AsEnumerable()) { Double x = Double.Parse(row["Id"].ToString()); Double y = Double.Parse(row["VALUE"].ToString()); if (!Graph.ContainsKey(x)) { Graph.Add(x, y); } else { x += 0.00001; Graph.Add(x, y); } } } public Double getValue(Double b) { Double res = Double.NaN; var xScale = Graph.Keys.AsEnumerable(); Double minX = xScale.OrderBy(x => x).Where(x => x <= b).DefaultIfEmpty(xScale.OrderBy(x => x).First()).LastOrDefault(); Double maxX = xScale.OrderBy(x => x).Where(x => x >= b).DefaultIfEmpty(xScale.OrderBy(x => x).Last()).FirstOrDefault(); Double minY = Graph[minX]; Double maxY = Graph[maxX]; res = (((b - minX) * (maxY - minY)) / (maxX - minX) + minY); return res; } }

Function call

 switch (fName) { case "NTD": { String val = "0.0"; Int32 key = (Int32)rList[0]; if (NTDs.ContainsKey(key)) { val = NTDs[key].getValue(rList[1]).ToString(); } op[i] = new Lexem() { Type = LexemType.Number, Value = val }; break; } ..... ..... .....

Source: https://habr.com/ru/post/246707/

All Articles

Parsing formulas with functions

More articles: