📜 ⬆️ ⬇️

Inheritance grammars in Sprache (or another customizable expression calculator for .NET)

The article demonstrates the technique of creating parsers using grammar inheritance. Inheritance allows you to describe new grammars based on existing ones by adding new rules or redefining inherited ones, which greatly simplifies the implementation of new parsers. Changes in the basic grammar are automatically available in all generated grammars. The main field of application of such technology is the support of several dialects or versions of languages.

Support for grammar inheritance exists in some parser generators (for example, in ANTLR, Nitra), and is automatically available in tools that use object-oriented languages ​​as DSL languages ​​for describing grammars (for example, Sprache and Irony).

As an example of an application for an article, a custom expression expression calculator library with support for user-defined functions and variables is taken. The calculator compiles strings into LINQ expressions that are easily converted to strongly typed delegates. Unlike interpreting calculators like NCalc, compiled expressions do not differ in any way from methods written in C #. An example of using the finished calculator:

//    var expr = calc.ParseExpression("Sin(y/x)", x => 2, y => System.Math.PI); var func = expr.Compile(); Console.WriteLine("Result = {0}", func()); //   calc.RegisterFunction("Mul", (a, b, c) => a * b * c); expr = calc.ParseExpression("2 ^ Mul(PI, a, b)", a => 2, b => 10); Console.WriteLine("Result = {0}", func.Compile()()); 

')

Sprache Short Review


Sprache is a minimalist functional library for building combinatorial parsers. As the authors of the library say modestly, it occupies an intermediate position between regular expressions and full-fledged parser building tools like ANTLR.

I would say that Sprache is an excellent tool, excellent for a fairly wide range of tasks and has a special appeal as it encourages step-by-step development of grammars and TDD. Of course, combinatorial parsers have certain drawbacks (for example, difficulties in diagnosing and recovering from errors), but such details are irrelevant for the topic of this article.

A parser in Sprache is a function that transforms an input string into some other object. Unlike most compiler building tools, Sprache does not use code generation. Parsers are defined directly in the program text, and you can immediately use them to parse the text. This allows you to write unit tests for them in parallel with the description of the parsers, which is very convenient. Here is an example of a simple parser that takes a line of repeated letters A:

 var parseA = Parse.Char('A').AtLeastOnce(); 

Simple parsers are combined into more complex parsers. For the combination of parsers, Sprache defines the mass of extension-methods (for example, Or, And, Many, etc.), however, the definition of parsers as LINQ queries is particularly impressive:

 Parser<string> identifier = from leading in Parse.WhiteSpace.Many() from first in Parse.Letter.Once() from rest in Parse.LetterOrDigit.Many() from trailing in Parse.WhiteSpace.Many() select new string(first.Concat(rest).ToArray()); var id = identifier.Parse(" abc123 "); Assert.AreEqual("abc123", id); 

The combination of all the rules, or grammar of a language, in Sprache usually looks like a static class with parser fields. For more information about Sprache, you can read in a review article for which there is a translation at habré:

We build DSL on C # using parser combinators

Calculator device


Our calculator can work in three modes: simple, scientific and customizable.

A simple calculator supports regular arithmetic operations on floating-point numbers, unary minus and parentheses. Scientific mode adds support for binary and hexadecimal numbers, exponential notation and calls to any functions from the System.Math class, and in custom mode, you can use parameters and register your own functions (with the possibility of overloading).

Each following mode supports all features of previous modes and adds new ones. In the same way, a hierarchy of classes of grammars will be arranged that describe the input expressions languages ​​of the calculator. The calculator parser is a function that converts the input string to a LINQ expression, which can be compiled into a delegate and called as a normal function:

 var expr = "4*(1/1-1/3+1/5-1/7+1/9-1/11+1/13-1/15+1/17-1/19+10/401)"; var func = calc.ParseExpression(expr).Compile(); var result = func(); 

Simple calculator


An example from the delivery of Sprache, the ultra-compact LinqyCalculator, is taken as the basis for a simple calculator. The grammar is broken down into rules so as to simplify the creation of LINQ expressions at compile time:

 Expr ::= Term ("+"|"-" Term)* Term ::= InnerTerm ("*"|"/"|"%" InnerTerm)* InnerTerm ::= Operand ("^" Operand) Operand ::= NegativeFactor | Factor NegativeFactor ::= "-" Factor Factor ::= "(" Expr ")" | Constant Constant ::= Decimal 

Usually Sprache parsers are declared as static lambda functions. This does not suit us, because they cannot be redefined in descendant classes, so the rules will be declared as virtual properties.

 // : public static readonly Parser<Expression> ExpressionInParentheses = from lparen in Parse.Char('(') from expr in Expr from rparen in Parse.Char(')') select expr; // : protected virtual Parser<Expression> ExpressionInParentheses { get { return from lparen in Parse.Char('(') from expr in Expr from rparen in Parse.Char(')') select expr; } } 

After this alteration, the grammar slightly increases in size, but now any rules can be redefined in descendant classes. In order to write unit tests for each rule, you will have to declare parser properties as public or protected internal.

I will not give the full text of the grammar of a simple calculator, it can be viewed on the githaba . In its content, he practically repeats the standard example of LinqyCalculator from the delivery of Sprache.

Scientific Calculator


Since the scientific calculator can at least be the same as the usual, its class is inherited from the grammar of a simple calculator. To support binary and hexadecimal numbers, add new rules:

 protected virtual Parser<string> Binary { get { return Parse.IgnoreCase("0b").Then(x => Parse.Chars("01").AtLeastOnce().Text()).Token(); } } protected virtual Parser<string> Hexadecimal { get { return Parse.IgnoreCase("0x").Then(x => Parse.Chars("0123456789ABCDEFabcdef").AtLeastOnce().Text()).Token(); } } 

Defining new rules is not enough, because the basic grammar does not know at what point they can be applied. Since binary and hexadecimal numbers are a kind of constant, we add them to the Constant parser.

Note: the Binary and Hexadecimal parsers return a string, and the Constant parser returns a LINQ expression. You will need helper methods that convert strings to Expression.Constant (double). Constant parser ready with support for decimal, binary and hexadecimal numbers will look like this:

 protected override Parser<Expression> Constant { get { return Hexadecimal.Select(x => Expression.Constant((double)ConvertHexadecimal(x))) .Or(Binary.Select(b => Expression.Constant((double)ConvertBinary(b)))) .Or(base.Constant); } } 

To support functions in expressions, two more rules are needed:

 protected virtual Parser<string> Identifier { get { return Parse.Letter.AtLeastOnce().Text().Token(); } } protected virtual Parser<Expression> FunctionCall { get { return from name in Identifier from lparen in Parse.Char('(') from expr in Expr.DelimitedBy(Parse.Char(',').Token()) from rparen in Parse.Char(')') select CallFunction(name, expr.ToArray()); } } 

The CallFunction helper method simply generates a LINQ expression to call a static one from the System.Math class with the specified name:

 protected virtual Expression CallFunction(string name, Expression[] parameters) { var methodInfo = typeof(Math).GetMethod(name, parameters.Select(e => e.Type).ToArray()); if (methodInfo == null) { throw new ParseException(string.Format("Function '{0}({1})' does not exist.", name, string.Join(",", parameters.Select(e => e.Type.Name)))); } return Expression.Call(methodInfo, parameters); } 

Since the basic grammar knows nothing about the new rules, you need to include them in some rule of the basic grammar. Here, finding the right rule is not as easy as in the case of constants. The appropriate rule will be determined by the priority of the function call operation.

It is easy to see that this priority should be the highest - the same as the operation of brackets. For example, when calculating the expression Sin (2) ^ Cos (3), you first need to calculate the values ​​of the functions, and then perform the exponentiation operation.

In basic grammar, parenthesis appears in the Factor rule, so we need to override it:

 protected override Parser<Expression> Factor { get { return base.Factor.XOr(FunctionCall); } } 

Add custom functions


For the most sophisticated version of the calculator, let's create a new class based on a scientific calculator. Adding custom functions obviously will not require the introduction of new grammar rules, because the syntax of expressions remains the same. Only the method that calls functions is changed. Pseudocode:

 override Expression CallFunction(string name, Expression[] parameters) {       name, {         parameters; } //       System.Math return base.CallFunction(name, parameters); } 

Any custom function for a calculator can be represented as a delegate Func <double [], double>. Named functions are conveniently stored in the dictionary: Dictionary <string, Func <double [], double >>. To allow overloading of functions, it’s enough to attach the number of parameters to the name:

 "Min:2" —  Min    "Min:5" —  Min    

As a result, the above pseudocode will turn into something like this:

 protected override Expression CallFunction(string name, Expression[] parameters) { //     var mangledName = name + ":" + parameters.Length; if (CustomFuctions.ContainsKey(mangledName)) { return Expression.Call(...); //      } //   System.Math return base.CallFunction(name, parameters); } 

A certain complexity is the expression Expression.Call, which must be generated to call a user-defined function. The point is that Expression.Call can call only existing methods, which obviously do not include user functions. To get out in this situation, it is enough to define the following method in the calculator class:

 protected virtual double CallCustomFunction(string mangledName, double[] parameters) { return CustomFuctions[mangledName](parameters); } 

This method will call the expression Expression.Call, which we will form when compiling. We will only need to convert the list of parameters into a single array parameter:

 protected override Expression CallFunction(string name, Expression[] parameters) { //     var mangledName = name + ":" + parameters.Length; if (CustomFuctions.ContainsKey(mangledName)) { //   var newParameters = new List<Expression>(); newParameters.Add(Expression.Constant(mangledName)); newParameters.Add(Expression.NewArrayInit(typeof(double), parameters)); //  this.CallCustomFunction(mangledName, double[]); var callCustomFunction = new Func<string, double[], double>(CallCustomFunction).Method; return Expression.Call(Expression.Constant(this), callCustomFunction, newParameters.ToArray()); } //   System.Math return base.CallFunction(name, parameters); } 

Adding Parameters


To support the parameters, a grammar refinement will be required: a new rule and an update of the old rules. A parameter is simply an identifier that can occur in the same place as a constant or function call:

 protected virtual Parser<Expression> Parameter { get { return Identifier; } } protected override Parser<Expression> Factor { get { return Parameter.Or(base.Factor); } } 

Here we meet a conflict for the first time. The fact is that in the Factor rule there are now two alternatives that both begin with an identifier: a parameter and a function. If the parser meets an identifier, it cannot determine the parameter in front of it or the function until it looks ahead. If the identifier is followed by the bracket "(", then this is a function, otherwise - a parameter.

As far as I know, Sprache does not help in the search for such conflicts. Find them only by gazing. You add rules, write unit tests for them, and at one point you discover that some tests do not pass, reporting parsing errors. The case of parameters and functions is quite trivial, but more often the search for and the elimination of conflicts is a serious task that takes a lot of time and effort.

To resolve the conflict between parameters and functions, we can define a parameter as "Identifier, not followed by a bracket". Such a rule will not lead to conflict, since it eliminates ambiguity. It looks like this:

 protected virtual Parser<Expression> Parameter { get { // identifier not followed by a '(' is a parameter reference return from id in Identifier from n in Parse.Not(Parse.Char('(')) select GetParameterExpression(id); } } 

The Parse.Not parser is similar to negative lookahead in regular expressions: it does not change the pointer of the current character and works successfully if the parser passed to it, in this case Parse.Char ('('), fails.

As in the case of calling functions, we need to somehow generate an expression that returns the value of the parameter. It is time to decide how parameters will be passed to the calculator. At first glance, we can deal with the parameters in the same way as with user-defined functions: register them in a special Dictionary <string, double>, which is stored in the calculator:

 calc.Parameters["MyPI"] = 355/113d; calc.Parameters["MyE"] = 2.718d; 

Compilation of a call to such a parameter will be arranged in the same way as calling a user-defined function. The calculator will generate a call to the GetParameterExpression method, passing it the name of the parameter. If the parameter is not defined, you can try to find it among the constants of the System.Math class:

 protected virtual Expression GetParameterExpression(string id) { //    ,    if (Parameters.ContainsKey(id)) { //  this.GetParameterValue(id) var getParameterValue = new Func<string, double>(GetParameterValue).Method; return Expression.Call(Expression.Constant(this), getParameterValue, Expression.Constant(id)) as Expression; } //         System.Math var systemMathConstants = typeof(System.Math).GetFields(BindingFlags.Public | BindingFlags.Static); var constant = systemMathConstants.FirstOrDefault(c => c.Name == id); if (constant == null) { throw new ParseException(string.Format("Parameter or constant '{0}' does not exist.", id)); } //    System.Math return Expression.Constant(constant.GetValue(null)); } 

Trying to use such a calculator, we immediately find the inconvenience of such a storage of parameters. Expressions calculator can compile a lot, and he has one parameter storage. All expressions will use the same parameter pool bound to the calculator instance.

 calc.Parameters["P"] = 3.14d; calc.Parameters["R"] = 10; var func1 = calc.ParseExpression("2*P*R").Compile(); var result1 = func1(); var func2 = calc.ParseExpression("R+P").Compile(); calc.Parameters["P"] = 123; var result2 = func2(); 

The common pool of parameters leads to the fact that expressions cannot be used in a multi-threaded program. One stream will set one parameter value, the other stream will set another, and the result of the calculations will become undefined. Obviously, for the transfer of parameters you need to come up with a mechanism more reliable.

Another way to pass parameters


It will be logical to bind the list of parameters not to the calculator, but to the expression. To do this, you need to change the type of the calculator result expression: instead of Expression <Func>, you will need to generate Expression <Func <Dictionary <string, double>, double >>. Here is what it might look like:

 var function = calc.ParseFunction("y/x").Compile(); var parameters = new Dictionary<string, double>{ { "x", 2 }, { "y", 4 } }; var result = function(parameters); 

For such a scheme would require not such a big alteration. Instead of calling this.GetParameterValue, you only need to generate a call to the parameters dictionary: parameters [name]. The indexer in C # is compiled into a call to the get_Item method, so access to the parameter will look like this:

 var getItemMethod = typeof(Dictionary<string, double>).GetMethod("get_Item"); return Expression.Call(ParameterExpression, getItemMethod, Expression.Constant(name)); 

In order not to complicate the expression, we will not check whether there is a parameter with the specified name in the dictionary. If there is no parameter, the Dictionary class itself will complain about it. Here is the complete method for compiling parameters:

 protected virtual Expression GetParameterExpression(string name) { // try to find a constant in System.Math var systemMathConstants = typeof(System.Math).GetFields(BindingFlags.Public | BindingFlags.Static); var constant = systemMathConstants.FirstOrDefault(c => c.Name == name); if (constant != null) { // return System.Math constant value return Expression.Constant(constant.GetValue(null)); } // return parameter value: Parameters[name] var getItemMethod = typeof(ParameterList).GetMethod("get_Item"); return Expression.Call(ParameterExpression, getItemMethod, Expression.Constant(name)); } 

Syntactic sugar for parameters


At the beginning of the article there is an example of using a calculator to calculate an expression with parameters:

 var expr = calc.ParseExpression("Sin(y/x)", x => 2, y => System.Math.PI); 

Such syntax is read much better than creating and filling in Dictionary <string, double>. It is convenient to use it when the list of valid parameters of the expression is fixed. Although this has nothing to do with expression parsing itself, I’ll explain how this method works:

 public Expression<Func<double>> ParseExpression(string text, params Expression<Func<double, double>>[] parameters) { var paramList = new Dictionary<string, double>(); foreach (var p in parameters) { var paramName = p.Parameters.Single().Name; var paramValue = p.Compile()(0); paramList[paramName] = paramValue; } return ParseExpression(text, paramList); } 

Two words about unit tests


When developing parsers on Sprache, it is very convenient to write unit tests in parallel with the development of grammars. Added a new parser - immediately wrote a set of tests to it (TDD supporters will do it in the reverse order). Since the Sprache library does not analyze the grammar in any way, it cannot report problems like conflicts or left recursion (although it can track simple left recursion at the execution stage), and the unit test suite becomes the only support.

Inheriting grammars adds additional responsibility to unit tests: for each class, you need to make sure that all the inherited rules continue to work and interact normally with the rules redefined in descendant classes. For this, you can use the ForEachCalculator helper method, which runs the tests on all versions of the calculator:

 private void ForEachCalculator(Action<SimpleCalculator> fact) { foreach (var calc in new[] { new SimpleCalculator(), new ScientificCalculator(), new XtensibleCalculator() }) { fact(calc); } } [Fact] public void ExprCombinesTermsWithAddSubOperators() { ForEachCalculator(calc => { Assert.Equal(4d, calc.Expr.Parse("2+2").Execute()); Assert.Equal(2d, calc.Expr.Parse("2*3-4*1").Execute()); Assert.Throws<ParseException>(() => calc.Expr.Parse("+")); }); } 

However, a more elegant solution, which is used in the unit tests of the calculator, is the inheritance of tests. In the base class of tests, the virtual method CreateCalculator is defined, which creates a calculator for testing. The base class of tests creates and tests the SimpleCalculator, its descendant is created by the ScientificCalculator tests, inheriting all the tests of the base class, but executing them for the descendant calculator, and so on.

Summary


So, we received three versions of the calculator, differing in a set of possibilities and syntax of input expressions. Due to inheritance, most of the basic grammar rules are reused in descendant grammars, which makes it possible to focus the development on differences from the basic version. The full source code for the calculator project is available on github .

Grammar inheritance is a powerful technique, which in many cases allows keeping the grammar complexity under control and speeding up parser development. In the case of the Sprache library, grammar inheritance seems to be a good tool, additionally encouraging decomposition and step-by-step development in parallel with unit testing.

Source: https://habr.com/ru/post/228037/


All Articles