Extending C# with Roslyn. Safe calls

Have you ever felt that language X, the one you currently program in, is missing something? Some small but pleasant goodie that might not make your life perfectly happy but would certainly add many joyful moments. And so you glance with black envy at language Y, which has that very thing, sigh sadly, and at night secretly shed tears of helplessness into your favorite pillow. Has that happened to you?

C# probably gives its adherents fewer reasons for such envy than many other languages, since it is evolving rapidly and keeps gaining features that make life easier. Still, there is no limit to perfection, and each of us has our own idea of it.

I should note right away that my main motivation for this work was the wish to try Roslyn hands-on; the idea itself, described below, was more of a pretext and a test case for trying out the library. In the process of studying and implementing it, however, I found that the result, albeit with some crutches, can actually be used in practice to extend the syntax of the language. I will briefly describe how to do that at the very end. For now, let's get started.

Safe calls and the Maybe monad

The idea of safe calls is to get rid of the annoying null checks on objects, which are necessary but at the same time clutter the code considerably and hurt its readability. Nor does anyone want to live under the constant threat of a NullReferenceException.

Functional programming languages solve this problem with the Maybe monad. Its essence is that a value flowing through a pipeline of computations is wrapped in a type that holds either some value or Nothing. If the previous computation in the pipeline produced a result, the next computation is performed; if it returned Nothing, the next computation is skipped and Nothing is returned instead.

C# has everything needed to implement this monad: null serves in place of Nothing, and for value types their Nullable<T> versions can be used. The idea has been in the air for a while, and several articles have implemented this monad in C# using LINQ. One of them is by Dmitry Nesteruk (mezastel), and there is also another one.

It should be noted, though, that for all the appeal of this approach, the resulting code looks rather obscure, because direct calls have to be replaced with lambda wrappers and LINQ. Without syntactic support in the language, however, it is hardly possible to do this more elegantly.

I found a reasonably elegant way to express this idea in the specification of the not yet released Kotlin language for the JDK, from the folks at my beloved JetBrains (Null-safety). As it turned out, Groovy already has this, and perhaps other languages do too.
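To make the verbosity concrete, here is a minimal sketch of the lambda-wrapper style. The With extension method and the Person and Address classes are hypothetical names chosen for illustration; they are not taken from the articles mentioned above.

```csharp
using System;

public static class MaybeExtensions
{
    // Hypothetical helper: applies the selector only when the source is not null,
    // otherwise short-circuits to null (the stand-in for Nothing).
    public static TResult With<TSource, TResult>(
        this TSource source, Func<TSource, TResult> selector)
        where TSource : class
        where TResult : class
    {
        return source == null ? null : selector(source);
    }
}

// Hypothetical model classes used only to demonstrate a call chain.
public class Address
{
    public string City;
}

public class Person
{
    public Address Address;
}
```

With this helper, reading a city safely is written as person.With(p => p.Address).With(a => a.City), where a direct call would simply be person.Address.City. Every step needs its own lambda, which is exactly the noise discussed above.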

So what is this safe call operator? Suppose we have an expression:

string text = SomeObject.ToString();

If SomeObject is null, we will inevitably get a NullReferenceException, as already mentioned. To avoid this, in addition to the direct call operator '.' we define the safe call operator '?.', which looks like this:

 string text = SomeObject?.ToString();

and is actually equivalent to the expression:

 string text = SomeObject != null ? SomeObject.ToString() : null;

If the safely invoked method or property returns a value type, the variable being assigned must be of the corresponding Nullable type:

 int? count = SomeList?.Count;

Like regular calls, safe calls can be chained, for example:

 int? length = SomeObject?.ToString()?.Length;

which is converted to the expression:

 int? length = SomeObject != null ? SomeObject.ToString() != null ? SomeObject.ToString().Length : null : null;

The transformation I propose here has a flaw: it generates extra method calls. Ideally it should be converted to something like:

    var temp = SomeObject;
    string text = null;
    if (temp != null)
        text = temp.ToString();

However, since Roslyn is somewhat verbose and I did not want the examples to become bloated and boring, I chose the simpler transformation. More on this in the following sections.
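The cost of the simpler transformation can be made visible with a small counter. The sketch below is only an illustration: CountingObject and SafeCallForms are hypothetical names, and the explicit (int?) cast is added so that the nested conditional also compiles on older compilers.

```csharp
using System;

// Hypothetical class that counts how often ToString() is invoked.
public class CountingObject
{
    public int ToStringCalls;

    public override string ToString()
    {
        ToStringCalls++;
        return "abc";
    }
}

public static class SafeCallForms
{
    // Literal expansion of obj?.ToString()?.Length:
    // ToString() is evaluated twice when the object is not null.
    public static int? NaiveLength(CountingObject obj)
    {
        return obj != null
            ? (obj.ToString() != null ? (int?)obj.ToString().Length : null)
            : null;
    }

    // Expansion with a temporary variable:
    // every link of the chain is evaluated exactly once.
    public static int? SingleEvaluationLength(CountingObject obj)
    {
        int? length = null;
        if (obj != null)
        {
            string temp = obj.ToString();
            if (temp != null)
                length = temp.Length;
        }
        return length;
    }
}
```

Both methods return the same value, but after NaiveLength the counter on the object has grown by two, and after SingleEvaluationLength only by one. For methods with side effects or expensive work, that difference matters.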

Project Roslyn

As you may have heard, a CTP version of the Roslyn project was released recently. As part of it, the developers of C# and VB have completely rewritten the language compilers in managed code and opened access to them as an API. With it, developers can do many useful things: analyze, optimize, and generate code conveniently, write extensions and code fixes for Visual Studio, and possibly build their own DSLs. The release is not coming soon, though, only in the version of Visual Studio after the next one, but I want to get my hands on it now.

Let us turn to our problem and first imagine how we would like to use this language extension in practice. Obviously, we write code as usual in our favorite IDE, use safe call operators where needed, and click Build; during compilation, a utility we wrote with Project Roslyn converts all of this into syntactically correct C# code, and voila, everything compiles. I hasten to disappoint you: Roslyn does not let you interfere with the work of the current csc.exe compiler, which is quite understandable. It is quite likely that this will become possible if the compiler in the vNext Visual Studio is replaced with its managed counterpart. But for now it is not.

At the same time, there are already two workarounds:

1. You can build your own compiler with the same Roslyn API and change your build system to use it instead of csc.exe, adding your preliminary code transformations to the default compilation (which, by the way, is fairly easy to program).
2. You can run your console program as a pre-build task that converts the source files and saves the resulting sources in the obj folder. WPF is currently compiled in a very similar way: in the pre-build phase, xaml files are converted to .g.cs files.

Project Roslyn provides several kinds of functionality, but one of the key ones is building, analyzing, and transforming an abstract syntax tree. This is the functionality we will use below.

Implementation

Of course, everything written below is just an example; it has many flaws and cannot be used in real projects without significant rework, but it shows that such things can be done in principle.
Let's move on to the implementation. To write the program, we first need to install the Roslyn SDK; you will also need Service Pack 1 for Visual Studio 2010 and the Visual Studio 2010 SDK SP1.
After all this, a Roslyn item appears in the new project dialog, with several project templates (some of which can be integrated into the IDE). We will create a simple console application.
As an example, we will use the following "source code":

    public class Example
    {
        public const string CODE = @"
            using System;
            using System.Linq;
            using System.Windows;

            namespace HelloWorld
            {
                public class TestClass
                {
                    public string TestField;
                    public string TestProperty { get; set; }
                    public string TestMethod() { return null; }
                    public string TestMethod2(int k, string p) { return null; }
                    public TestClass ChainTest;
                }

                public class OtherClass
                {
                    public void Test()
                    {
                        TestClass test;
                        string testStr1;
                        testStr1 = test?.TestField;
                        string testStr3 = test?.TestProperty;
                        string testStr4 = test?.TestMethod();
                        string testStr5 = test?.TestMethod2(100, testStr3);
                        var test3 = test?.ChainTest?.TestField;
                    }
                }
            }";
    }

Apart from the safe call operators, this source code is syntactically valid and close to compilable (the local variable test would still need an initial value), although compilability is not required for our transformation.

First of all, we need to build an abstract syntax tree from the source code. This takes just two lines:

    SyntaxTree tree = SyntaxTree.ParseCompilationUnit(Example.CODE);
    SyntaxNode root = tree.Root;

The syntax tree is represented by the SyntaxTree class and, unsurprisingly, is a tree of nodes derived from the base SyntaxNode type. Each node represents a particular construct: binary expressions, conditional expressions, method invocations, property definitions, variable declarations. Naturally, any C# construct can be represented by an instance of some class derived from SyntaxNode. In addition, the tree contains SyntaxToken elements, which describe the source code at the level of minimal syntactic units: keywords, literals, identifiers, and punctuation (braces and parentheses, commas, semicolons). Finally, the tree contains SyntaxTrivia elements, the parts that by and large do not matter for understanding the code: spaces and tabs, comments, preprocessor directives, and so on.

There is one small detail worth knowing here: Roslyn is very tolerant when parsing. Although it should ideally be given syntactically correct source code, it actually tries to turn any text whatsoever into some kind of AST, including our syntactically incorrect code. We will take advantage of this. Let's build a syntax tree and find out how Roslyn represents our safe call operator in it.

It turns out to be simple: from Roslyn's point of view, the expression test?.TestField is a ternary operator with the condition "test", the "when true" expression ".TestField", and an empty "when false" expression. Armed with this information, we can transform the tree. Here we run into another feature of Roslyn: the syntax tree it builds is immutable, so nothing can be fixed directly in the existing structure. That is not a problem, though. For such operations Roslyn offers the SyntaxRewriter class, which derives from SyntaxVisitor and, as the name suggests, implements the well-known Visitor pattern. It contains many virtual methods, each handling a visit to a node of one specific type (for example VisitFieldDeclaration, VisitEnumMemberDeclaration, and so on; there are about 180 of them in total).

We need to create our own class derived from SyntaxRewriter and override the VisitConditionalExpression method, which is called when the visitor reaches an expression that is a ternary operator. Below is the entire implementation, since it is short, followed by a few explanations:

    // Rewrites safe calls into ordinary conditional expressions
    public class SafeCallRewriter : SyntaxRewriter
    {
        // Set when a ?. operator was found and rewritten during the pass
        public bool IsSafeCallRewrited { get; set; }

        protected override SyntaxNode VisitConditionalExpression(ConditionalExpressionSyntax node)
        {
            if (IsSafeCallExpression(node))
            {
                // Text of the expression that is checked against null
                string identTxt = node.Condition.GetText();
                ExpressionSyntax ident = Syntax.ParseExpression(identTxt);

                // Text of the expression that is evaluated when the check != null passes
                string exprTxt = node.WhenTrue.GetText();
                exprTxt = exprTxt.Substring(1, exprTxt.Length - 1); // drop the leading dot
                exprTxt = identTxt + '.' + exprTxt;
                ExpressionSyntax expr = Syntax.ParseExpression(exprTxt);

                ExpressionSyntax synt = Syntax.ConditionalExpression( // ternary operator
                    condition: Syntax.BinaryExpression( // ident != null
                        SyntaxKind.NotEqualsExpression,
                        left: ident, // left operand: the checked expression
                        right: Syntax.LiteralExpression(SyntaxKind.NullLiteralExpression)), // null
                    whenTrue: expr,
                    whenFalse: Syntax.LiteralExpression(SyntaxKind.NullLiteralExpression));

                IsSafeCallRewrited = true;
                return synt;
            }
            return base.VisitConditionalExpression(node);
        }

        // A ternary operator is treated as a safe call when its "when true" part starts with a dot
        private bool IsSafeCallExpression(ConditionalExpressionSyntax node)
        {
            return node.WhenTrue.GetText()[0] == '.';
        }
    }

I should note that my first implementation tried to work only with the logical structure of the AST and disdained the textual representation of expressions, but its complexity very quickly went beyond all reasonable limits. Detecting a safe call and its kind alone took three functions (for fields and properties, for method calls, and for chains of safe calls), because each of these turned out to be a different descendant of SyntaxNode, and converting the various kinds of safe operators took many more. Completely exhausted, I threw the first version away, and the second time I used the convenient GetText and ParseExpression functions provided by Roslyn plus some dirty hacks at the string level :).

I also recommend paying attention to how a syntax node is created (ConditionalExpression in this case) and how pleasant named parameters, a C# feature, are to use here. I assure you that without them, building syntax nodes would drive one crazy.

Now here is the code of the main procedure:

    static void Main(string[] args)
    {
        // Parse the source code into a syntax tree
        SyntaxTree tree = SyntaxTree.ParseCompilationUnit(Example.CODE);
        SyntaxNode root = tree.Root;

        SafeCallRewriter rewriter = new SafeCallRewriter();
        do
        {
            // Reset the flag, then rewrite the tree once more
            rewriter.IsSafeCallRewrited = false;
            root = rewriter.Visit(root);
        }
        while (rewriter.IsSafeCallRewrited); // repeat while at least one safe call was rewritten

        root = root.Format(); // same as Ctrl+K, Ctrl+D in the IDE
        Console.WriteLine(root.ToString());
    }

Let me clarify that several rewriting passes over the tree are needed to handle call chains: each pass rewrites the outermost safe call of a chain, and the remaining ones are picked up by the following passes. This could of course be done with recursion, but here that would probably only obscure the code. Also note the wonderful Format function. It programmatically applies the standard stylistic formatting to the code, that is, it adds all the necessary SyntaxTrivia to the AST.

As a result, we get the following code:
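The repeat-until-nothing-changes loop can be tried without Roslyn on a toy model. The sketch below is not the rewriter from this article: it is a hypothetical string-level stand-in that uses a regular expression and handles only chains of fields and properties (no method calls), but it performs one rewrite per pass and loops on a flag in the same way.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical toy model of the multi-pass rewriting; not part of Roslyn.
public static class ToySafeCallRewriter
{
    // receiver: dotted identifier chain; rest: everything after the first "?." of the chain
    private static readonly Regex SafeCall = new Regex(
        @"(?<recv>[A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)\?\." +
        @"(?<rest>[A-Za-z_]\w*(?:\??\.[A-Za-z_]\w*)*)");

    // Rewrites the leftmost safe call only and reports whether anything changed.
    public static string RewriteOnce(string code, out bool rewritten)
    {
        Match m = SafeCall.Match(code);
        rewritten = m.Success;
        if (!m.Success)
            return code;

        string recv = m.Groups["recv"].Value;
        string rest = m.Groups["rest"].Value;
        string replacement = recv + " != null ? " + recv + "." + rest + " : null";
        return code.Substring(0, m.Index) + replacement + code.Substring(m.Index + m.Length);
    }

    // Repeats single passes until a pass leaves the text unchanged.
    public static string RewriteAll(string code)
    {
        bool rewritten;
        do
        {
            code = RewriteOnce(code, out rewritten);
        }
        while (rewritten);
        return code;
    }
}
```

For the input test?.ChainTest?.TestField, the first pass leaves a safe call inside the "when true" branch, and the second pass expands it, after which a third pass finds nothing and the loop stops.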

    using System;
    using System.Linq;
    using System.Windows;

    namespace HelloWorld
    {
        public class TestClass
        {
            public string TestField;
            public string TestProperty { get; set; }
            public string TestMethod() { return null; }
            public string TestMethod2(int k, string p) { return null; }
            public TestClass ChainTest;
        }

        public class OtherClass
        {
            public void Test()
            {
                TestClass test;
                string testStr1;
                testStr1 = test != null ? test.TestField : null;
                string testStr3 = test != null ? test.TestProperty : null;
                string testStr4 = test != null ? test.TestMethod() : null;
                string testStr5 = test != null ? test.TestMethod2(100, testStr3) : null;
                var test3 = test != null ? test.ChainTest != null ? test.ChainTest.TestField : null : null;
            }
        }
    }

So the first acquaintance with Roslyn was a success, and its prospects in general, not only for writing language extensions, look very good. If enthusiasts turn up, this could perhaps be taken further and done more seriously. There is still a lot missing in C#. :)

P.S. Another example of using Roslyn in this way, which helped me a lot, is linked from the original post.

Source: https://habr.com/ru/post/133340/
