An embedded language for .NET, or how I won an argument with Eric Lippert

Foreword


Sometimes an obsession lodges itself so firmly in your head that you keep returning to it for years: you approach the problem from a different angle, bring new knowledge to bear, or simply start over with a clean slate, and so on until the question is settled once and for all. For me, programming languages became that obsession. The very fact that one program can create other programs gave it, in my eyes, an incomprehensible fractal beauty. Writing such a program myself was only a matter of time.



The time first came after my second year at university. I was sure that my knowledge of C was enough to write a compiler, a virtual machine, and an entire standard library for it single-handedly. The idea was elegant and full of the romance of youthful maximalism, but two years of diligent work produced a monster. The virtual machine did show signs of life and could execute fairly simple scripts in a pseudo-assembler that my friend fornever helped write, but the project was soon abandoned. I decided instead to write a language for the .NET platform and get garbage collection, a JIT compiler, and all the delights of a huge class library for free. That compiler took just six months to implement, the source code was uploaded to CodePlex, and I successfully defended my diploma with it.
However, something was still missing. For all its merits, the diploma language required explicit declarations of all types and functions, had no support for generics, could not create anonymous functions, and had no clear field of application anyway. The decision to reinvent the wheel once more came a year later, when I had finished a game for Windows Phone and was wondering what to do next. The new language had to meet the following requirements:


The aforementioned fornever wanted to take part, and the work got into full swing. He was actively involved in designing the language and wrote the parser in F#, while I took on the syntax tree and the internal infrastructure.

In the rest of the article I will describe what came of it, what pitfalls we met along the way, and why the article has such a clickbait title.

Who needs yet another reinvented wheel?


Almost three years ago, in one of the threads on Habr, we found out that there are more than 2500 programming languages in the world. Why would anyone need another one? What can it offer that the others do not?

The stunning success of JavaScript and Lua was the reason to make the language embeddable, with an emphasis on integration with .NET host applications. Hence the name of the project: LENS, short for Language for Embeddable .NET Scripting. "Integration" here means the ability to declare a type or function in a script, as well as direct exchange of objects between the host program and the embedded one at runtime. For example, like this:

    public void Run()
    {
        var source = "a = 1 + 2";
        var a = 0;

        var compiler = new LensCompiler();
        compiler.RegisterProperty("a", () => a, newA => a = newA);

        try
        {
            var fx = compiler.Compile(source);
            fx();
            Console.WriteLine("Success: {0}", a);
        }
        catch (LensCompilerException ex)
        {
            Console.WriteLine("Error: {0}", ex.FullMessage);
        }
    }

As the example shows, adding LENS support is very easy: add the assembly to the project's references, create a compiler instance, and feed the source code to it. All the "magic" is in the RegisterProperty method: with it, any value from the host program can be made available to the script for both reading and writing. For types and functions there are the RegisterType and RegisterFunction methods respectively.

Language features


In terms of syntax, LENS borrowed a lot from Python and F#. After ten years of working with C-like languages I have had my fill of semicolons and curly braces, so here expressions end with a line break and blocks are marked by indentation.

Base types

The base types are bool, int, double and string. Constants of these types are written the same way as in C#.

Variable declaration

Variables are declared with the var and let keywords. The first declares a mutable variable, the second a read-only one.

    let a = 42
    var b = "hello world"

Control structures

Conditions are written with an if block, loops with while:

    var a = 1
    while(a < 10)
        if(a % 2 == 0)
            print "{0} is even" a
        else
            print "oops, {0} is odd" a
        a = a + 1

Control structures return a value. This means that if can also be used on the right-hand side of an assignment:

 let description = if(age < 21) "child" else "grown-up" 

Functions

As the example just above shows, the print function is called in a functional style: first the name of the function or delegate object, then the arguments separated by spaces. If an argument is an expression more complex than a literal or a variable name, it is enclosed in parentheses.

    print "test"
    print abc
    print "result is: " (1 + 2)

To call a function without parameters, a pair of empty parentheses is used. The reason is that the functional paradigm has no such thing as a "function without parameters": true functional programmers prefer to operate with pure functions only, and a pure function without arguments is essentially a constant. The pair of empty parentheses here is the unit literal (a synonym for void), which stands for the absence of arguments. A parameterless constructor is called the same way.

The function declaration begins with the fun keyword:

    fun launch of bool max:int name:string ->
        var x = max
        while(x > 0)
            println "{0}..." x
            x = x - 1
        print "Rocket {0} is launching!" name
        let rocket = new Rocket ()
        rocket.Success

    launch 10 "Apollo"

There is no return keyword in LENS. The return value of a function is its last expression. If the function should not return anything but its last expression does have a type, the already familiar literal () is placed last. The break and continue keywords are not provided either.

In the version we are currently working on, a function can be memoized automatically. To do this, put the pure keyword before the function declaration. Memoized functions cache their results in a dictionary: if a function has already been called with a given set of arguments, its value is taken from the dictionary rather than recalculated:

    pure fun add of int x:int y:int ->
        print "calculating..."
        x + y

    add 1 2 // output
    add 2 3 // output
    add 2 3 // no output!
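To make the caching behavior concrete, here is a minimal C# sketch of what memoizing a two-argument function amounts to. It is only an illustration: the Memoizer class and its Memoize method are hypothetical names, and the code the LENS compiler actually emits for pure functions may look quite different.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of LENS.
public static class Memoizer
{
    // Wraps a two-argument function so that each distinct argument pair
    // is computed only once; repeated calls are served from the dictionary.
    public static Func<T1, T2, TResult> Memoize<T1, T2, TResult>(Func<T1, T2, TResult> body)
    {
        var cache = new Dictionary<Tuple<T1, T2>, TResult>();

        return (x, y) =>
        {
            var key = Tuple.Create(x, y);

            TResult cached;
            if (cache.TryGetValue(key, out cached))
                return cached;

            var result = body(x, y);
            cache[key] = result;
            return result;
        };
    }
}
```

Wrapping a lambda with Memoizer.Memoize and calling the result with (1, 2), (2, 3) and (2, 3) again runs the body twice, mirroring the LENS example above. The sketch is not thread-safe, and it is only sound for functions without side effects, which is exactly what the pure keyword promises.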

User structures and algebraic types

With the record keyword you can declare a structure and list its fields.

    record Point
        X : int
        Y : int

    let zero = new Point ()
    let one = new Point 1 1

Algebraic types are declared with the type keyword followed by the list of variants the type can take. A variant may also carry a label of any type:

    type Card
        Ace
        King
        Queen
        Jack
        ValueCard of int

    let king = King
    let ten = ValueCard 10
    print (ten is Card) // true

For structures, two constructors are generated: a default one and one that initializes all fields at once. In addition, Equals and GetHashCode methods are generated automatically for the declared types, so they can be used as dictionary keys.

Containers

Frequently used containers can be initialized with a special syntax of the new operator:

    let array = new [1; 2; 3; 4; 5]
    let list = new [[ "hello"; "world" ]]
    let tuple = new (13; 42.0; true; "test")
    let dict = new { "a" => 1; "b" => 2 }

For containers, the most suitable common type is inferred automatically. For example:

    let a = new [1; 2; 3.3] // double[]
    let b = new [King; Queen] // Card[]
    let c = new [1; true; "hello"] // object[]
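One way such an inference could work is sketched below in C#. This is an assumption-laden illustration rather than the LENS algorithm: the ElementTypeInference class is a hypothetical name, the only numeric widening it knows is int to double, and interfaces are ignored entirely.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of LENS.
public static class ElementTypeInference
{
    // Picks a single element type for a container literal
    // from the types of its elements.
    public static Type CommonType(IEnumerable<Type> types)
    {
        var list = types.Distinct().ToList();
        if (list.Count == 0)
            throw new ArgumentException("No element types given.", "types");
        if (list.Count == 1)
            return list[0];

        // Assumed numeric rule: a mix of int and double widens to double.
        if (list.All(t => t == typeof(int) || t == typeof(double)))
            return typeof(double);

        // Otherwise walk up the base classes of the first type until
        // every element type is assignable to the candidate.
        // Unrelated types usually end up at object.
        for (var candidate = list[0]; candidate != null; candidate = candidate.BaseType)
            if (list.All(t => candidate.IsAssignableFrom(t)))
                return candidate;

        return typeof(object);
    }
}
```

With this sketch, the element types int, int, double give double, and int, bool, string give object, matching the first and third lines of the example above.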

Extension methods

If the corresponding flag is not disabled in the settings, the compiler will also look for suitable extension methods:

    let a = Enumerable::Range 1 10
    let sum = a.Product ()

LINQ is supported with a fairly simple syntax:

    let oddSquareSum = Enumerable::Range 1 100
        |> Where ((x:int) -> x % 2 == 1)
        |> Select ((x:int) -> x ** 2)
        |> Sum ()

In addition

The compiler implements many other interesting things:


So what about Lippert?


Many of those who have languished through the article this far are surely waiting: where is the promised drama? I remember, I remember, but first a lyrical digression.

The compiler's backend is the wonderful Reflection.Emit library, which is part of the .NET Framework. It lets you create types, methods, fields, and other entities on the fly, with method bodies written as MSIL instructions. Along with its ample capabilities, however, it has a fair number of annoying pitfalls.

The first problem I encountered was the inability to inspect the types being created:

    var intMethods = typeof(int).GetMethods(); // works
    var myType = ModuleBuilder.DefineType("MyType");
    myType.DefineMethod("Test", MethodAttributes.Public);
    myType.GetMethods(); // NotSupportedException

On Stack Overflow it was explained to me in no uncertain terms that I would have to store the list of created methods, and search through it, myself. Laborious, but not difficult.
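The bookkeeping described above can be sketched roughly as follows. The TrackedType and MethodEntry classes are hypothetical names invented for this illustration; the real LENS infrastructure is organized differently and also has to handle overload resolution, fields, and constructors.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical record of one defined method. The signature is stored here
// rather than read back from the unfinished builder.
public sealed class MethodEntry
{
    public string Name;
    public Type ReturnType;
    public Type[] ArgumentTypes;
    public MethodBuilder Builder;
}

// Hypothetical wrapper around TypeBuilder that remembers every method it defines.
public sealed class TrackedType
{
    private readonly List<MethodEntry> _methods = new List<MethodEntry>();

    public TrackedType(TypeBuilder builder)
    {
        Builder = builder;
    }

    public TypeBuilder Builder { get; private set; }

    // Defines the method on the underlying TypeBuilder and records its signature.
    public MethodBuilder DefineMethod(string name, MethodAttributes attributes,
                                      Type returnType, Type[] argumentTypes)
    {
        var method = Builder.DefineMethod(name, attributes, returnType, argumentTypes);
        _methods.Add(new MethodEntry
        {
            Name = name,
            ReturnType = returnType,
            ArgumentTypes = argumentTypes,
            Builder = method
        });
        return method;
    }

    // Exact-match lookup by name and argument types; returns null when nothing fits.
    public MethodEntry FindMethod(string name, Type[] argumentTypes)
    {
        return _methods.FirstOrDefault(
            m => m.Name == name && m.ArgumentTypes.SequenceEqual(argumentTypes));
    }
}
```

The compiler then queries FindMethod instead of calling GetMethods on the unfinished type. A real implementation would replace the exact match with a best-fit search over the recorded overloads.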

But it only got worse from there.

It turned out that you cannot inspect not only the types being created, but also built-in generic types that take the created types as arguments! Here is an example of a class that causes trouble when you try to create it with Reflection.Emit:

 class A { public List<A> Values = new List<A>(); } 

The result is a vicious circle: you can obtain the constructor of the List<A> type only after the assembly has been finalized, by which time it is no longer needed.

My next question on Stack Overflow was answered by Jon Skeet (author of the book C# in Depth) and Eric Lippert (until recently a lead developer of C#). Eric's verdict was disappointing and final:

Reflection.Emit is too weak to build a real compiler on. It is fine for "toy" compilation tasks, such as generating dynamic calls or expression trees in LINQ queries, but it will quickly fall short of the problems a real compiler has to solve.

According to Eric, the right thing would be to rewrite the compiler on top of the Common Compiler Infrastructure, but I did not even consider that option. The first solution that came to mind was to remove the ability to declare custom types from the language, but that would have been unsporting. My gut told me that there had to be some non-obvious way around the limitation.

And there really was such a way! It even turned out to be far more obvious than I expected.

As I was told on the same Stack Overflow, the TypeBuilder class has static methods that let you obtain a method, field, or constructor as follows:

    var myType = createType("MyType");
    var listType = typeof(List<>);
    var myList = listType.MakeGenericType(myType);
    var genericMethod = listType.GetMethod("Add");
    var actualMethod = TypeBuilder.GetMethod(myList, genericMethod);

There is a significant drawback, though: the argument types are not substituted in the returned method. The result is a handle to List<MyType>.Add(T item): the argument type is T (the generic parameter), not the expected MyType.

Eliminating this drawback required an algorithm that works out the actual argument types from the descriptions of the containing type and the base method, and then puts them in the right places. Together with the static TypeBuilder methods, this mechanism made it possible to break out of the vicious circle.
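The substitution step can be sketched as follows. This is a simplified illustration, not the LENS implementation: the GenericSubstitution class is a hypothetical name, only generic parameters declared on the containing type are replaced, and parameters of generic methods are left untouched.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical helper, not part of LENS.
public static class GenericSubstitution
{
    // Given a constructed type such as List<MyType> and a method taken from the
    // generic definition such as List<>.Add, returns the parameter types with
    // the type-level generic parameters replaced by the actual arguments.
    public static Type[] ResolveParameterTypes(Type constructedType, MethodInfo definitionMethod)
    {
        var actualArgs = constructedType.GetGenericArguments();
        return definitionMethod.GetParameters()
            .Select(p => Substitute(p.ParameterType, actualArgs))
            .ToArray();
    }

    private static Type Substitute(Type type, Type[] actualArgs)
    {
        // T declared on the type: replace by position. Method-level parameters stay as is.
        if (type.IsGenericParameter)
            return type.DeclaringMethod == null ? actualArgs[type.GenericParameterPosition] : type;

        // ref/out parameters: substitute the underlying type, then restore the by-ref wrapper.
        if (type.IsByRef)
            return Substitute(type.GetElementType(), actualArgs).MakeByRefType();

        // T[] and friends: substitute the element type, keep the rank.
        if (type.IsArray)
        {
            var element = Substitute(type.GetElementType(), actualArgs);
            var rank = type.GetArrayRank();
            return rank == 1 ? element.MakeArrayType() : element.MakeArrayType(rank);
        }

        // Nested generics such as IEnumerable<T>: substitute each argument recursively.
        if (type.IsGenericType)
        {
            var args = type.GetGenericArguments().Select(a => Substitute(a, actualArgs)).ToArray();
            return type.GetGenericTypeDefinition().MakeGenericType(args);
        }

        return type;
    }
}
```

For example, resolving List<>.AddRange against List<string> yields a single parameter of type IEnumerable<string>. The same call works when the type argument is an unfinished TypeBuilder, because only the generic definition is inspected and the constructed type is asked for nothing but its generic arguments.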

The conclusion: even the greats sometimes get it wrong, and a full-featured compiler can be built on Reflection.Emit. You do have to sweat over it, though.

If you are curious to learn more about the limitations of Reflection.Emit, I recommend an MSDN blog article written back in 2009. It gives several examples of class topologies that cannot be generated. Beware: the examples are in VB!

Wonders of Memoization


Having bolted memoization support onto the language, I suddenly wondered whether the same technique could speed up the compiler itself. One of the most frequently called functions in the compiler is TypeDistance. It calculates the relative inheritance or conversion distance between two types, which is needed for:


This method contained more than a dozen different checks and accounted for a considerable share of the compilation time. But the distance between two types does not change over time, so it can safely be cached in a dictionary such as Dictionary<Tuple<Type, Type>, int>. Memoizing three key methods took about half an hour and cut the compilation time of several complex scripts by a factor of about 60.
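A cache of this shape might look like the sketch below. The dictionary type is the one named above; everything else is an assumption made for illustration. In particular, the TypeDistanceCache class is a hypothetical name, and ComputeDistance is a stand-in that only counts inheritance steps, whereas the real TypeDistance performs many more checks.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration, not the LENS source.
public static class TypeDistanceCache
{
    private static readonly Dictionary<Tuple<Type, Type>, int> Cache =
        new Dictionary<Tuple<Type, Type>, int>();

    // Number of uncached computations, exposed only to make the effect visible.
    public static int Misses { get; private set; }

    // Returns the cached distance, computing and storing it on the first request.
    public static int Distance(Type from, Type to)
    {
        var key = Tuple.Create(from, to);

        int distance;
        if (Cache.TryGetValue(key, out distance))
            return distance;

        Misses++;
        distance = ComputeDistance(from, to);
        Cache[key] = distance;
        return distance;
    }

    // Stand-in for the expensive checks: counts the inheritance steps from
    // `from` up to `to`, or returns int.MaxValue if `to` is not a base class.
    private static int ComputeDistance(Type from, Type to)
    {
        var steps = 0;
        for (var t = from; t != null; t = t.BaseType, steps++)
            if (t == to)
                return steps;

        return int.MaxValue;
    }
}
```

Because the result depends only on the pair of types, the cache never needs invalidating within a single run. A static dictionary like this one is not thread-safe, so a compiler used from several threads would need a lock or a ConcurrentDictionary instead.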

Future of the project


At the moment the compiler is stable and passes more than two hundred tests. It can already be used in real projects, but that does not mean the work is finished. The main task is to rewrite the parser from F# to C#. The FParsec parser library did not live up to expectations, and keeping up with grammar changes became unbearable. In addition, it offers rather meager means of reporting errors and drags in the entire F# runtime along with 500 KB of dependencies. Considering that all of the compiler's own code takes up 250 KB, that is a lot.

For this reason, some features are already implemented in the compiler but not yet supported by the parser: the slightest change in the grammar sets off an avalanche of failing tests. These features include for/foreach loops, the finally section in exception handling, and function memoization, as well as minor syntax refinements.

The rest of the work is roughly as follows:


So far only the two of us are working on the project, but perhaps there are like-minded people among the readers, in which case the work will go faster. More distant plans include language support in Visual Studio and generation of debugging symbols.

Where can I try?


All source code is available in the GitHub repository:

github.com/impworks/lens

The project contains three test host programs in which you can try out the compiler. To run them you will need the F# Redistributable. If you have Visual Studio 2010 or later installed, you do not need to install anything.

Prebuilt demos for Windows

Console

The simplest host for the compiler. The program is entered line by line or loaded from a file. To run it, put a # character at the end of a line.



Plotter

It plots a two-dimensional function from its formula in the form y = f(x). You can set the range and the step.



Graphic sandbox

The most full-featured host application. It provides the script with the Circle and Rect types, which can be displayed on the screen and given behavior logic. Several demo scripts are included.



Summary


Admittedly, the project was made more for fun than to solve practical problems. It may well prove useful to no one and stall, but it kept me busy with interesting work for about eight months and let me study the intricacies of the framework's internals, which is great in itself. And if it does prove useful to someone in a real project, let me know!

Source: https://habr.com/ru/post/184498/

