Foreword
Sometimes an obsession settles so firmly in your head that you keep coming back to it for years: you try to approach the problem from a different angle, to put new knowledge to use, or simply to start over with a clean slate, and so on until the question is settled once and for all. For me, programming languages became such an idea. The very fact that one program lets you create other programs gave it, in my eyes, an incomprehensible fractal beauty. Writing such a program myself was only a matter of time.
That time first came after my second year at university. I was sure that my knowledge of C would be enough to write the compiler, the virtual machine, and the entire standard library for it single-handedly. The idea was elegant and full of the romance of youthful maximalism, but the result of two years of diligent work was a monstrosity. Even though the virtual machine showed signs of life and could execute fairly simple scripts in a pseudo-assembler, which my friend fornever helped write, the project was soon abandoned. Instead, I decided to write a language for the .NET platform in order to get garbage collection, a JIT compiler and all the delights of a huge class library for free. The compiler was implemented in just six months, the source code was uploaded to CodePlex, and with it I successfully defended my diploma.
However, something was still missing. For all its merits, the language I developed for the diploma required explicit declarations of all types and functions, had no support for generics, could not create anonymous functions, and in general had no clear area of application. The decision to reinvent the wheel once more came a year later, when I finished a game for Windows Phone and began to think about what to do next. The following requirements were set for the new language:
- Interact with any available .NET types without explicit import
- Generic support
- Support for anonymous functions and closures
- At least some practical value
The aforementioned fornever expressed a desire to participate, and the work got into full swing. He took an active part in designing the language and wrote the parser in F#, while I took on the syntax tree and the internal infrastructure.
In the rest of the article I will describe what came out of it, what pitfalls we met along the way, and why the article has such a clickbait title.
Who needs yet another reinvented wheel?
Almost three years ago, in one of the posts on Habr, we learned that there are no fewer than 2,500 programming languages in the world. Why would anyone need another one? What could it possibly offer that the others do not?
The stunning success of JavaScript and Lua was the reason to make the language embeddable, with an emphasis on integration with .NET host applications. Hence the name of the project, LENS, short for Language for Embeddable .NET Scripting. "Integration" here means the ability to declare a type or function in a script, as well as the direct exchange of objects between the host and the embedded program at runtime. For example:
    public void Run()
    {
        var source = "a = 1 + 2";
        var a = 0;

        var compiler = new LensCompiler();
        compiler.RegisterProperty("a", () => a, newA => a = newA);

        try
        {
            var fx = compiler.Compile(source);
            fx();
            Console.WriteLine("Success: {0}", a);
        }
        catch (LensCompilerException ex)
        {
            Console.WriteLine("Error: {0}", ex.FullMessage);
        }
    }
As you can see from the example, adding LENS support is very easy: just add the assembly to the project's references, create a compiler instance and feed it the source code. All the "magic" is in the RegisterProperty method: with its help, any value from the host program can be made available to the script for both reading and writing. For types and functions there are the RegisterType and RegisterFunction methods, respectively.
Language features
In terms of syntax, LENS borrows a lot from Python and F#. After ten years of working with C-like languages I have grown thoroughly tired of semicolons and curly braces, so here expressions end with a line break and blocks are delimited by indentation.
Base types
The base types are bool, int, double and string. Constants of these types are written the same way as in C#.
Variable declaration
Variables are declared using the var and let keywords. The first declares a mutable variable, the second a read-only one.
    let a = 42
    var b = "hello world"
Control structures
Conditions are written with an if block, loops with while:
    var a = 1
    while(a < 10)
        if(a % 2 == 0)
            print "{0} is even" a
        else
            print "oops, {0} is odd" a
        a = a + 1
Control constructs return a value. This means that if can also be used on the right-hand side of an assignment:
    let description = if(age < 21) "child" else "grown-up"
Functions
As the examples above show, the print function is called in a functional style: first comes the name of the function or delegate object, followed by the arguments separated by spaces. If you need to pass an argument that is more complex than a literal or a variable name, it is wrapped in parentheses.
    print "test"
    print abc
    print "result is: " (1 + 2)
To call a function without parameters, a pair of empty parentheses is used. The point is that in the functional paradigm there is no such thing as a "function without parameters". True functional programmers prefer to work only with pure functions, and a pure function without arguments is essentially a constant. The pair of empty parentheses in this case is the unit literal (a synonym for void), which means that there are no arguments. A parameterless constructor is called the same way.
A function declaration begins with the fun keyword:
    fun launch of bool max:int name:string ->
        var x = 0
        while(x < max)
            println "{0}..." x
            x = x + 1
        print "Rocket {0} is launching!" name
        let rocket = new Rocket ()
        rocket.Success

    launch 10 "Apollo"
There is no return keyword in LENS. The return value of a function is its last expression. If the function should not return anything but its last expression has some type, the already familiar () literal is used as the last expression. The break and continue keywords are not provided either.
In the version we are currently working on, a function can be memoized automatically. To do this, put the pure keyword before the function declaration. Memoized functions cache their results in a dictionary: if the function has already been called with the same set of arguments, its value is taken from the dictionary instead of being recalculated:
    pure fun add of int x:int y:int ->
        print "calculating..."
        x + y

    add 1 2
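The caching behavior described above can be illustrated outside LENS. The Python snippet below is only a sketch of the general idea, not the way the LENS compiler implements pure; the memoize decorator and the calls list are hypothetical names introduced for the illustration.

```python
import functools

def memoize(func):
    """Cache the results of func in a dictionary keyed by its arguments."""
    cache = {}

    @functools.wraps(func)
    def wrapper(*args):
        # Arguments must be hashable to serve as a dictionary key.
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]

    return wrapper

calls = []  # records every real evaluation, to make the caching visible

@memoize
def add(x, y):
    calls.append((x, y))
    return x + y

first = add(1, 2)   # the body runs and the result is stored
second = add(1, 2)  # the result comes from the cache, the body does not run
```

Caching like this is only safe for functions without side effects, which is presumably why the keyword is called pure: any side effect in the body, such as the print in the LENS example, would happen only on the first call for a given set of arguments.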
User structures and algebraic types
The record keyword lets you describe a structure and the list of its fields.
    record Point
        X : int
        Y : int

    let zero = new Point ()
    let one = new Point 1 1
Algebraic types are declared with the type keyword followed by the list of variants that the type can take. A variant may also carry a label of any type:
    type Card
        Ace
        King
        Queen
        Jack
        ValueCard of int

    let king = King
    let ten = ValueCard 10
    print (ten is Card)
For structures, two constructors are generated: a default one and one that initializes all fields at once. In addition, the Equals and GetHashCode methods are generated automatically for types declared in the script, which allows them to be used as dictionary keys.
Containers
To initialize frequently used containers, there is a special syntax of the new operator:
    let array = new [1; 2; 3; 4; 5]
    let list = new [[ "hello"; "world" ]]
    let tuple = new (13; 42.0; true; "test")
    let dict = new { "a" => 1; "b" => 2 }
For containers, the most suitable common element type is inferred automatically. For example:
    let a = new [1; 2; 3.3]
Extension methods
If the corresponding flag is not disabled in the settings, the compiler will also look for suitable extension methods:
    let a = Enumerable::Range 1 10
    let sum = a.Sum ()
With a bit of clever syntax, LINQ-style chaining is supported as well:
    let oddSquareSum = Enumerable::Range 1 100
        |> Where ((x:int) -> x % 2 == 1)
        |> Select ((x:int) -> x ** 2)
        |> Sum ()
And more
The compiler implements quite a few other interesting things:
- Constant expressions are evaluated at compile time
- Function overloading is supported
- Classic exception handling in the form of try/catch is supported
- The generated assembly can be saved as an .exe if nothing is imported into it
- Functions declared in the script can be used as extension methods
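To make the first item in the list concrete, here is a minimal constant-folding sketch in Python. It is an illustration of the general technique, not the LENS implementation: the tuple-based tree format, the OPS table and the fold function are assumptions made up for this example.

```python
import operator

# Operators this sketch knows how to fold (an arbitrary small subset).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def fold(node):
    """Return node with every all-constant subtree replaced by its value.

    A node is a number (constant), a string (variable name),
    or a tuple (op, left, right).
    """
    if not isinstance(node, tuple):
        return node
    op, left, right = node
    # Fold the children first so that nested constants collapse bottom-up.
    left, right = fold(left), fold(right)
    if isinstance(left, (int, float)) and isinstance(right, (int, float)):
        return OPS[op](left, right)
    return (op, left, right)

# x + (1 + 2) * 3
tree = ("+", "x", ("*", ("+", 1, 2), 3))
folded = fold(tree)
```

The subtree that contains only constants collapses into a single number, while the outer addition survives because x is not known until runtime.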
So what about Lippert?
Many of those who have read this far are surely getting impatient: where is the promised drama? I remember, I remember, but first a short digression.
The compiler backend is the wonderful Reflection.Emit library, which is part of the .NET Framework. It lets you create types, methods, fields and other entities on the fly, with method bodies described as MSIL instructions. However, along with its rich capabilities it has a fair number of annoying pitfalls.
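The first problem I ran into was the inability to inspect types that are still being built. A call like the following works for an ordinary type, but on a TypeBuilder whose type has not been created yet it throws NotSupportedException:
    var intMethods = typeof(int).GetMethods();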
On Stack Overflow it was clearly explained to me that I would have to store the list of created methods and search through it myself. Tedious, but not difficult.
But it only got worse from there.
It turned out that you cannot inspect not only the types being created, but also built-in generic types that use them as type arguments! Here is an example of a class that causes a problem when you try to create it with Reflection.Emit:
    class A
    {
        public List<A> Values = new List<A>();
    }
The result is a vicious circle: you can get the constructor of the List<A> type only once the assembly has been finalized, at which point it is no longer needed.
My next question on Stack Overflow was answered by Jon Skeet (author of the book C# in Depth) and Eric Lippert (until recently the lead developer of C#). Eric's verdict was disappointing and final:
Reflection.Emit is too weak to build a real compiler on it. It is suitable for “toy” compilation tasks, such as creating dynamic calls or expression trees in LINQ queries, but it will quickly stop being able to solve the problems of a real compiler.
According to Eric, the right thing to do would be to rewrite the compiler on top of the Common Compiler Infrastructure, but I did not even consider that option. The first solution that came to mind was to remove the ability to declare custom types from the language, but that would have felt like giving up. My gut told me that there had to be some non-obvious way around this limitation.
And there really was one! It even turned out to be far more obvious than I had expected.
As I was told on that same Stack Overflow, the TypeBuilder class has static methods that let you obtain a method, field, or property as follows:
    var myType = createType("MyType");
    var listType = typeof(List<>);
    var myList = listType.MakeGenericType(myType);
    var genericMethod = listType.GetMethod("Add");
    var actualMethod = TypeBuilder.GetMethod(myList, genericMethod);
There is, however, a significant drawback: the argument types are not substituted in the returned method. The result is a handle to List<MyType>.Add(T item), where the type of the argument is T (the generic parameter) rather than the expected MyType.
Fixing this drawback required an algorithm that works out the actual argument types from the descriptions of the containing type and the base method, and then puts them in the right places. Together with the static TypeBuilder methods, this mechanism made it possible to break the vicious circle.
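The substitution step can be sketched roughly as follows. This is a language-neutral illustration written in Python, not the code from the LENS compiler: the representation of types as names and tuples, and the substitute and resolve_signature functions, are hypothetical and exist only to show the idea.

```python
def substitute(type_expr, bindings):
    """Replace generic parameters in type_expr according to bindings.

    A type expression is either a name ("T", "int") or a tuple
    (generic_name, [argument expressions]), e.g. ("IEnumerable", ["T"]).
    """
    if isinstance(type_expr, str):
        # A bare name: swap it if it is a bound generic parameter.
        return bindings.get(type_expr, type_expr)
    name, args = type_expr
    # A generic type: substitute recursively inside its arguments.
    return (name, [substitute(arg, bindings) for arg in args])

def resolve_signature(generic_params, type_args, param_types):
    """Compute the actual parameter types of a method on a closed generic type."""
    bindings = dict(zip(generic_params, type_args))
    return [substitute(p, bindings) for p in param_types]

# List<T>.Add(T item) on List<MyType>
add_params = resolve_signature(["T"], ["MyType"], ["T"])

# List<T>.AddRange(IEnumerable<T> collection) on List<MyType>
add_range_params = resolve_signature(
    ["T"], ["MyType"], [("IEnumerable", ["T"])]
)
```

The recursion matters because the generic parameter may be buried inside another generic type, as in the AddRange case, and not only appear as a parameter type on its own.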
The conclusion: even the greats are sometimes wrong, and you can build a full-featured compiler on Reflection.Emit. You will, however, have to sweat for it.
If you are curious to learn more about the limitations of Reflection.Emit, I recommend an MSDN blog article written back in 2009. It gives several examples of class topologies that cannot be generated. Be warned: the examples are in VB!
Wonders of Memoization
Having added memoization support to the language, I suddenly wondered whether the same technique could speed up the compiler itself. One of the most frequently called functions in the compiler is TypeDistance. It calculates the relative inheritance or conversion distance between two types, which is needed for:
- Checking whether an expression can be cast to a type
- Finding the most suitable method overload
- Determining the most suitable common type for a collection
This method contained more than a dozen different checks and accounted for a considerable share of the compilation time. But the distance between two types does not change over time, so it can safely be cached in a dictionary such as Dictionary<Tuple<Type, Type>, int>. Memoizing three key methods took about half an hour and reduced the compilation time of several complex scripts by a factor of about 60.
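The idea of caching by a pair of types can be sketched like this. The snippet is a simplified Python illustration, not the actual TypeDistance code: the distance here is just the position of the target in the source class's method resolution order, whereas the real function performs many more checks, and all names below are made up for the example.

```python
_distance_cache = {}   # plays the role of Dictionary<Tuple<Type, Type>, int>
computations = 0       # counts the evaluations that missed the cache

def _compute_distance(src, dst):
    """Position of dst in src's method resolution order, or None if unrelated."""
    global computations
    computations += 1
    mro = src.__mro__
    return mro.index(dst) if dst in mro else None

def type_distance(src, dst):
    """Memoized distance: each (src, dst) pair is computed at most once."""
    key = (src, dst)
    if key not in _distance_cache:
        _distance_cache[key] = _compute_distance(src, dst)
    return _distance_cache[key]

class Animal:
    pass

class Dog(Animal):
    pass

d1 = type_distance(Dog, Animal)   # computed
d2 = type_distance(Dog, object)   # computed
d3 = type_distance(Dog, Animal)   # taken from the cache
d4 = type_distance(Animal, Dog)   # computed; no upcast exists in this direction
```

Negative results are cached too, which matters in overload resolution, where most of the candidate pairs being compared do not match.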
Future of the project
At the moment the compiler is stable and passes more than two hundred tests. It can already be used in real projects, but that does not mean the work is finished. The main task is to rewrite the parser from F# to C#. Using the FParsec library to build the parser did not pay off: maintaining changes to the grammar became unbearable. In addition, it offers rather meager facilities for reporting errors and drags in the entire F# runtime and 500 KB of dependencies. Considering that the compiler's own code takes 250 KB, that is a lot.
For this reason, some features are already implemented in the compiler but not yet supported by the parser: the slightest change to the grammar triggers an avalanche of failing tests. Among these features are the for/foreach loop, the finally section in exception handling, memoization of functions, and some minor syntax refinements.
The remaining work is roughly as follows:
- Add pattern matching support
- Add support for object initializers
- Allow generic methods and possibly structures to be declared.
- Add event subscription support
- Describe all features in the documentation.
So far only the two of us are working on the project, but perhaps there are like-minded people among the readers, in which case the work will go faster. The more distant plans include language support in Visual Studio and generation of debugging symbols.
Where can I try?
All source code is available in the GitHub repository:
github.com/impworks/lens
The project contains three test host programs in which you can try out the compiler. To run them you will need the F# Redistributable. If you have Visual Studio 2010 or later installed, you do not need to install anything.
Pre-built demos for Windows
Console
The simplest host for the compiler. The program is entered line by line or loaded from a file. To run it, put the # symbol at the end of a line.

Plotter
It plots a two-dimensional function given by a formula of the form y = f(x). You can set the range and the step.
Graphic sandbox
The most full-featured host application. It gives the script the Circle and Rect types, which can be drawn on the screen and whose behavior the script can describe. Several demo scripts are included.

Conclusion
Still, the project was made more for fun than to solve practical problems. It may well turn out to be of no use to anyone and stall, but working on it kept me pleasantly busy for about eight months and let me study the intricacies of the framework's internals, which is great in itself. And if it does prove useful to someone in a real project, let me know!