
Mono.Cecil: making your own “compiler”

One of the most delightful topics for programmers who enjoy reinventing the wheel is writing their own languages, interpreters, and compilers. Indeed, a program that can create or run other programs instinctively fills coders' hearts with awe, because it is hard, a lot of work, and incredibly fascinating.

Most people start with their own interpreters, which in general boil down to a huge switch over commands inside a loop. That is interesting and gives you freedom, but it is tedious and very slow. I wanted something more nimble: something that gets JIT-compiled properly and, preferably, manages memory by itself.

An excellent solution is to choose .NET as the target platform. Let's leave lexical analysis for next time; today we'll try to write the simplest possible program that creates a working executable.



The program asks for a name, and the executable it produces prints “Hello, %username%” to the console.

There are many ways to create an executable, for example:

I chose the last option, Mono.Cecil. Unfortunately, I can't say whether Cecil is better than Reflection for this task, but I came across an example for Cecil, so that is what I'll look at.

Mono.Cecil is a library that lets you work with an assembly as if it were an array of bytes. With it, you can both create your own assemblies and poke around in and modify existing ones. It provides a wide range of classes that are (usually) convenient to use.

The code in question


Here is the finished code (without the class declaration, the form, and everything else; only the generator method itself):

```csharp
using System;
using Mono.Cecil;
using Mono.Cecil.Cil;

public void Compile(string str)
{
    // Create the assembly: give it a name and a version, and make it a console application
    var name = new AssemblyNameDefinition("SuperGreeterBinary", new Version(1, 0, 0, 0));
    var asm = AssemblyDefinition.CreateAssembly(name, "greeter.exe", ModuleKind.Console);

    // Import string and void into the module
    // (in Mono.Cecil 0.10 and later, Import is called ImportReference)
    asm.MainModule.Import(typeof(String));
    var void_import = asm.MainModule.Import(typeof(void));

    // Create the Main method: static, private, returning void
    var method = new MethodDefinition("Main",
        MethodAttributes.Static | MethodAttributes.Private | MethodAttributes.HideBySig,
        void_import);

    // Get an IL processor to emit instructions into the method body
    var ip = method.Body.GetILProcessor();

    // Emit the code!
    ip.Emit(OpCodes.Ldstr, "Hello, ");
    ip.Emit(OpCodes.Ldstr, str);
    ip.Emit(OpCodes.Call, asm.MainModule.Import(
        typeof(String).GetMethod("Concat", new Type[] { typeof(string), typeof(string) })));
    ip.Emit(OpCodes.Call, asm.MainModule.Import(
        typeof(Console).GetMethod("WriteLine", new Type[] { typeof(string) })));
    ip.Emit(OpCodes.Call, asm.MainModule.Import(
        typeof(Console).GetMethod("ReadLine", new Type[] { })));
    ip.Emit(OpCodes.Pop);
    ip.Emit(OpCodes.Ret);

    // Create the class that will hold Main: namespace, name, attributes, base type
    var type = new TypeDefinition("supergreeter", "Program",
        TypeAttributes.AutoClass | TypeAttributes.Public | TypeAttributes.AnsiClass | TypeAttributes.BeforeFieldInit,
        asm.MainModule.Import(typeof(object)));

    // Add the type to the module
    asm.MainModule.Types.Add(type);
    // Add the method to the type
    type.Methods.Add(method);
    // Mark Main as the assembly's entry point
    asm.EntryPoint = method;

    // Write the assembly to disk
    asm.Write("greeter.exe");
}
```


Now let's take a closer look at the scary-looking centerpiece that actually generates the code.

What is going on there?


Written in C#, the same program would look like this (I'll omit the class declaration):

```csharp
public static void Main()
{
    Console.WriteLine("Hello, " + "username");
    Console.ReadLine();
}
```


To do this, we take two strings (the first is a constant, the second is fixed at compile time and also becomes a constant) and push them onto the stack. String.Concat concatenates them and leaves the result on top of the stack, where Console.WriteLine picks it up and prints it to the screen.
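The same instruction sequence can be tried out in memory, without writing a file. The sketch below swaps Mono.Cecil for `DynamicMethod` from System.Reflection.Emit (a different API that ships with .NET) and, instead of printing, returns the concatenated string so the result is easy to check. The names `GreeterSketch` and `BuildGreeter` are made up for illustration.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class GreeterSketch
{
    // Hypothetical helper: emits IL equivalent to `return "Hello, " + name;`
    // with `name` baked in as a constant, just like in the generated Main.
    public static Func<string> BuildGreeter(string name)
    {
        var method = new DynamicMethod("Greet", typeof(string), Type.EmptyTypes);
        ILGenerator il = method.GetILGenerator();

        MethodInfo concat = typeof(string).GetMethod(
            "Concat", new[] { typeof(string), typeof(string) });

        il.Emit(OpCodes.Ldstr, "Hello, "); // stack: "Hello, "
        il.Emit(OpCodes.Ldstr, name);      // stack: "Hello, ", name
        il.Emit(OpCodes.Call, concat);     // stack: "Hello, " + name
        il.Emit(OpCodes.Ret);              // return the value on top of the stack

        return (Func<string>)method.CreateDelegate(typeof(Func<string>));
    }
}
```

Calling `GreeterSketch.BuildGreeter("username")()` should give back `"Hello, username"`; the only difference from the article's Main is that the result is returned rather than passed to Console.WriteLine.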

After that, so that the program doesn't close before we have time to read anything, we call Console.ReadLine(). Since it returns the string it read, which we don't need, we pop it off the stack, and then, with a sense of accomplishment, we exit our (almost) very own Main.

About bytecode


We are generating a program for the .NET virtual machine, so the method body naturally consists of its instructions. The .NET virtual machine is stack-based: all operations work on operands lying on the stack. The full list of instructions can be found on Wikipedia; here I will only go into more detail on the ones I used.
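To make the stack-based model concrete, here is a toy interpreter (ironically, exactly the loop-and-switch kind mentioned at the start) that runs the same sequence of instructions as our Main on an explicit `Stack<object>`. The `Op` enum, the `Instr` record, and the `ToyVm` class are hypothetical simplifications: real IL has many more opcodes, and CALL refers to a method through metadata rather than a C# delegate.

```csharp
using System;
using System.Collections.Generic;

public enum Op { Ldstr, Call, Pop, Ret }

// Hypothetical instruction: an opcode plus an optional operand
// (a string for Ldstr; a function and its argument count for Call).
public record Instr(Op Code, string Str = null, Func<object[], object> Fn = null, int Argc = 0);

public static class ToyVm
{
    // Runs the program and returns whatever is left on the stack.
    public static Stack<object> Run(IEnumerable<Instr> program)
    {
        var stack = new Stack<object>();
        foreach (var ins in program)
        {
            switch (ins.Code)
            {
                case Op.Ldstr:
                    stack.Push(ins.Str);
                    break;
                case Op.Call:
                    // Pop the arguments (the last argument is on top of the stack).
                    var args = new object[ins.Argc];
                    for (int i = ins.Argc - 1; i >= 0; i--) args[i] = stack.Pop();
                    var result = ins.Fn(args);
                    // Toy convention: a null result stands for a void method.
                    if (result != null) stack.Push(result);
                    break;
                case Op.Pop:
                    stack.Pop();
                    break;
                case Op.Ret:
                    return stack;
            }
        }
        return stack;
    }
}
```

Feeding it LDSTR, LDSTR, a Concat-like call, a WriteLine-like call, a ReadLine-like call, POP, and RET leaves the stack empty, which is exactly what a void method needs at its RET.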

LDSTR loads a string onto the stack; naturally, it takes the string as its operand. Strictly speaking, “loading a string onto the stack” means pushing not the string itself but a reference to the place in memory where it lives, but for us, as IL programmers, that does not matter. All that matters is that the following instructions can take it from there and use it.

CALL, as you might guess from the name, calls a method. For that, it needs a reference to an object describing the method, which you must import first. To import it, you have to “find” the method in its type by passing the method name and an array of its parameter types; that is why the code looks so terrible. Ideally, you would write a handler that turns a string like “String.Concat(string, string)” into this horror; you can try doing that yourself.
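One possible sketch of such a handler, using plain reflection. `MethodResolver.Resolve` is a hypothetical name, and the parsing is deliberately naive: it only knows a handful of short type names (anything else must be a fully qualified name that `Type.GetType` can find) and ignores generics, `ref`/`out` parameters, and nested types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;

public static class MethodResolver
{
    // Short names this sketch understands; deliberately incomplete.
    static readonly Dictionary<string, Type> KnownTypes = new Dictionary<string, Type>
    {
        ["string"] = typeof(string), ["object"] = typeof(object),
        ["int"] = typeof(int), ["long"] = typeof(long),
        ["bool"] = typeof(bool), ["double"] = typeof(double), ["char"] = typeof(char),
        ["String"] = typeof(string), ["Object"] = typeof(object),
        ["Console"] = typeof(Console), ["Math"] = typeof(Math),
    };

    static Type ResolveType(string name)
    {
        name = name.Trim();
        if (KnownTypes.TryGetValue(name, out var known)) return known;
        // Fallback: a fully qualified name such as "System.Text.StringBuilder".
        return Type.GetType(name, throwOnError: true);
    }

    // Turns "String.Concat(string, string)" into the matching MethodInfo.
    public static MethodInfo Resolve(string signature)
    {
        int open = signature.IndexOf('(');
        int close = signature.LastIndexOf(')');
        if (open < 0 || close < open)
            throw new FormatException($"Expected 'Type.Method(args)': {signature}");

        string qualified = signature.Substring(0, open).Trim();
        int dot = qualified.LastIndexOf('.');
        if (dot < 0)
            throw new FormatException($"Missing type name: {signature}");

        Type owner = ResolveType(qualified.Substring(0, dot));
        string methodName = qualified.Substring(dot + 1);

        string argList = signature.Substring(open + 1, close - open - 1);
        Type[] argTypes = argList.Trim().Length == 0
            ? Type.EmptyTypes
            : argList.Split(',').Select(s => ResolveType(s)).ToArray();

        return owner.GetMethod(methodName, argTypes)
            ?? throw new MissingMethodException(owner.FullName, methodName);
    }
}
```

With it, the import in the generator could read `asm.MainModule.Import(MethodResolver.Resolve("String.Concat(string, string)"))` instead of spelling out the `GetMethod` call by hand.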

POP discards the top element of the stack. Nothing special. We need it because Console.ReadLine() returns a value and our function is void, so we cannot leave that value on the stack and must clean it up.
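A small in-memory sketch of the same situation, again using `DynamicMethod` from System.Reflection.Emit instead of Cecil. To keep it runnable without waiting for keyboard input, a hypothetical `FakeReadLine` method stands in for Console.ReadLine: it records that it was called and returns a string nobody needs, which POP then throws away before RET.

```csharp
using System;
using System.Reflection.Emit;

public static class PopSketch
{
    public static int Calls;

    // Stand-in for Console.ReadLine(): returns a value the caller does not need.
    public static string FakeReadLine()
    {
        Calls++;
        return "ignored input";
    }

    // Builds a void method equivalent to `{ FakeReadLine(); }`.
    public static Action Build()
    {
        // Owner type gives the dynamic method access to PopSketch's members.
        var method = new DynamicMethod("CallAndDiscard", typeof(void), Type.EmptyTypes, typeof(PopSketch));
        var il = method.GetILGenerator();
        il.Emit(OpCodes.Call, typeof(PopSketch).GetMethod(nameof(FakeReadLine))); // stack: "ignored input"
        il.Emit(OpCodes.Pop); // stack: empty, as a void method requires at RET
        il.Emit(OpCodes.Ret);
        return (Action)method.CreateDelegate(typeof(Action));
    }
}
```

Invoking the returned delegate calls the stand-in once and returns normally; the unused string never escapes the method.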

RET (from “return”) exits the current function. It must come at the end of every function, and there may be more than one, depending on how many exit points the function has.
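For example, a function with an early return ends up with two RET instructions. The sketch below builds the equivalent of `if (x >= 0) return x; return -x;` with `DynamicMethod` (System.Reflection.Emit rather than Cecil); `RetSketch` and `BuildAbs` are illustrative names, not part of the original program.

```csharp
using System;
using System.Reflection.Emit;

public static class RetSketch
{
    // Emits IL for `static int Abs(int x) { if (x >= 0) return x; return -x; }`,
    // a method with two exit points and therefore two RET instructions.
    public static Func<int, int> BuildAbs()
    {
        var method = new DynamicMethod("Abs", typeof(int), new[] { typeof(int) });
        var il = method.GetILGenerator();
        Label nonNegative = il.DefineLabel();

        il.Emit(OpCodes.Ldarg_0);          // stack: x
        il.Emit(OpCodes.Ldc_I4_0);         // stack: x, 0
        il.Emit(OpCodes.Bge, nonNegative); // pops both; jumps if x >= 0
        il.Emit(OpCodes.Ldarg_0);          // stack: x
        il.Emit(OpCodes.Neg);              // stack: -x
        il.Emit(OpCodes.Ret);              // first exit point
        il.MarkLabel(nonNegative);
        il.Emit(OpCodes.Ldarg_0);          // stack: x
        il.Emit(OpCodes.Ret);              // second exit point

        return (Func<int, int>)method.CreateDelegate(typeof(Func<int, int>));
    }
}
```

Each path through the method leaves exactly one int on the stack when it reaches its own RET.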

Results



In the end, after building and running the program, typing a name into it, and pressing the hefty Compile button, we get a miniature greeter.exe binary in the same folder, exactly 2048 bytes in size.

Launch it, and voila!

Source: https://habr.com/ru/post/109167/

