
Fast and convenient IL generation

I have often had to generate code dynamically (for example, when writing an efficient serializer or a DSL compiler). There are several ways to do this, and which one is best is a topic for a separate article. For a number of reasons I prefer Reflection.Emit and CIL (Common Intermediate Language). In this article I describe the problems I ran into along the way and my solution to them: GroboIL, a smart wrapper over ILGenerator from the Graceful Emit library.

Note that sometimes there is not much choice: a serializer, for example, needs access to private fields, which forces you to use IL. Incidentally, the well-known protobuf-net serializer contains several hundred IL instructions.

If you have never worked with IL code, the article may be hard to follow, because it contains many IL examples. For the basics, I recommend the article Introduction to IL Assembly Language.

Reflection.Emit provides two ways of generating code: DynamicMethod and TypeBuilder / MethodBuilder.
A DynamicMethod is a "lightweight" static method that is compiled into a delegate. Its main advantage is that a DynamicMethod may ignore the visibility of types and type members. DynamicMethods are also reclaimed by the garbage collector once all references to them are dropped, but since .NET Framework 4.0 a dynamic assembly can be collectible as well, so this is no longer a distinguishing advantage.
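To make the visibility point concrete, here is a minimal sketch of a DynamicMethod that reads a private field. The Account class and its balance field are made up for the illustration; only the Reflection.Emit calls are real API.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical class used only for this illustration.
public sealed class Account
{
    private int balance = 42; // private: not reachable from ordinary outside code
}

public static class PrivateFieldReader
{
    public static Func<Account, int> Build()
    {
        var field = typeof(Account).GetField("balance", BindingFlags.Instance | BindingFlags.NonPublic);
        var method = new DynamicMethod("ReadBalance",
                                       typeof(int),               // return type
                                       new[] { typeof(Account) }, // parameter types
                                       typeof(Account),           // owner type
                                       true);                     // skipVisibility
        var il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);      // stack: [account]
        il.Emit(OpCodes.Ldfld, field); // stack: [account.balance]
        il.Emit(OpCodes.Ret);
        return (Func<Account, int>)method.CreateDelegate(typeof(Func<Account, int>));
    }

    public static void Main()
    {
        Console.WriteLine(Build()(new Account())); // prints 42
    }
}
```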

With a dynamic assembly and ModuleBuilder / TypeBuilder / MethodBuilder you can dynamically generate the whole range of .NET types: declare interfaces and classes, override virtual methods, declare fields and properties, implement constructors, and so on. The resulting assembly can even be saved to disk.

In practice, DynamicMethods are used more often, since they are somewhat simpler to declare and have access to private members. MethodBuilders are usually used when some data has to be generated along with the code: the data is conveniently placed in the fields of a TypeBuilder, and the code in its methods.
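As a rough sketch of the "data plus code" case, the following builds a type with a static field and a method that uses it. The names Counter, count and Next are invented for the example, and the assembly is created as collectible only to match the remark above.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class CounterTypeDemo
{
    // Builds a type roughly equivalent to:
    //   public class Counter { private static int count; public static int Next() { return ++count; } }
    public static Type BuildCounterType()
    {
        var assembly = AssemblyBuilder.DefineDynamicAssembly(
            new AssemblyName("CounterAssembly"), AssemblyBuilderAccess.RunAndCollect);
        var module = assembly.DefineDynamicModule("CounterModule");
        var typeBuilder = module.DefineType("Counter", TypeAttributes.Public | TypeAttributes.Class);

        // The data lives in a field of the generated type.
        var countField = typeBuilder.DefineField(
            "count", typeof(int), FieldAttributes.Private | FieldAttributes.Static);

        // The code lives in a method of the same type.
        var next = typeBuilder.DefineMethod(
            "Next", MethodAttributes.Public | MethodAttributes.Static, typeof(int), Type.EmptyTypes);
        var il = next.GetILGenerator();
        il.Emit(OpCodes.Ldsfld, countField); // stack: [count]
        il.Emit(OpCodes.Ldc_I4_1);           // stack: [count, 1]
        il.Emit(OpCodes.Add);                // stack: [count + 1]
        il.Emit(OpCodes.Dup);                // stack: [count + 1, count + 1]
        il.Emit(OpCodes.Stsfld, countField); // count = count + 1; stack: [count + 1]
        il.Emit(OpCodes.Ret);

        return typeBuilder.CreateType();
    }

    public static void Main()
    {
        var next = BuildCounterType().GetMethod("Next");
        Console.WriteLine(next.Invoke(null, null)); // prints 1
        Console.WriteLine(next.Invoke(null, null)); // prints 2
    }
}
```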

Example


Task: print all fields of the object.

    using System;
    using System.Reflection;
    using System.Reflection.Emit;

    public static Action<T> BuildFieldsPrinter<T>() where T : class
    {
        var type = typeof(T);
        var method = new DynamicMethod(Guid.NewGuid().ToString(), // method name
                                       typeof(void),              // return type
                                       new[] {type},              // parameter types
                                       typeof(string),            // owner type the method is attached to; string, for example, will do
                                       true);                     // skipVisibility: request access to private fields
        var il = method.GetILGenerator();
        var fieldValue = il.DeclareLocal(typeof(object));
        var toStringMethod = typeof(object).GetMethod("ToString");
        var fields = type.GetFields(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
        foreach(var field in fields)
        {
            il.Emit(OpCodes.Ldstr, field.Name + ": {0}"); // stack: [format]
            il.Emit(OpCodes.Ldarg_0); // stack: [format, obj]
            il.Emit(OpCodes.Ldfld, field); // stack: [format, obj.field]
            if(field.FieldType.IsValueType)
                il.Emit(OpCodes.Box, field.FieldType); // stack: [format, (object)obj.field]
            il.Emit(OpCodes.Dup); // stack: [format, obj.field, obj.field]
            il.Emit(OpCodes.Stloc, fieldValue); // fieldValue = obj.field; stack: [format, obj.field]
            var notNullLabel = il.DefineLabel();
            il.Emit(OpCodes.Brtrue, notNullLabel); // if(obj.field != null) goto notNull; stack: [format]
            il.Emit(OpCodes.Ldstr, "null"); // stack: [format, "null"]
            var printedLabel = il.DefineLabel();
            il.Emit(OpCodes.Br, printedLabel); // goto printed
            il.MarkLabel(notNullLabel);
            il.Emit(OpCodes.Ldloc, fieldValue); // stack: [format, obj.field]
            il.EmitCall(OpCodes.Callvirt, toStringMethod, null); // stack: [format, obj.field.ToString()]
            il.MarkLabel(printedLabel);
            var writeLineMethod = typeof(Console).GetMethod("WriteLine", new[] { typeof(string), typeof(object) });
            il.EmitCall(OpCodes.Call, writeLineMethod, null); // Console.WriteLine(format, obj.field.ToString()); stack: []
        }
        il.Emit(OpCodes.Ret);
        return (Action<T>)method.CreateDelegate(typeof(Action<T>));
    }


ILGenerator problems


To begin with, ILGenerator has an awkward API: there is a single Emit method with a pile of overloads, so it is easy to call the wrong overload by mistake.
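One cheap defence against picking the wrong overload is a thin helper with a fixed parameter type. The sketch below is my own illustration of that idea, not code from any library; TypedEmit and LdcI4 are made-up names.

```csharp
using System;
using System.Reflection.Emit;

public static class TypedEmit
{
    // The int parameter guarantees that Emit(OpCode, int) is the overload chosen,
    // whatever integral type the caller happens to hold.
    public static void LdcI4(ILGenerator il, int value)
    {
        il.Emit(OpCodes.Ldc_I4, value);
    }

    public static Func<int> BuildConstant(byte code)
    {
        var method = new DynamicMethod("Constant", typeof(int), Type.EmptyTypes);
        var il = method.GetILGenerator();
        LdcI4(il, code);      // the byte is implicitly widened to int here
        il.Emit(OpCodes.Ret); // stack: [code] is returned
        return (Func<int>)method.CreateDelegate(typeof(Func<int>));
    }

    public static void Main()
    {
        Console.WriteLine(BuildConstant(7)()); // prints 7
    }
}
```

Calling il.Emit(OpCodes.Ldc_I4, code) directly with a byte would instead bind to the Emit(OpCode, byte) overload and write an operand of the wrong size.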

It is also inconvenient that one logical IL instruction may have several variants. For example, ldelem has 11 of them: ldelem.i1 (sbyte), ldelem.i2 (short), ldelem.i4 (int), ldelem.i8 (long), ldelem.u1 (byte), ldelem.u2 (ushort), ldelem.u4 (uint), ldelem.r4 (float), ldelem.r8 (double), ldelem.i (native int), ldelem.ref (reference type).
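With plain ILGenerator, the choice of variant has to be made by hand. A possible helper might look like the sketch below; LdelemSelector is a made-up name, and the mapping deliberately covers only primitive and reference element types (other structs would need the generic ldelem form with a type token).

```csharp
using System;
using System.Reflection.Emit;

public static class LdelemSelector
{
    // Picks the ldelem variant for an array element type (primitives and reference types only).
    public static OpCode Select(Type elementType)
    {
        if (!elementType.IsValueType) return OpCodes.Ldelem_Ref;
        if (elementType == typeof(IntPtr) || elementType == typeof(UIntPtr)) return OpCodes.Ldelem_I;
        switch (Type.GetTypeCode(elementType))
        {
            case TypeCode.SByte: return OpCodes.Ldelem_I1;
            case TypeCode.Boolean:
            case TypeCode.Byte: return OpCodes.Ldelem_U1;
            case TypeCode.Int16: return OpCodes.Ldelem_I2;
            case TypeCode.Char:
            case TypeCode.UInt16: return OpCodes.Ldelem_U2;
            case TypeCode.Int32: return OpCodes.Ldelem_I4;
            case TypeCode.UInt32: return OpCodes.Ldelem_U4;
            case TypeCode.Int64:
            case TypeCode.UInt64: return OpCodes.Ldelem_I8;
            case TypeCode.Single: return OpCodes.Ldelem_R4;
            case TypeCode.Double: return OpCodes.Ldelem_R8;
            default: throw new NotSupportedException("Unsupported element type: " + elementType);
        }
    }

    // Builds (array, index) => array[index] using the selected variant.
    public static Func<T[], int, T> BuildGetter<T>()
    {
        var method = new DynamicMethod("GetElement", typeof(T), new[] { typeof(T[]), typeof(int) });
        var il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);       // stack: [array]
        il.Emit(OpCodes.Ldarg_1);       // stack: [array, index]
        il.Emit(Select(typeof(T)));     // stack: [array[index]]
        il.Emit(OpCodes.Ret);
        return (Func<T[], int, T>)method.CreateDelegate(typeof(Func<T[], int, T>));
    }

    public static void Main()
    {
        Console.WriteLine(BuildGetter<int>()(new[] { 10, 20, 30 }, 1)); // prints 20
        Console.WriteLine(BuildGetter<string>()(new[] { "a", "b" }, 0)); // prints a
    }
}
```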

But all of this is trivial compared to how poorly errors are reported.

First, the exception is thrown only at the very end, when the JIT compiler tries to compile the method (that is, not even at the DynamicMethod.CreateDelegate() or TypeBuilder.CreateType() call, but on the first attempt to actually run the code), so it is unclear which instruction caused the error.

Second, the error messages themselves usually say nothing useful. The most frequent one, for example, is "Common language runtime detected an invalid program".

Examples of errors / typos



  1.
     var il = dynamicMethod.GetILGenerator();
     {..} // some code
     il.Emit(OpCodes.Ldfld); // ldfld was intended here, but the FieldInfo was not passed
     {..} // some code
     var compiledMethod = dynamicMethod.CreateDelegate(..);
     compiledMethod(..); // ← the exception is thrown here

    InvalidProgramException: "Common language runtime detected an invalid program".


  2.
     var il = dynamicMethod.GetILGenerator();
     {..} // some code
     il.Emit(OpCodes.Box); // wanted to box a value type into object, but did not pass the type
     {..} // some code
     var compiledMethod = dynamicMethod.CreateDelegate(..);
     compiledMethod(..); // ← the exception is thrown here

    InvalidProgramException: "Common language runtime detected an invalid program".


  3.
     var il = dynamicMethod.GetILGenerator();
     {..} // some code
     var code = GetCode(..); // returns a byte
     il.Emit(OpCodes.Ldc_I4, code); // wanted to load an int, but passed a byte
     {..} // some code
     var compiledMethod = dynamicMethod.CreateDelegate(..);
     compiledMethod(..); // ← the exception is thrown here

    InvalidProgramException: "Common language runtime detected an invalid program".


  4.
     var il = dynamicMethod.GetILGenerator();
     {..} // some code
     il.Emit(OpCodes.Call, abstractMethod); // calling an abstract method: Callvirt is needed instead of Call
     {..} // some code
     var compiledMethod = dynamicMethod.CreateDelegate(..);
     compiledMethod(..); // ← the exception is thrown here

    BadImageFormatException: "Invalid il format".


  5.
     var il = dynamicMethod.GetILGenerator();
     {..} // some code
     var keyGetter = typeof(KeyValuePair<int, int>).GetProperty("Key").GetGetMethod();
     il.Emit(OpCodes.Ldarg_1); // argument 1 is a KeyValuePair<int, int>
     il.Emit(OpCodes.Call, keyGetter); // calling the Key getter of KeyValuePair<int, int>, which is a value type,
                                       // so the address of the argument must be loaded, not its value
     {..} // some code
     var compiledMethod = dynamicMethod.CreateDelegate(..);
     compiledMethod(..); // ← the exception is thrown here

    InvalidProgramException: "Common language runtime detected an invalid program".


  6.
     var il = dynamicMethod.GetILGenerator();
     {..} // some code
     var toStringMethod = typeof(object).GetMethod("ToString");
     il.Emit(OpCodes.Ldarga, 1); // argument 1 is an int, load its address
     il.Emit(OpCodes.Callvirt, toStringMethod); // wanted to call int.ToString(), but a virtual call
                                                // on a value type requires the constrained prefix
     {..} // some code
     var compiledMethod = dynamicMethod.CreateDelegate(..);
     compiledMethod(..); // ← the exception is thrown here

    NullReferenceException: "Object reference not set to an instance of an object".
    Or
    AccessViolationException: "Attempted to read or write protected memory. This is often an indication that other memory is corrupt."


  7.
     var il = dynamicMethod.GetILGenerator();
     {..} // some code
     var bindingFlags = BindingFlags.Instance | BindingFlags.NonPublic; // looking for the private field value
     var valueField = typeof(KeyValuePair<int, string>).GetField("value", bindingFlags);
     il.Emit(OpCodes.Ldarga, 1); // argument 1 is a KeyValuePair<string, int>
     il.Emit(OpCodes.Ldfld, valueField); // wanted the value field of KeyValuePair<string, int>, but mixed up
                                         // KeyValuePair<string, int> and KeyValuePair<int, string>,
                                         // so an int may end up being read as a string
     {..} // some code
     var compiledMethod = dynamicMethod.CreateDelegate(..);
     var result = compiledMethod(..); // ← may complete without an exception
     {..} // using result ← the failure shows up here

    Undefined behavior, most likely an AccessViolationException or a NullReferenceException.

  8. Forgetting to emit OpCodes.Ret at the end of the code leads to undefined behavior: an exception may be thrown when the method is compiled, everything may break later at run time, or you may get lucky and everything works correctly.

  9. Implement the function
     static int Add(int x, double y) { return x + (int)y; } 

     var il = dynamicMethod.GetILGenerator();
     il.Emit(OpCodes.Ldarg_0); // argument 0 is an int
     il.Emit(OpCodes.Ldarg_1); // argument 1 is a double
     il.Emit(OpCodes.Add); // forgot to convert the double to int
     il.Emit(OpCodes.Ret);
     var compiledMethod = dynamicMethod.CreateDelegate(..);
     var result = compiledMethod(..); // ← the result is undefined

    The CIL specification states that the add instruction cannot take operands of types int and double, but an exception is not necessarily thrown: the behavior is simply undefined and depends on the JIT compiler.

    Sample runs:
    • x64: compiledMethod(10, 3.14) = 13
      ASM code (x is in ecx, y is in xmm1):
      cvtsi2sd xmm0, ecx
      addsd xmm0, xmm1
      cvttsd2si eax, xmm0
    • x86: compiledMethod(10, 3.14) = 20
      ASM code (x is in ecx, y is on the stack):
      mov eax, ecx
      fld qword [esp + 4]
      add eax, ecx
      fstp st(0)

    That is, on x64 the most logical interpretation was generated (the int is converted to double, the two doubles are added, and the result is truncated to int), whereas on x86 the attempt to mix integer and floating-point operands produced 2 * x instead of x + y (I leave it to the reader to check what happens if double + int is written instead of int + double).

  10. Implement the function
     static string Coalesce(string str) { return str ?? ""; } 

     var il = dynamicMethod.GetILGenerator();
     il.Emit(OpCodes.Ldarg_0); // stack: [str]
     il.Emit(OpCodes.Dup); // stack: [str, str]
     var notNullLabel = il.DefineLabel();
     il.Emit(OpCodes.Brtrue, notNullLabel); // if(str != null) goto notNull; stack: [str]
     il.Emit(OpCodes.Ldstr, ""); // Oops, forgot that str is still on the stack
     il.MarkLabel(notNullLabel); // the stack here depends on the path taken: [str] on one path, [str, ""] on the other
     il.Emit(OpCodes.Ret);
     var compiledMethod = dynamicMethod.CreateDelegate(..);
     compiledMethod(..); // ← the exception is thrown here

    InvalidProgramException: "JIT compiler encountered an internal limitation."

    A large number of similar errors fall into this category: forgetting to load this before calling an instance method, forgetting to load a method argument, loading the wrong value for a method argument, and so on.
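For reference, here is how items 6, 9 and 10 might look once fixed, still using plain ILGenerator. This is only a sketch: the class and method names are made up, and each method takes its argument at index 0 rather than index 1 as in the fragments above.

```csharp
using System;
using System.Reflection.Emit;

public static class CorrectedExamples
{
    // Item 6: a virtual call on a value type needs the constrained prefix.
    public static Func<int, string> BuildIntToString()
    {
        var method = new DynamicMethod("IntToString", typeof(string), new[] { typeof(int) });
        var il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarga_S, (byte)0);        // stack: [&x]
        il.Emit(OpCodes.Constrained, typeof(int)); // prefix for the following callvirt
        il.Emit(OpCodes.Callvirt, typeof(object).GetMethod("ToString")); // stack: [x.ToString()]
        il.Emit(OpCodes.Ret);
        return (Func<int, string>)method.CreateDelegate(typeof(Func<int, string>));
    }

    // Item 9: convert the double to int before adding.
    public static Func<int, double, int> BuildAdd()
    {
        var method = new DynamicMethod("Add", typeof(int), new[] { typeof(int), typeof(double) });
        var il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0); // stack: [x]
        il.Emit(OpCodes.Ldarg_1); // stack: [x, y]
        il.Emit(OpCodes.Conv_I4); // stack: [x, (int)y]
        il.Emit(OpCodes.Add);     // stack: [x + (int)y]
        il.Emit(OpCodes.Ret);
        return (Func<int, double, int>)method.CreateDelegate(typeof(Func<int, double, int>));
    }

    // Item 10: pop the leftover null before loading the replacement string.
    public static Func<string, string> BuildCoalesce()
    {
        var method = new DynamicMethod("Coalesce", typeof(string), new[] { typeof(string) });
        var il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);               // stack: [str]
        il.Emit(OpCodes.Dup);                   // stack: [str, str]
        var notNullLabel = il.DefineLabel();
        il.Emit(OpCodes.Brtrue, notNullLabel);  // if(str != null) goto notNull; stack: [str]
        il.Emit(OpCodes.Pop);                   // stack: []
        il.Emit(OpCodes.Ldstr, "");             // stack: [""]
        il.MarkLabel(notNullLabel);             // both paths arrive with exactly one string on the stack
        il.Emit(OpCodes.Ret);
        return (Func<string, string>)method.CreateDelegate(typeof(Func<string, string>));
    }

    public static void Main()
    {
        Console.WriteLine(BuildIntToString()(42));      // prints 42
        Console.WriteLine(BuildAdd()(10, 3.14));        // prints 13
        Console.WriteLine(BuildCoalesce()(null) == ""); // prints True
    }
}
```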

If the function consists of a dozen instructions, you can somehow find the error by rereading the code several times, but when the code consists of hundreds of instructions, developing it becomes tedious and slow.
And even when such code finally runs, it still cannot be debugged. The only option is to generate symbol information along with the code, but that is slow, inconvenient, and hard to keep up to date.

So, having accumulated quite a lot of experience writing IL code with ILGenerator and being thoroughly worn out by it, I decided to write my own generator that takes into account all the problems I had run into.
The goal was an IL generator with which InvalidProgramException would never be thrown at all: every such error should be caught earlier, with a clear error message.

GroboIL


The result was GroboIL - a smart wrapper over the ILGenerator .

GroboIL Features:


The previous example rewritten using GroboIL :

    public static Action<T> BuildFieldsPrinter<T>() where T : class
    {
        var type = typeof(T);
        var method = new DynamicMethod(Guid.NewGuid().ToString(), // method name
                                       typeof(void),              // return type
                                       new[] { type },            // parameter types
                                       typeof(string),            // owner type the method is attached to; string, for example, will do
                                       true);                     // skipVisibility: request access to private fields
        using(var il = new GroboIL(method))
        {
            var fieldValue = il.DeclareLocal(typeof(object), "fieldValue");
            var toStringMethod = typeof(object).GetMethod("ToString");
            var fields = type.GetFields(BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
            foreach(var field in fields)
            {
                il.Ldstr(field.Name + ": {0}"); // stack: [format]
                il.Ldarg(0); // stack: [format, obj]
                il.Ldfld(field); // stack: [format, obj.field]
                if(field.FieldType.IsValueType)
                    il.Box(field.FieldType); // stack: [format, (object)obj.field]
                il.Dup(); // stack: [format, obj.field, obj.field]
                il.Stloc(fieldValue); // fieldValue = obj.field; stack: [format, obj.field]
                var notNullLabel = il.DefineLabel("notNull");
                il.Brtrue(notNullLabel); // if(obj.field != null) goto notNull; stack: [format]
                il.Ldstr("null"); // stack: [format, "null"]
                var printedLabel = il.DefineLabel("printed");
                il.Br(printedLabel); // goto printed
                il.MarkLabel(notNullLabel);
                il.Ldloc(fieldValue); // stack: [format, obj.field]
                il.Call(toStringMethod); // stack: [format, obj.field.ToString()]
                il.MarkLabel(printedLabel);
                var writeLineMethod = typeof(Console).GetMethod("WriteLine", new[] { typeof(string), typeof(object) });
                il.Call(writeLineMethod); // Console.WriteLine(format, obj.field.ToString()); stack: []
            }
            il.Ret();
        }
        return (Action<T>)method.CreateDelegate(typeof(Action<T>));
    }


Let's go through all the errors above and see what they look like with GroboIL.


  1.
     using(var il = new GroboIL(dynamicMethod))
     {
         {..} // some code
         il.Ldfld(); // ← compile-time error here
         {..} // some code
     }

    This is a compile-time error, because GroboIL.Ldfld() has no parameterless overload.


  2.
     using(var il = new GroboIL(dynamicMethod))
     {
         {..} // some code
         il.Box(); // ← compile-time error here
         {..} // some code
     }

    This is a compile-time error, because GroboIL.Box() has no parameterless overload.


  3.
     using(var il = new GroboIL(dynamicMethod))
     {
         {..} // some code
         var code = GetCode(..); // returns a byte
         il.Ldc_I4(code); // ← everything is fine here, the parameter is an int
         {..} // some code
     }

    GroboIL.Ldc_I4() accepts an int, so the byte is implicitly converted to int and everything is correct.


  4.
     using(var il = new GroboIL(dynamicMethod))
     {
         {..} // some code
         il.Call(abstractMethod); // ← everything is fine here, Callvirt will be emitted
         {..} // some code
     }

    GroboIL.Call() emits OpCodes.Call for non-virtual methods and OpCodes.Callvirt for virtual ones (if a virtual method has to be called non-virtually, for example to call the base implementation, use GroboIL.Callnonvirt()).


  5.
     using(var il = new GroboIL(dynamicMethod))
     {
         {..} // some code
         var keyGetter = typeof(KeyValuePair<int, int>).GetProperty("Key").GetGetMethod();
         il.Ldarg(1); // argument 1 is a KeyValuePair<int, int>
         il.Call(keyGetter); // ← the exception is thrown here
         {..} // some code
     }

    The stack validator reports that a method cannot be called on a value type loaded by value:
    InvalidOperationException: "In order to call the method 'String KeyValuePair<Int32, String>.get_Value()' on a value type 'KeyValuePair<Int32, String>' load an instance by ref or box it".


  6.
     using(var il = new GroboIL(dynamicMethod))
     {
         {..} // some code
         var toStringMethod = typeof(object).GetMethod("ToString");
         il.Ldarga(1); // argument 1 is an int, load its address
         il.Call(toStringMethod); // ← the exception is thrown here
         {..} // some code
     }

    The stack validator reports that calling a virtual method on a value type requires the 'constrained' parameter (which makes GroboIL emit the OpCodes.Constrained prefix):
    InvalidOperationException: "In order to call a virtual method 'String Object.ToString()' on a value type 'KeyValuePair<Int32, String>' specify the 'constrained' parameter".


  7.
     using(var il = new GroboIL(dynamicMethod))
     {
         {..} // some code
         var bindingFlags = BindingFlags.Instance | BindingFlags.NonPublic; // looking for the private field value
         var valueField = typeof(KeyValuePair<int, string>).GetField("value", bindingFlags);
         il.Ldarga(1); // argument 1 is a KeyValuePair<string, int>
         il.Ldfld(valueField); // ← the exception is thrown here
         {..} // some code
     }

    The stack validator reports that the field cannot be loaded:
    InvalidOperationException: "Cannot load the field 'KeyValuePair<Int32, String>.value' of an instance of type 'KeyValuePair<String, Int32>'".

  8. GroboIL checks that every program ends with one of a few valid instructions, OpCodes.Ret in particular.


  9.
     using(var il = new GroboIL(dynamicMethod))
     {
         il.Ldarg(0); // argument 0 is an int
         il.Ldarg(1); // argument 1 is a double
         il.Add(); // ← the exception is thrown here
         il.Ret();
     }

    The stack validator reports that the add instruction is invalid in the current context:
    InvalidOperationException: "Cannot perform the instruction 'add' on types 'Int32' and 'Double'".


  10.
     using(var il = new GroboIL(dynamicMethod))
     {
         il.Ldarg(0); // stack: [str]
         il.Dup(); // stack: [str, str]
         var notNullLabel = il.DefineLabel("notNull");
         il.Brtrue(notNullLabel); // if(str != null) goto notNull; stack: [str]
         il.Ldstr(""); // Oops, forgot that str is still on the stack
         il.MarkLabel(notNullLabel); // ← the exception is thrown here
         il.Ret();
     }

    The stack validator reports that two execution paths produce different evaluation stacks, and shows the contents of the stack in both cases:
    InvalidOperationException: "Inconsistent stack for the label 'notNull'
    Stack #1: [null, String]
    Stack #2: [String]"
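To give a feeling for how such a check can work at all, here is a toy model written for this article. It is not GroboIL's implementation: it tracks only the stack depth (GroboIL also tracks types), and all names in it are made up.

```csharp
using System;
using System.Collections.Generic;

// Toy model: simulates the evaluation stack depth while "instructions" are emitted.
public sealed class ToyStackTracker
{
    private int depth; // current evaluation stack depth
    private readonly Dictionary<string, int> labelDepth = new Dictionary<string, int>();

    public void Push() { depth++; }

    public void Pop()
    {
        if (depth == 0) throw new InvalidOperationException("Stack underflow");
        depth--;
    }

    // brtrue pops the condition, then the branch target must see the current depth.
    public void Brtrue(string label) { Pop(); Record(label); }

    // Falling through to a label must agree with every branch that targets it.
    public void MarkLabel(string label) { Record(label); }

    private void Record(string label)
    {
        int expected;
        if (!labelDepth.TryGetValue(label, out expected))
            labelDepth[label] = depth;
        else if (expected != depth)
            throw new InvalidOperationException(
                "Inconsistent stack depth for the label '" + label + "': " + expected + " vs " + depth);
    }
}

public static class ToyStackTrackerDemo
{
    // Replays the Coalesce example from item 10, with or without the missing pop.
    public static bool IsConsistent(bool withPop)
    {
        var tracker = new ToyStackTracker();
        try
        {
            tracker.Push();              // ldarg.0  -> depth 1
            tracker.Push();              // dup      -> depth 2
            tracker.Brtrue("notNull");   // brtrue   -> depth 1, recorded for the label
            if (withPop) tracker.Pop();  // pop      -> depth 0
            tracker.Push();              // ldstr "" -> depth 1 (or 2 without the pop)
            tracker.MarkLabel("notNull");
            return true;
        }
        catch (InvalidOperationException)
        {
            return false;
        }
    }

    public static void Main()
    {
        Console.WriteLine(IsConsistent(false)); // prints False
        Console.WriteLine(IsConsistent(true));  // prints True
    }
}
```

The key point is that the error is raised at the moment the label is marked, while the offending instruction is still a few lines away, instead of when the JIT compiler finally sees the whole method.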


Debugging


Among other things, GroboIL produces a debug listing of the generated IL code, with the contents of the stack written to the right of each instruction. It can be obtained by calling GroboIL.GetILCode(), for example:

    ldarg.0              // [List<T>]
    dup                  // [List<T>, List<T>]
    brtrue notNull_0     // [null]
    pop                  // []
    ldc.i4.0             // [Int32]
    newarr T             // [T[]]
    notNull_0:           // [{Object: IList, IList<T>, IReadOnlyList<T>}]
    ldarg.1              // [{Object: IList, IList<T>, IReadOnlyList<T>}, Func<T, Int32>]
    call Int32 Enumerable.Sum<T>(IEnumerable<T>, Func<T, Int32>) // [Int32]
    ret                  // []


And finally, it is possible to debug MethodBuilders. In this case GroboIL automatically builds symbol information, using the debug listing above as the source text.

Example:

    public abstract class Bazzze
    {
        public abstract int Sum(int x, double y);
    }

    public void Test()
    {
        var assembly = AppDomain.CurrentDomain.DefineDynamicAssembly(
            new AssemblyName("DynAssembly"),
            AssemblyBuilderAccess.RunAndCollect); // this assembly can be collected by the garbage collector
        var module = assembly.DefineDynamicModule("zzz", "zzz.dll", true); // true - emit symbol information
        var symWriter = module.GetSymWriter();
        var typeBuilder = module.DefineType("Zzz", TypeAttributes.Public | TypeAttributes.Class, typeof(Bazzze));
        var method = typeBuilder.DefineMethod(
            "Sum",
            MethodAttributes.Public | MethodAttributes.Virtual, // the method overrides a virtual one
            typeof(int), // return type
            new[] { typeof(int), typeof(double) }); // parameter types
        method.DefineParameter(1, ParameterAttributes.None, "x"); // name the parameters so that
        method.DefineParameter(2, ParameterAttributes.None, "y"); // they can be added to watch
        var documentName = typeBuilder.Name + "." + method.Name + ".cil";
        var documentWriter = symWriter.DefineDocument(documentName,
            SymDocumentType.Text, SymLanguageType.ILAssembly, Guid.Empty); // the document that serves as the source text
        using(var il = new GroboIL(method, documentWriter)) // pass the documentWriter
        {
            il.Ldarg(1); // stack: [x]
            il.Ldarg(2); // stack: [x, y]
            il.Conv<int>(); // stack: [x, (int)y]
            il.Dup(); // stack: [x, (int)y, (int)y]
            var temp = il.DeclareLocal(typeof(int), "temp");
            il.Stloc(temp); // temp = (int)y; stack: [x, (int)y]
            il.Add(); // stack: [x + (int)y]
            il.Ret();
            File.WriteAllText(Path.Combine(DebugOutputDirectory, documentName), il.GetILCode());
        }
        typeBuilder.DefineMethodOverride(method, typeof(Bazzze).GetMethod("Sum")); // override the abstract method
        var type = typeBuilder.CreateType();
        var inst = (Bazzze)Activator.CreateInstance(type, new object[0]);
        inst.Sum(10, 3.14);
    }


Now set a breakpoint on the line inst.Sum(10, 3.14); and press F11 (Step Into). A dialog box appears:



In the dialog, select the folder where the debug file was written, and you will see something like this:



Visual Studio treats this file as ordinary source code: you can step with F10 / F11, set breakpoints, and add the function's parameters, this, and local variables to the Watch window.

Unfortunately, DynamicMethods cannot be debugged this way, because they have no built-in mechanism for building symbol information (if any reader knows of one, I would be glad to hear about it). But since the IL instructions are the same for DynamicMethod and MethodBuilder, you can structure the code so that DynamicMethod is easily replaced with MethodBuilder for debugging, and switch the replacement off in the release build.
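One way to structure code for such a swap is to keep the method body in a single routine that only sees the IL generator and to provide two hosts for it. The sketch below uses plain ILGenerator to stay self-contained; with GroboIL the body would take a GroboIL instance instead. All names are made up, and note that a MethodBuilder has no skipVisibility switch, so bodies that touch private members will not work unchanged in the second host.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class SwappableHost
{
    // The method body is written once, against the generator only.
    private static void EmitAddBody(ILGenerator il)
    {
        il.Emit(OpCodes.Ldarg_0); // stack: [x]
        il.Emit(OpCodes.Ldarg_1); // stack: [x, y]
        il.Emit(OpCodes.Add);     // stack: [x + y]
        il.Emit(OpCodes.Ret);
    }

    // Release host: a lightweight DynamicMethod.
    public static Func<int, int, int> BuildWithDynamicMethod()
    {
        var method = new DynamicMethod("Add", typeof(int), new[] { typeof(int), typeof(int) });
        EmitAddBody(method.GetILGenerator());
        return (Func<int, int, int>)method.CreateDelegate(typeof(Func<int, int, int>));
    }

    // Debug host: a static method of a dynamically built type.
    public static Func<int, int, int> BuildWithMethodBuilder()
    {
        var assembly = AssemblyBuilder.DefineDynamicAssembly(
            new AssemblyName("DebugHost"), AssemblyBuilderAccess.RunAndCollect);
        var module = assembly.DefineDynamicModule("DebugHostModule");
        var typeBuilder = module.DefineType("DebugHostType", TypeAttributes.Public);
        var methodBuilder = typeBuilder.DefineMethod(
            "Add", MethodAttributes.Public | MethodAttributes.Static,
            typeof(int), new[] { typeof(int), typeof(int) });
        EmitAddBody(methodBuilder.GetILGenerator());
        var type = typeBuilder.CreateType();
        return (Func<int, int, int>)Delegate.CreateDelegate(
            typeof(Func<int, int, int>), type.GetMethod("Add"));
    }

    // A single switch decides which host is used.
    public static Func<int, int, int> Build(bool debuggable)
    {
        return debuggable ? BuildWithMethodBuilder() : BuildWithDynamicMethod();
    }

    public static void Main()
    {
        Console.WriteLine(Build(false)(2, 3)); // prints 5
        Console.WriteLine(Build(true)(2, 3));  // prints 5
    }
}
```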

Conclusion


Drawing on my five years of experience generating IL code, I can conclude the following: the difference between developing code with ILGenerator and with GroboIL is comparable to the difference between writing C# in Visual Studio with ReSharper and writing it in Notepad with a compiler that answers only Accepted / Rejected, without the line number of the error. The difference in development speed is an order of magnitude. In my opinion, GroboIL lets you write IL code almost as fast as, say, C# code, while keeping all the advantages of a low-level language.

Source: https://habr.com/ru/post/262711/

