You can generate .NET code in several ways:
- Reflection Emit. Available since .NET 1.0.
- CodeDom. Lets you create dynamic code from a CodeDom object graph or directly from source code written in a high-level language such as C#, VB or JScript. Available since .NET 1.0.
- Expression trees. Available since .NET 3.5. Lets you create dynamic code from an Expression tree.
In this article I want to tell you about the code generation technique using Reflection Emit.
A little more about the ways to generate code
The first is direct generation of CIL code (also known as MSIL or simply IL) for the .NET virtual machine. In this case the generated code is described in CIL, which looks like an assembly language on steroids. As output you get a dynamic assembly (which can be saved to disk) with dynamic classes and methods, or a "bare" dynamic method. Then you use the generated code as you see fit.
The second is generation of source code in a high-level language (for example, C# or VB), followed by compilation of that source into CIL. As output you get an assembly created by the corresponding compiler.
The third is generation from an expression tree. You describe an AST with the Expression factory methods, and then the same Expression API compiles the described method. Internally, Expression translates its representation directly into CIL, performing some useful validation of the AST along the way. But expression trees are limited: you cannot generate your own types and assemblies and, accordingly, cannot save them to disk.
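To give a feel for the third way, here is a minimal sketch of an expression tree compiled into a delegate. The class and method names (ExpressionTreeSketch, BuildDoubler) are made up for illustration; only the Expression calls themselves come from the framework.

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionTreeSketch
{
    // Builds the equivalent of: (int x) => (double)x * 2.0
    // (hypothetical example, not taken from the article's mapper)
    public static Func<int, double> BuildDoubler()
    {
        ParameterExpression x = Expression.Parameter(typeof(int), "x");
        Expression body = Expression.Multiply(
            Expression.Convert(x, typeof(double)),
            Expression.Constant(2.0));
        // Compile() turns the tree into an executable delegate.
        return Expression.Lambda<Func<int, double>>(body, x).Compile();
    }
}
```

Calling ExpressionTreeSketch.BuildDoubler() returns an ordinary delegate; note that no type or assembly of our own appears anywhere, which is exactly the limitation described above.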
We will generate CIL
Why CIL and not a high-level language? Generating CIL is more efficient, because generating in a high-level language means producing source code in that language and then compiling it into CIL. In addition, generation in a high-level language requires an external process, the compiler. Besides, this is a rare opportunity for a .NET programmer to tinker with something resembling assembly language. But generation in a high-level language has its advantages: you do not have to deal with CIL, and you generate code in a familiar language. In addition, the generated source can always be saved or dumped to a log, then checked by eye or even pasted into the IDE and debugged.
What it looks like
To generate code with Reflection Emit, you need at least a minimal understanding of assembly language. CIL has no registers, offsets or tricky addressing modes. What does it have? There is an evaluation stack, and all operations work only with it. The evaluation stack is so named for a reason: it does not hold local variables or method arguments, which are separate concepts in CIL. Then there are the instructions. They come in two kinds: ordinary assembly-style ones (branches of various kinds, arithmetic, method calls, etc.) and CLR-specific ones (Box / Unbox, Newobj, Isinst, etc.). However, the division is purely formal.
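As a small illustration of the evaluation stack, here is a sketch that emits a method adding two integers. The names StackSketch and BuildAdd are my own; the comments track what is on the stack after each instruction.

```csharp
using System;
using System.Reflection.Emit;

public static class StackSketch
{
    // Generates the equivalent of: static int Add(int a, int b) { return a + b; }
    public static Func<int, int, int> BuildAdd()
    {
        var method = new DynamicMethod(
            "Add", typeof(int), new[] { typeof(int), typeof(int) });
        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0); // stack: a
        il.Emit(OpCodes.Ldarg_1); // stack: a, b
        il.Emit(OpCodes.Add);     // pops both values, pushes a + b
        il.Emit(OpCodes.Ret);     // returns the single value left on the stack
        return (Func<int, int, int>)method.CreateDelegate(typeof(Func<int, int, int>));
    }
}
```

The arguments never live on the stack by themselves: each ldarg copies one onto it, and add consumes whatever is on top.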
Enough words, let's start generating
It is better to see once than to hear a hundred times, and better still to step through it in a debugger. In other words, enough talk, let's look at an example.
Let the task be this: generate a converter from one entity to another. That is, there are classes that are essentially the same but have different property names. For example:
public class TestSrc
{
    public int SomeID { get; set; }
}

public class TestTarg
{
    public double SomeOtherID { get; set; }
}
We need to convert TestSrc to TestTarg. Let our converter look like this:
class Mapper<TIn, TOut>
{
    protected delegate TOut MapMethod(TIn src);

    public TOut Map(TIn source) {...}

    private MapMethod GenerateMapMethod(IDictionary<string, string> mapping) {...}
}
On the first call, the Map method generates the conversion method by calling GenerateMapMethod; on subsequent calls it reuses the already generated method. The mapping passed to GenerateMapMethod describes how properties correspond between the entities (Key is the property name in the TIn type, Value is the property name in the TOut type).
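The "generate on first call, reuse afterwards" behaviour can be sketched roughly like this. This is not the article's Mapper class: CachingMapper and its generator parameter are hypothetical, with generator standing in for GenerateMapMethod, and Lazy<T> is just one possible way to do the caching.

```csharp
using System;
using System.Threading;

public class CachingMapper<TIn, TOut>
{
    private readonly Lazy<Func<TIn, TOut>> mapMethod;

    // 'generator' is a stand-in for GenerateMapMethod (assumption);
    // it is invoked at most once, on the first Map call.
    public CachingMapper(Func<Func<TIn, TOut>> generator)
    {
        mapMethod = new Lazy<Func<TIn, TOut>>(
            generator, LazyThreadSafetyMode.ExecutionAndPublication);
    }

    public TOut Map(TIn source)
    {
        return mapMethod.Value(source);
    }
}
```

The point of the design is that code generation is expensive compared to a delegate call, so the cost is paid once per mapper instance.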
Dynamic assembly
First we need to make a choice: where will our generated code be placed? There are two options - in a dynamic assembly or in a dynamic method. Both are created on the fly.
A dynamic assembly is the full-fledged solution: it lets you generate real classes and structures with any set of methods. Another plus of a dynamic assembly is the ability to save it for later use or to analyze what you have generated. This is the option for complex cases.
So, we create an assembly with a class and a static method:
protected MapMethod GenerateMapMethod(IDictionary<string, string> mapping)
{
    // Assembly -> module -> type -> method: each level is defined through its builder.
    var dynGeneratorHostAssembly = AppDomain.CurrentDomain.DefineDynamicAssembly(
        new AssemblyName("Test.Gen, Version=1.0.0.1"),
        AssemblyBuilderAccess.RunAndSave);
    var dynModule = dynGeneratorHostAssembly.DefineDynamicModule(
        "Test.Gen.Mod", "generated.dll");
    var dynType = dynModule.DefineType(
        "Test.MapperOne",
        TypeAttributes.Abstract | TypeAttributes.Sealed | TypeAttributes.Public);
    var dynMethod = dynType.DefineMethod(
        "callme",
        MethodAttributes.Static,
        typeof(TOut),
        new Type[] { typeof(TIn) });
    var prm = dynMethod.DefineParameter(1, ParameterAttributes.None, "source");
    GenerateMapMethodBody(dynMethod.GetILGenerator(), prm, mapping);

    var finalType = dynType.CreateType();
    dynGeneratorHostAssembly.Save("generatedasm.dll");

    var realMethodInfo = finalType.GetMethod(dynMethod.Name);
    // Resolve the finished method by its metadata token instead of passing the MethodBuilder.
    var methodToken = dynMethod.GetToken().Token;
    var methodInfo = dynModule.ResolveMethod(methodToken);
    return (MapMethod)Delegate.CreateDelegate(
        typeof(MapMethod), (MethodInfo)methodInfo);
}
What does this code do? By and large, exactly what it says: it defines a dynamic assembly in the current domain (you could create a separate one) and states how we will use the assembly: run only, save only, or both (set by the AssemblyBuilderAccess enumeration). I do not know whether there is any significant overhead if you specify AssemblyBuilderAccess.RunAndSave and never actually save the assembly. In .NET 4 it became possible to unload dynamic assemblies (AssemblyBuilderAccess.RunAndCollect). For the assembly to be unloaded, nothing may reference the types of this assembly or instances of them; see the documentation on collectible assemblies for more details.
Next, we define the module in the assembly. We remember that assemblies consist of modules, most often one assembly - one module, but there may be multi-module assemblies. A module corresponds to a physical file, so when defining a module, we specify the file name for it.
In the module we define a type, which can be a class or a structure. A simple call to DefineType("Test.MapperOne") creates the non-public class MapperOne in the namespace Test. Even though you may never have to refer to the generated classes and methods by name, it is better to give them tidy names and namespaces: first, they show up in stack traces, and second, if you inspect the generated assembly in Reflector, its structure will be clearer and more pleasant to read. "Stop!", the attentive reader will say. We get a non-public class, and in another assembly at that, so can we access it at all? Well, in fact, we can. But if you want everything to be strictly correct, write this:
var dynType = dynModule.DefineType("Test.MapperOne", TypeAttributes.Abstract | TypeAttributes.Sealed | TypeAttributes.Public);
Finally, in the type we define our method by specifying the type of the value returned by it and the types of the input arguments of the method.
Next, we need to fill our generated method with meaning. We will discuss this process in detail later, so for now we skip the GenerateMapMethodBody(dynMethod.GetILGenerator(), prm, mapping) call and move on.
After all the methods in the type have been generated, we have to create the type by calling dynType.CreateType(). After that, no further dynamic changes to the type are possible, but the type is now ready for use. Before CreateType is called, the CLR knows nothing about our type and its methods. This is unlike the assembly, which appears in the domain immediately after the DefineDynamicAssembly call, and the module, which appears in the assembly immediately after the DefineDynamicModule call.
One interesting point: when we defined the dynamic type with DefineType, it returned a TypeBuilder. TypeBuilder inherits from Type, but not every Type member can be used when your Type is actually a TypeBuilder. If you think about it, this is logical, because the type does not really exist yet. Some Type properties are overridden so that they always throw NotSupportedException. Some methods throw an exception before CreateType is called and afterwards start forwarding calls to the corresponding RuntimeType. The situation is similar with MethodBuilder, which inherits from MethodInfo: not all properties and methods are implemented there either. Matters are further complicated by the fact that, for example, Delegate.CreateDelegate accepts a MethodInfo as its second argument, but if you try to pass a MethodBuilder there, you get an exception in reply (even after calling CreateType). So be careful.
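One way to stay out of trouble is to stop touching the builders once CreateType has returned and to ask the finished type for its MethodInfo. Below is a rough sketch of that idea. It differs from the listing above in a few deliberate ways, all of them my assumptions: the method is declared Public so that a plain GetMethod finds it, the assembly is run-only, and the static AssemblyBuilder.DefineDynamicAssembly is used in place of the AppDomain method so the sketch also runs on newer runtimes. The names Sketch.Gen, Sketch.Negator and Negate are invented.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public static class TypeBuilderSketch
{
    public static Func<int, int> BuildNegate()
    {
        AssemblyBuilder asm = AssemblyBuilder.DefineDynamicAssembly(
            new AssemblyName("Sketch.Gen"), AssemblyBuilderAccess.Run);
        ModuleBuilder module = asm.DefineDynamicModule("Sketch.Gen.Mod");
        TypeBuilder typeBuilder = module.DefineType(
            "Sketch.Negator",
            TypeAttributes.Public | TypeAttributes.Abstract | TypeAttributes.Sealed);
        MethodBuilder methodBuilder = typeBuilder.DefineMethod(
            "Negate",
            MethodAttributes.Public | MethodAttributes.Static,
            typeof(int),
            new[] { typeof(int) });

        ILGenerator il = methodBuilder.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0); // stack: value
        il.Emit(OpCodes.Neg);     // stack: -value
        il.Emit(OpCodes.Ret);

        // From here on, query the finished type, not the builders.
        Type finished = typeBuilder.CreateType();
        MethodInfo runtimeMethod = finished.GetMethod("Negate");
        return (Func<int, int>)Delegate.CreateDelegate(
            typeof(Func<int, int>), runtimeMethod);
    }
}
```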
Dynamic method
But suppose you do not need an assembly or your own types and just want to generate a small method. Then the "economy" option suits you better: a dynamic method. Instead of the heap of code from the previous section, we write the following:
protected MapMethod GenerateMapMethod2(IDictionary<string, string> mapping)
{
    var dynMethod = new DynamicMethod(
        "callme", typeof(TOut), new Type[] { typeof(TIn) });
    var prm = dynMethod.DefineParameter(1, ParameterAttributes.None, "source");
    GenerateMapMethodBody(dynMethod.GetILGenerator(), prm, mapping);
    return (MapMethod)dynMethod.CreateDelegate(typeof(MapMethod));
}
We created a method, filled it with meaning and returned a delegate for it. Done. Although the documentation states that such a method needs no dynamic assembly, module or type, if you look with reflection or Process Explorer you will see that a dynamic assembly is still created (one for all dynamic methods in the domain). It even has a manifest module, but I could not find our method in it (via reflection). Nevertheless, the method exists and works. A dynamic method and all the memory allocated for its generation can be freed once nothing references it any longer. Therefore this approach is even a little faster and more economical.
In this case we use anonymous hosting (an anonymously hosted method), but there is also the option to "attach" our method to an existing module or even a class. For this there are special constructors that accept the module or type to which the dynamic method is, in effect, added. In the case of a module, the method becomes global to the module and has access to all types in the module, including internal ones. In the case of a class, we additionally get access to the non-public members of that class. But even with "anonymous" hosting you can still access internal classes, and even non-public fields of those classes, from a dynamic method. To do so, use the constructor with the skipVisibility parameter and set it to true (this parameter means that JIT visibility checks are skipped, not to be confused with CAS checks). By the way, the ability to use "anonymous" hosting appeared only in .NET 3.0.
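Here is a sketch of the "attach to a class" variant: a dynamic method whose owner is a type, reading a private field of that type. Vault and its secret field are hypothetical and exist only for this illustration; the DynamicMethod constructor taking an owner Type is the real one.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical class with a private field, used only for illustration.
public class Vault
{
    private int secret;
    public Vault(int secret) { this.secret = secret; }
}

public static class OwnerSketch
{
    public static Func<Vault, int> BuildSecretReader()
    {
        FieldInfo field = typeof(Vault).GetField(
            "secret", BindingFlags.Instance | BindingFlags.NonPublic);

        // Passing typeof(Vault) as the owner associates the method with Vault,
        // which lets its IL touch Vault's private members.
        var method = new DynamicMethod(
            "ReadSecret", typeof(int), new[] { typeof(Vault) }, typeof(Vault));
        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);      // stack: vault reference
        il.Emit(OpCodes.Ldfld, field); // stack: value of vault.secret
        il.Emit(OpCodes.Ret);
        return (Func<Vault, int>)method.CreateDelegate(typeof(Func<Vault, int>));
    }
}
```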
Method body
And here we come to the most interesting part: how do we generate the code? In our example the code is generated by GenerateMapMethodBody(dynMethod.GetILGenerator(), prm, mapping). We pass it the ILGenerator, the parameter, and the mapping, that is, the correspondence between the properties of one class and the other. The ILGenerator class lets you insert CIL instructions into the body of the generated method, which it does with the Emit method. ILGenerator also lets you create labels for branches with DefineLabel (for building conditional constructs), declare local variables with DeclareLocal, and build exception-handling blocks. For the latter there is a whole set of methods such as BeginCatchBlock, BeginExceptFilterBlock, etc.
Most CIL instructions work with the evaluation stack (from here on, simply "the stack"). The CLR makes sure that you do not run off the stack in either direction. If the stack overflows, you get a StackOverflowException; if you try to take a value from an empty stack, or a value your method did not put there (that is, a method sees only its own part of the stack), you get an InvalidProgramException. Arguments passed to your method are not on the stack; to use them you need the OpCodes.Ldarg instruction. Thus, at the beginning of the method the stack is empty. When the method returns, the stack must hold nothing except the return value (if there is one), otherwise you again get an InvalidProgramException. And this is one of the drawbacks of generating CIL: errors that a high-level language would catch at compile time, such as type errors or uninitialized variables, show up here only at run time.
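To show labels and locals in action, here is a small sketch that emits a Max method with one branch and one local variable. The names LabelSketch and BuildMax are mine, and a real compiler would produce tighter IL; the local is there only to demonstrate DeclareLocal.

```csharp
using System;
using System.Reflection.Emit;

public static class LabelSketch
{
    // Generates the equivalent of:
    // static int Max(int a, int b) { int result; if (a >= b) result = a; else result = b; return result; }
    public static Func<int, int, int> BuildMax()
    {
        var method = new DynamicMethod(
            "Max", typeof(int), new[] { typeof(int), typeof(int) });
        ILGenerator il = method.GetILGenerator();
        LocalBuilder result = il.DeclareLocal(typeof(int));
        Label useFirst = il.DefineLabel();
        Label done = il.DefineLabel();

        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldarg_1);
        il.Emit(OpCodes.Bge, useFirst); // pops a and b; jumps if a >= b

        il.Emit(OpCodes.Ldarg_1);
        il.Emit(OpCodes.Stloc, result); // result = b
        il.Emit(OpCodes.Br, done);

        il.MarkLabel(useFirst);
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Stloc, result); // result = a

        il.MarkLabel(done);
        il.Emit(OpCodes.Ldloc, result); // the only value on the stack at ret
        il.Emit(OpCodes.Ret);
        return (Func<int, int, int>)method.CreateDelegate(typeof(Func<int, int, int>));
    }
}
```

Note that the stack is empty at both labels: every path that reaches them has already stored or consumed what it pushed.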
A convenient technique for generating IL is to write an example of what you want to generate in a high-level language, compile it (do not forget to switch to Release first, so that the example is taken from optimized code), and see what the required code pattern looks like in CIL. It is convenient to view such template code in Reflector. There is even a special plugin for it, ReflectionEmitLanguage. Instead of the code of the method or type you are viewing, this plugin shows the code that generates it. If Reflector is not at hand, you can view the template with IL Disassembler (ildasm.exe) from the .NET SDK. It shows the honest CIL your method consists of. Then we adapt the template to our needs, and everything is ready. In the same way you can find out which attributes to add to a method or its class, for example to make a sealed class or an internal virtual method.
Suppose we know from somewhere the correspondence of properties between classes, then the template will look like this:
public static TestTarg GenerateTemplate(TestSrc src)
{
    var result = new TestTarg();
    result.SomeOtherID = (double)src.SomeID;
    return result;
}
We compile the code, look at it in IL Disassembler and see:
.method private hidebysig static class ConsoleApplication1.TestTarg GenerateTemplate(class ConsoleApplication1.TestSrc src) cil managed {
Looking at this template, we transcribe each instruction as a call to ILGenerator.Emit().
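For the GenerateTemplate method, the transcription might look roughly like the sketch below. It is my own reconstruction and not ildasm or plugin output: the exact instructions a compiler emits may differ, and TemplateSketch and BuildConverter are invented names. The property names are hard-coded here instead of being read from the mapping.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

public class TestSrc { public int SomeID { get; set; } }
public class TestTarg { public double SomeOtherID { get; set; } }

public static class TemplateSketch
{
    public static Func<TestSrc, TestTarg> BuildConverter()
    {
        ConstructorInfo ctor = typeof(TestTarg).GetConstructor(Type.EmptyTypes);
        MethodInfo getter = typeof(TestSrc).GetProperty("SomeID").GetGetMethod();
        MethodInfo setter = typeof(TestTarg).GetProperty("SomeOtherID").GetSetMethod();

        var method = new DynamicMethod(
            "Convert", typeof(TestTarg), new[] { typeof(TestSrc) });
        ILGenerator il = method.GetILGenerator();
        LocalBuilder result = il.DeclareLocal(typeof(TestTarg));

        il.Emit(OpCodes.Newobj, ctor);     // new TestTarg()
        il.Emit(OpCodes.Stloc, result);    // result = ...
        il.Emit(OpCodes.Ldloc, result);    // push result (target of the setter)
        il.Emit(OpCodes.Ldarg_0);          // push src
        il.Emit(OpCodes.Callvirt, getter); // src.SomeID
        il.Emit(OpCodes.Conv_R8);          // (double)
        il.Emit(OpCodes.Callvirt, setter); // result.SomeOtherID = ...
        il.Emit(OpCodes.Ldloc, result);    // push result as the return value
        il.Emit(OpCodes.Ret);
        return (Func<TestSrc, TestTarg>)method.CreateDelegate(
            typeof(Func<TestSrc, TestTarg>));
    }
}
```

A mapping-driven generator would repeat the five instructions from the first Ldloc through the setter call once per dictionary entry, looking up the getter and setter by name each time.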
If you use Reflector with the Reflection.Emit plugin, everything becomes even simpler: it shows you exactly which ILGenerator.Emit() calls you need.
Here is what the plugin shows for our template:
public MethodBuilder BuildMethodGenerateTemplate(TypeBuilder type) {
Help for each instruction can be found on MSDN; the OpCodes class contains definitions for all of them. Some instructions are better not transcribed literally. For example, with instructions such as stloc.0, to avoid confusion it is better to write not this:
gen.DeclareLocal(yourType);
gen.Emit(OpCodes.Ldloc_0);
but this:
var locResult = gen.DeclareLocal(yourType);
gen.Emit(OpCodes.Ldloc, locResult);
Similarly, you can do with the parameters of methods.
Notice that some constructs that look the same in, say, C# will differ in CIL. For example:
var c = new RefType();
I would also like to draw attention to ref parameters. Since a ref parameter holds a reference rather than a value, you have to work with it differently at the CIL level. It is a kind of indirect addressing. If it is a ref parameter of a reference type, the parameter contains a reference to a reference, and a plain ldarg pushes onto the stack not the reference to the object but that reference to the reference (a great chance to get lost in broad daylight). To get the object reference onto the stack, you must additionally execute ldind.ref.
If it is a ref parameter of a primitive value type (not a structure), the parameter contains a reference to the value. Plain ldarg or starg is then not enough: after loading the reference with ldarg, use ldind to read the value or stind to write it.
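A sketch of the value-type case: a generated method that increments a ref int. The delegate RefIncrement and the class RefParamSketch are made up for this example. Pay attention to the order in which things go onto the stack: stind wants the address first and the value second.

```csharp
using System;
using System.Reflection.Emit;

// Hypothetical delegate type for a method taking a ref int.
public delegate void RefIncrement(ref int value);

public static class RefParamSketch
{
    // Generates the equivalent of: static void Increment(ref int value) { value = value + 1; }
    public static RefIncrement BuildIncrement()
    {
        var method = new DynamicMethod(
            "Increment", typeof(void), new[] { typeof(int).MakeByRefType() });
        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);  // stack: address (kept for the store)
        il.Emit(OpCodes.Ldarg_0);  // stack: address, address
        il.Emit(OpCodes.Ldind_I4); // stack: address, current value
        il.Emit(OpCodes.Ldc_I4_1); // stack: address, current value, 1
        il.Emit(OpCodes.Add);      // stack: address, current value + 1
        il.Emit(OpCodes.Stind_I4); // writes the sum through the address; stack is empty
        il.Emit(OpCodes.Ret);
        return (RefIncrement)method.CreateDelegate(typeof(RefIncrement));
    }
}
```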
A similar situation, only in reverse, arises with structures. If you have a parameter or a variable of a structure type, then to access its members you must first push the address of the structure onto the stack. For a parameter there is the ldarga instruction (and ldloca for a local variable).
Conclusion
Riddle: what do regular expressions and CIL generation have in common? Answer: both are extremely hard to reverse-engineer. So do not be lazy about loading the generating code with comments, so that it is clear what you are generating. Or, say, generating methods should have 80% more comments than usual. If you usually write none at all, it is time to start.
There are probably many more questions worth discussing on the subject of CIL generation, but it seems to me that the article has already run long. So good luck to all and see you soon.
UPD: Smart people in the comments suggested some tools that can also help with the difficult task of generation. I have not worked with them and cannot say anything about them.
PS
Every programmer must write a compiler, a code generator and a PHP shop. (Another version, off-topic but I liked it: every programmer must build a Linux kernel, grow a database to a terabyte and plant a floating bug.)