Here is the second article in the series on IL2CPP. This time we will talk about the C++ code generated by the il2cpp.exe utility, and also look at how managed types are represented in native code, the runtime checks used to support the .NET virtual machine, how loops are generated, and much more.

To do this, we will look at very version-specific code that will surely change in future versions of Unity. The basic principles, however, will remain the same.
Sample project
For this example, I will use the latest available version of Unity, 5.0.1p1. As in the previous article, create a new empty project and add one script with the following content:
using System; // needed for InvalidOperationException
using UnityEngine;

public class HelloWorld : MonoBehaviour {
  private class Important {
    public static int ClassIdentifier = 42;
    public int InstanceIdentifier;
  }

  void Start () {
    Debug.Log("Hello, IL2CPP!");

    Debug.LogFormat("Static field: {0}", Important.ClassIdentifier);

    var importantData = new [] {
      new Important { InstanceIdentifier = 0 },
      new Important { InstanceIdentifier = 1 } };

    Debug.LogFormat("First value: {0}", importantData[0].InstanceIdentifier);
    Debug.LogFormat("Second value: {0}", importantData[1].InstanceIdentifier);

    try {
      throw new InvalidOperationException("Don't panic");
    }
    catch (InvalidOperationException e) {
      Debug.Log(e.Message);
    }

    for (var i = 0; i < 3; ++i) {
      Debug.LogFormat("Loop iteration: {0}", i);
    }
  }
}
I will build this project for WebGL using the Unity editor on Windows. To get relatively readable names in the generated C++ code, I enabled the Development Player option in Build Settings. In addition, I set Enable Exceptions to Full in the WebGL Player Settings.
Overview of the generated code
After the build is completed, the generated C++ code can be found in the Temp\StagingArea\Data\il2cppOutput directory in the project folder. This directory is deleted as soon as the editor is closed, but while the editor stays open you can examine it at leisure.
The il2cpp.exe utility generated many files even for such a small project: 4625 header files and 89 C++ source files. To find my way around this much code, I prefer to use a text editor with support for Exuberant CTags. CTags usually generates a tag file quickly, which greatly simplifies navigation through the code.
You may notice that many of the generated C++ files contain not the simple code from our script, but converted code from standard libraries such as mscorlib.dll. As mentioned in the previous article, the IL2CPP scripting backend uses the same standard library code as Mono. Note that the code of mscorlib.dll and the other standard libraries is converted every time il2cpp.exe runs. This may seem unnecessary, since that code does not change.
The reason is that IL2CPP always strips bytecode to reduce the size of the executable. As a result, even small changes in the script code can change which parts of the standard library code are used and which are not. Therefore, mscorlib.dll must be converted with each build. We are trying to improve the incremental build process, but so far without much success.
Mapping managed code in generated C++ code
For each type in the managed code, il2cpp.exe generates two header files: one that defines the type and one that declares the methods of that type. For example, let's look at the converted UnityEngine.Vector3 type. The header file for this type is called UnityEngine_UnityEngine_Vector3.h. The name is built from the assembly name (UnityEngine.dll), the namespace, and the type name. The code looks like this:
The il2cpp.exe utility converts each of the three instance fields and mangles the names slightly, using leading underscores to avoid possible conflicts with reserved words. Names with leading underscores are themselves reserved in C++, but so far we have not seen them conflict with C++ standard library code.
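As a rough illustration of the layout described above, here is a minimal sketch of what such a type definition could look like. The field name `___x_1` matches the one that appears in generated code later in this article; the names `___y_2` and `___z_3` and the numeric suffix in `Vector3_t0` are assumptions made for this sketch, since the actual numbers vary from build to build.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch of a generated value type; not actual il2cpp.exe output.
// UnityEngine.Vector3
struct Vector3_t0
{
    // System.Single UnityEngine.Vector3::x
    float ___x_1;
    // System.Single UnityEngine.Vector3::y (field name assumed)
    float ___y_2;
    // System.Single UnityEngine.Vector3::z (field name assumed)
    float ___z_3;
};

// The struct is plain data: three floats laid out in declaration order.
static_assert(sizeof(Vector3_t0) == 3 * sizeof(float), "three floats, no padding expected");
static_assert(offsetof(Vector3_t0, ___x_1) == 0, "x is the first field");
```

The point of the sketch is that a managed value type can map to an ordinary C++ struct with no hidden members, so its instance fields are read with plain member access.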
The UnityEngine_UnityEngine_Vector3MethodDeclarations.h file contains declarations for all methods in Vector3. For example, Vector3 overrides the Object.ToString method:
Notice the comment, which names the managed method that the declaration was generated from. Searching the output files for a managed method name in this format can be useful, especially for methods with common names such as ToString.
The methods converted by il2cpp.exe have several interesting features:
• They are not member functions in C++, but free functions with the this pointer as the first argument. For static functions in managed code, IL2CPP always passes NULL as the first argument. Declaring every method with the this pointer as the first argument simplifies code generation in il2cpp.exe and makes it easier for generated code to invoke methods through other mechanisms (for example, delegates).
• Each method has an additional argument of type MethodInfo* that carries metadata about the method, which can be used, for example, to call a virtual method. Mono uses platform-specific trampolines to pass this metadata. For IL2CPP, we decided not to use them in order to improve portability.
• All methods are declared extern "C", so that il2cpp.exe can lie to the C++ compiler if necessary and treat all methods as if they were of the same type.
• Type names carry the "_t" suffix and method names the "_m" suffix. Name conflicts are resolved by appending a unique number to each name. These numbers change whenever the user script code changes, so you should not rely on them from one build to the next.
The first two points imply that each method has at least two parameters: the this pointer and the MethodInfo pointer. Do these parameters add overhead? Yes, but they do not hurt performance as much as it may seem at first glance. At least, that is what the profiling results say.
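The calling convention described in the points above can be sketched in a few lines. Everything in this example is hypothetical: the `Counter_t1` type, the `_m1` and `_m2` method names, and the `MethodInfo` stand-in are invented for illustration, and the real `MethodInfo` is defined by libil2cpp.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for runtime types; the real definitions live in libil2cpp.
struct MethodInfo { const char* name; };
struct Counter_t1 { int ___value_0; };

// Instance method: a free function taking the object as `__this`
// plus a trailing MethodInfo pointer.
// System.Int32 Counter::Increment()
extern "C" int Counter_Increment_m1(Counter_t1* __this, const MethodInfo* method)
{
    (void)method; // metadata is not needed by this simple body
    __this->___value_0 += 1;
    return __this->___value_0;
}

// Static method: the first argument is still present, but callers pass NULL.
// System.Int32 Counter::Twice(System.Int32)
extern "C" int Counter_Twice_m2(void* __this, int x, const MethodInfo* method)
{
    (void)__this;
    (void)method;
    return x * 2;
}

static const MethodInfo Counter_Increment_m1_MethodInfo = { "Increment" };
static const MethodInfo Counter_Twice_m2_MethodInfo = { "Twice" };

// Call sites pass both extra arguments explicitly.
int RunExample()
{
    Counter_t1 c = { 41 };
    int a = Counter_Increment_m1(&c, &Counter_Increment_m1_MethodInfo);
    int b = Counter_Twice_m2(NULL, a, &Counter_Twice_m2_MethodInfo);
    return a + b;
}
```

Because instance and static methods share the same shape (object first, metadata last), a caller such as a delegate does not need a separate code path for each kind.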
Let us jump to the definition of the ToString method using Ctags. It is in the Bulk_UnityEngine_0.cpp file. The code in this method definition does not look much like the C# code in the Vector3::ToString() method. However, if you view Vector3::ToString() with a tool like ILSpy, you will see that the generated C++ code closely resembles the IL code.
Why does il2cpp.exe not generate a separate C++ file for the method definitions of each type, as it does for the method declarations? The Bulk_UnityEngine_0.cpp file is quite large: 20,481 lines! The C++ compilers we use had trouble with a large number of source files. Compiling four thousand .cpp files took longer than compiling the same source code in 80 .cpp files. Therefore, il2cpp.exe splits the method definitions for types into groups and generates one C++ file for each group.
Now go back to the method declarations header file and note the line at the top of the file:
#include "codegen/il2cpp-codegen.h"
The il2cpp-codegen.h file contains the interface through which the generated code accesses the libil2cpp runtime. Later we will discuss several ways in which the generated code uses this runtime.
Method prologue
Let's look at the definition of the Vector3::ToString() method, namely the common prologue that il2cpp.exe emits for all methods.
StackTraceSentry _stackTraceSentry(&Vector3_ToString_m2315_MethodInfo);
static bool Vector3_ToString_m2315_init;
if (!Vector3_ToString_m2315_init)
{
  ObjectU5BU5D_t4_il2cpp_TypeInfo_var = il2cpp_codegen_class_from_type(&ObjectU5BU5D_t4_0_0_0);
  Vector3_ToString_m2315_init = true;
}
The first line of the prologue creates a local variable of type StackTraceSentry. It is used to track the managed call stack, for example for Environment.StackTrace. Generating this code is actually optional; here it is emitted because the --enable-stacktrace argument was passed to il2cpp.exe (since I set Enable Exceptions to Full in the WebGL Player Settings). We found that for small functions this variable adds overhead and hurts performance. Therefore, we never emit this code for iOS and other platforms where stack trace information is available without it. WebGL has no platform-specific stack walking support, so this code is needed for managed exceptions to work correctly.
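One common way to track a call stack with a local variable is a scope guard that pushes on construction and pops on destruction. The sketch below shows that general technique only; the class name `StackTraceSentrySketch`, the per-thread vector, and the `MethodInfo` stand-in are assumptions, and the real StackTraceSentry in libil2cpp may work differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in; the real MethodInfo is defined by libil2cpp.
struct MethodInfo { const char* name; };

// One managed call stack per thread (an assumption made for this sketch).
inline std::vector<const MethodInfo*>& ManagedStack()
{
    thread_local std::vector<const MethodInfo*> stack;
    return stack;
}

// Scope guard: pushes the method on construction and pops it on destruction,
// so the stack stays correct even when an exception unwinds the frame.
class StackTraceSentrySketch
{
public:
    explicit StackTraceSentrySketch(const MethodInfo* method) { ManagedStack().push_back(method); }
    ~StackTraceSentrySketch() { ManagedStack().pop_back(); }
    StackTraceSentrySketch(const StackTraceSentrySketch&) = delete;
    StackTraceSentrySketch& operator=(const StackTraceSentrySketch&) = delete;
};

static const MethodInfo Inner_MethodInfo = { "Inner" };
static const MethodInfo Outer_MethodInfo = { "Outer" };

// Returns the depth of the managed stack as seen from inside Inner.
std::size_t Inner()
{
    StackTraceSentrySketch sentry(&Inner_MethodInfo);
    return ManagedStack().size();
}

std::size_t Outer()
{
    StackTraceSentrySketch sentry(&Outer_MethodInfo);
    return Inner();
}
```

The push and pop on every call are the kind of per-call cost that becomes noticeable in small functions, which fits the overhead described above.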
The second part of the prologue performs lazy initialization of the type metadata for any array or generic types used in the method body. Here, ObjectU5BU5D_t4 is the name of the type System.Object[]. This part of the prologue does real work only once and does nothing if the type has already been initialized, so we have not observed any negative effect on performance.
What about thread safety? What if two threads call Vector3::ToString() at the same time? That is fine: all the code in the libil2cpp runtime used to initialize a type is safe to call from multiple threads. The il2cpp_codegen_class_from_type function may well be called several times, but the actual initialization happens only once, on one thread. Method execution does not continue until initialization completes. Therefore, this method prologue is thread safe.
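The idea that the runtime call may happen several times while the real work happens once can be sketched as follows. All names ending in `Sketch` are hypothetical, and the mutex and atomics are assumptions chosen to keep the example well defined in standard C++; the generated prologue shown above uses a plain static bool and relies on libil2cpp for synchronization.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>

// Hypothetical stand-ins for runtime metadata; not the libil2cpp definitions.
struct Il2CppTypeSketch { const char* name; };
struct TypeInfoSketch { const char* name; bool initialized; };

static Il2CppTypeSketch ObjectArray_Type = { "System.Object[]" };
static TypeInfoSketch ObjectArray_TypeInfo = { "System.Object[]", false };
static std::mutex g_metadataMutex;
static int g_realInitCount = 0; // how many times initialization work actually ran

// Safe to call from several threads: the lock serializes callers and
// the work is skipped once the type has been initialized.
TypeInfoSketch* ClassFromTypeSketch(const Il2CppTypeSketch* type)
{
    (void)type; // only one type exists in this sketch
    std::lock_guard<std::mutex> lock(g_metadataMutex);
    if (!ObjectArray_TypeInfo.initialized)
    {
        ObjectArray_TypeInfo.initialized = true;
        ++g_realInitCount;
    }
    return &ObjectArray_TypeInfo;
}

static std::atomic<TypeInfoSketch*> ObjectArray_TypeInfo_var(nullptr);

// A method body that starts with the lazy-initialization prologue.
const char* MethodWithPrologue()
{
    static std::atomic<bool> s_init(false);
    if (!s_init.load(std::memory_order_acquire))
    {
        // Two threads may both get here; the runtime call tolerates that.
        ObjectArray_TypeInfo_var.store(ClassFromTypeSketch(&ObjectArray_Type));
        s_init.store(true, std::memory_order_release);
    }
    return ObjectArray_TypeInfo_var.load()->name;
}
```

After the first call, the prologue costs one flag check per invocation, which is why its effect on performance is hard to measure.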
Runtime Checks
The next part of the method creates an array of objects, stores the value of the Vector3 x field in a local variable, then boxes this variable and stores it in the array at index zero. The generated C++ code (with comments) looks like this:
il2cpp.exe adds three checks that are not present in the IL code:
• If the array is NULL, NullCheck throws a NullReferenceException.
• If the array index is out of range, IL2CPP_ARRAY_BOUNDS_CHECK throws an IndexOutOfRangeException.
• If the type of the element being stored in the array is incorrect, ArrayElementTypeCheck throws an ArrayTypeMismatchException.
These runtime checks provide the guarantees that the .NET virtual machine is expected to give. Instead of injecting code, Mono relies on mechanisms of the target platform to handle the same checks. With IL2CPP, we wanted to cover as many platforms as possible, including ones like WebGL that have no such mechanism of their own. Therefore, the il2cpp.exe utility injects these checks itself.
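To make the three checks concrete, here is a minimal sketch of how explicit checks of this kind could be written. The `*Sketch` helpers, the exception classes, and the simplified type metadata are all hypothetical; the real NullCheck, IL2CPP_ARRAY_BOUNDS_CHECK, and ArrayElementTypeCheck are provided by the IL2CPP code generation headers and raise managed exceptions.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Hypothetical C++ stand-ins for the managed exceptions named above.
struct NullReferenceError : std::runtime_error
{
    NullReferenceError() : std::runtime_error("NullReferenceException") {}
};
struct IndexOutOfRangeError : std::runtime_error
{
    IndexOutOfRangeError() : std::runtime_error("IndexOutOfRangeException") {}
};
struct ArrayTypeMismatchError : std::runtime_error
{
    ArrayTypeMismatchError() : std::runtime_error("ArrayTypeMismatchException") {}
};

// Simplified metadata: each type only knows its parent type.
struct TypeInfoSketch { const TypeInfoSketch* parent; };
struct ObjectSketch { const TypeInfoSketch* klass; };
struct ArraySketch
{
    const TypeInfoSketch* elementType;
    std::size_t length;
    ObjectSketch** items;
};

inline void NullCheckSketch(const void* p)
{
    if (p == nullptr) throw NullReferenceError();
}

inline void ArrayBoundsCheckSketch(const ArraySketch* a, std::size_t index)
{
    if (index >= a->length) throw IndexOutOfRangeError();
}

inline void ArrayElementTypeCheckSketch(const ArraySketch* a, const ObjectSketch* value)
{
    if (value == nullptr) return; // null can be stored in any reference array
    for (const TypeInfoSketch* t = value->klass; t != nullptr; t = t->parent)
    {
        if (t == a->elementType) return; // value is the element type or derives from it
    }
    throw ArrayTypeMismatchError();
}

// A store that performs all three checks before writing the element.
void StoreElementSketch(ArraySketch* a, std::size_t index, ObjectSketch* value)
{
    NullCheckSketch(a);
    ArrayBoundsCheckSketch(a, index);
    ArrayElementTypeCheckSketch(a, value);
    a->items[index] = value;
}
```

Each check is an ordinary branch in portable C++, which is what allows the same generated code to behave identically on platforms that cannot trap invalid memory accesses.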
Do these checks cause performance problems? In most cases we have not seen any. Moreover, the checks provide the benefits and safety expected of the .NET virtual machine. In a few specific cases we did measure a drop in performance, especially in tight loops. We are now looking for a way to let managed code opt out of these runtime checks when il2cpp.exe generates C++ code. Stay tuned for updates.
Static fields
Now that we have seen what instance fields look like (using Vector3 as the example), let's see how static fields are converted and how they are accessed. First we find the definition of the HelloWorld_Start_m3 method, which is in the Bulk_Assembly-CSharp_0.cpp file in my build, and then go to the type Important_t1 (in the AssemblyU2DCSharp_HelloWorld_Important.h file):
struct Important_t1 : public Object_t {
Note that il2cpp.exe created a separate C++ struct to hold the static field, which is shared by all instances of this type. At run time, a single instance of type Important_t1_StaticFields is created, and all instances of type Important_t1 share it as their static data. In the generated code, a static field is accessed as follows:
int32_t L_1 = (((Important_t1_StaticFields*)InitializedTypeInfo(&Important_t1_il2cpp_TypeInfo)->static_fields)->___ClassIdentifier_0);
The type metadata for Important_t1 holds a pointer to the single instance of type Important_t1_StaticFields, and the generated code goes through that pointer to read the value of the static field.
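The split between per-instance and per-type data can be sketched as follows. `Important_t1`, `Important_t1_StaticFields`, `___ClassIdentifier_0`, and `static_fields` are names taken from the code above; the instance field name `___InstanceIdentifier_1`, the `TypeInfoSketch` layout, the static constructor hook, and `InitializedTypeInfoSketch` are assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins; Object_t and the type metadata are defined by libil2cpp.
struct Object_t { void* klass; };

struct TypeInfoSketch
{
    void* static_fields;          // points at the single *_StaticFields instance
    void (*static_constructor)(); // runs the managed static initializers
    bool initialized;
};

// Instance data: one copy per object.
struct Important_t1 : public Object_t
{
    int32_t ___InstanceIdentifier_1; // field name assumed for this sketch
};

// Static data: one copy per type.
struct Important_t1_StaticFields
{
    int32_t ___ClassIdentifier_0;
};

static Important_t1_StaticFields s_importantStatics = { 0 };

// Equivalent of `public static int ClassIdentifier = 42;`
static void Important_StaticConstructor()
{
    s_importantStatics.___ClassIdentifier_0 = 42;
}

static TypeInfoSketch Important_t1_il2cpp_TypeInfo =
    { &s_importantStatics, &Important_StaticConstructor, false };

// Assumed behavior: make sure static initialization has run, then return the metadata.
TypeInfoSketch* InitializedTypeInfoSketch(TypeInfoSketch* info)
{
    if (!info->initialized)
    {
        info->static_constructor();
        info->initialized = true;
    }
    return info;
}

// Same access pattern as the generated line shown above.
int32_t ReadClassIdentifier()
{
    return (((Important_t1_StaticFields*)InitializedTypeInfoSketch(&Important_t1_il2cpp_TypeInfo)->static_fields)->___ClassIdentifier_0);
}
```

Routing every static access through the type metadata gives the runtime a single place to make sure static initialization has happened before the first read.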
Exceptions
il2cpp.exe converts managed exceptions to C++ exceptions. We chose this approach, once again, to avoid depending on specific platforms. When il2cpp.exe needs to generate code that raises a managed exception, it emits a call to the il2cpp_codegen_raise_exception function. The code that throws and catches managed exceptions in our HelloWorld_Start_m3 method looks like this:
try {
All managed exceptions are wrapped in the C++ type Il2CppExceptionWrapper. When the generated code catches an exception of this type, it unwraps the C++ representation of the managed exception (which is of type Exception_t8). In this case we are only looking for InvalidOperationException, so if the exception is not of that type, a copy of the C++ exception is thrown again. If it is of that type, the code runs the catch handler and prints the exception message.
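The wrap, catch, and type-test sequence described above can be sketched in standard C++. The names `ExceptionWrapperSketch`, `RaiseExceptionSketch`, and `TypeInfoSketch` are hypothetical simplifications and not the libil2cpp definitions; the sketch also uses a plain `throw;` to pass on exceptions it does not handle, which is a simplification of the copy-and-rethrow behavior described above.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins for managed types and metadata.
struct TypeInfoSketch { const char* name; };
struct Exception_t { const TypeInfoSketch* klass; std::string message; };

static const TypeInfoSketch InvalidOperationException_TypeInfo = { "InvalidOperationException" };
static const TypeInfoSketch ArgumentException_TypeInfo = { "ArgumentException" };

// The C++ exception type that carries a managed exception object.
struct ExceptionWrapperSketch { Exception_t* ex; };

// Raising a managed exception becomes a C++ throw of the wrapper.
[[noreturn]] void RaiseExceptionSketch(Exception_t* ex)
{
    throw ExceptionWrapperSketch{ ex };
}

// Mirrors: try { throw ...; } catch (InvalidOperationException e) { ... }
// Returns the message if the handler ran; other exception types propagate.
std::string ThrowAndCatch(Exception_t* toThrow)
{
    try
    {
        RaiseExceptionSketch(toThrow);
    }
    catch (ExceptionWrapperSketch& e)
    {
        if (e.ex->klass == &InvalidOperationException_TypeInfo)
        {
            return e.ex->message; // the matching catch handler
        }
        throw; // not the type we handle: pass it on to the caller
    }
}
```

A single C++ wrapper type means the C++ catch clause cannot select on the managed type, so the managed type test has to be done by hand inside the handler.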
Goto?!
An interesting question arises: what are labels and goto statements doing here? Structured programming should not need these constructs. The reason is that the IL language has no structured programming concepts such as loops and conditional statements. It is a low-level language, so il2cpp.exe follows low-level concepts in the generated code.
As an example, consider the for loop in the HelloWorld_Start_m3 method:
IL_00a8:
{
  V_2 = 0;
  goto IL_00cc;
}

IL_00af:
{
  ObjectU5BU5D_t4* L_19 = ((ObjectU5BU5D_t4*)SZArrayNew(ObjectU5BU5D_t4_il2cpp_TypeInfo_var, 1));
  int32_t L_20 = V_2;
  Object_t * L_21 = Box(InitializedTypeInfo(&Int32_t5_il2cpp_TypeInfo), &L_20);
  NullCheck(L_19);
  IL2CPP_ARRAY_BOUNDS_CHECK(L_19, 0);
  ArrayElementTypeCheck (L_19, L_21);
  *((Object_t **)(Object_t **)SZArrayLdElema(L_19, 0)) = (Object_t *)L_21;
  Debug_LogFormat_m7(NULL /*static, unused*/, (String_t*) &_stringLiteral6, L_19, &Debug_LogFormat_m7_MethodInfo);
  V_2 = ((int32_t)(V_2+1));
}

IL_00cc:
{
  if ((((int32_t)V_2) < ((int32_t)3)))
  {
    goto IL_00af;
  }
}
The variable V_2 is the loop index. At the beginning it has a value of 0, then it increases at the bottom of the loop in this line:
V_2 = ((int32_t)(V_2+1));
The loop condition is checked here:
if ((((int32_t)V_2) < ((int32_t)3)))
While V_2 is less than three, the goto statement jumps to the IL_00af label, which is the top of the loop body. As you might have guessed, il2cpp.exe currently generates C++ code directly from IL, without an intermediate abstract syntax tree representation. You may also have noticed that the code in the "Runtime Checks" section contains fragments like this:
float L_1 = (__this->___x_1);
float L_2 = L_1;
Obviously, the variable L_2 is redundant here. Although most C++ compilers optimize it away, we would prefer not to emit it at all. We are now considering using an abstract syntax tree to better understand the IL code and generate better C++ code where local variables and loops are involved.
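To see that the label-and-goto shape discussed above is just a for loop in disguise, the two forms can be put side by side. This is a sketch: the label names `IL_cond` and `IL_body` and both function names are invented for illustration, and the loop body only records the index instead of calling Debug.LogFormat.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Structured form, as written in the C# source.
std::vector<int32_t> LoopStructured()
{
    std::vector<int32_t> seen;
    for (int32_t i = 0; i < 3; ++i)
    {
        seen.push_back(i);
    }
    return seen;
}

// Label-and-goto form, following the shape of the IL.
std::vector<int32_t> LoopWithGoto()
{
    std::vector<int32_t> seen;
    int32_t V_2 = 0;

    // Initializer: set the index, then jump straight to the condition.
    {
        V_2 = 0;
        goto IL_cond;
    }

IL_body:
    {
        int32_t L_20 = V_2; // redundant copy, in the style of the generated code
        seen.push_back(L_20);
        V_2 = ((int32_t)(V_2 + 1));
    }

IL_cond:
    {
        if ((((int32_t)V_2) < ((int32_t)3)))
        {
            goto IL_body;
        }
    }

    return seen;
}
```

Both functions visit the indices 0, 1, and 2 in order. The goto version checks the condition at the bottom of the loop, exactly where the IL places it.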
Conclusion
We have touched on only a small part of the C++ code that IL2CPP generates for a very simple project. I now recommend that you take a look at the generated code of your own project. Keep in mind that the C++ code will look different in future versions of Unity as we continue to improve the quality and performance of the IL2CPP technology.
By converting IL code to C++, we managed to strike a good balance between portability and performance. Developers get many of the conveniences of managed code while keeping the advantages of the native code that the C++ compiler produces for each platform.
In future posts, we will look at more of the generated code, including method calls, sharing of method implementations, and wrappers for calls to native libraries. Next time we will debug the generated code for a 64-bit iOS build using Xcode.