IL2CPP: generic sharing implementation

In the previous article in the IL2CPP series, we looked at method calls in the generated C++ code. Now we will talk about one of the most important features of the IL2CPP code: generic sharing, which lets several instantiations of a generic method use one implementation and significantly reduces the size of the IL2CPP executable. Generic sharing is also used in the Mono and .NET runtimes. IL2CPP did not support it initially; it was added later.

We will look at how generic sharing works for reference types and value types, and at how generic parameter constraints affect it. Keep in mind that the generated code shown in this article may change in future versions of Unity. As a rule, we discuss such changes soon after they are released.

What is generic sharing?

Imagine that you need to write an implementation of the List<T> class in C#. Will this implementation depend on the type T? Can the same implementation of the Add method be used for both List<string> and List<object>? What about List<DateTime>?

The whole point of generics is that their C# implementations can be shared, so the generic List<T> class works for any type T. But what if List<T> has to be converted from C# into something executable, such as assembly code (as Mono does) or C++ code (as IL2CPP does)? Can we still use the same implementation of the Add method?

In most cases, yes. As we will see later in this article, whether an implementation can be shared depends almost entirely on the size of the type T. If T is a reference type (string or object), its size is always the size of a pointer. If T is a value type (int or DateTime), its size can vary, and this complicates things a little. In the end, the more methods share an implementation, the smaller the executable code.
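The size argument above can be checked with a small C++ sketch. The struct names and layouts here are hypothetical stand-ins chosen for illustration and are not the types IL2CPP actually generates.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for managed types (not IL2CPP's real layouts).
struct Object_t { void* klass; };                  // base of every reference type
struct String_t : Object_t { int32_t length; };
struct AnyClass_t : Object_t { };

struct Int32Pair { int32_t a; int32_t b; };            // a small value type
struct DateTimeLike { int64_t ticks; int32_t kind; };  // a larger value type

// Size of one element slot for a given storage type.
template <typename Slot>
constexpr std::size_t slot_size() { return sizeof(Slot); }

// Reference types are stored as pointers, so every slot has the same size.
static_assert(slot_size<String_t*>() == slot_size<Object_t*>(), "same slot size");
static_assert(slot_size<AnyClass_t*>() == slot_size<Object_t*>(), "same slot size");

// Value types are stored inline, so the slot size follows the fields.
static_assert(slot_size<Int32Pair>() != slot_size<DateTimeLike>(), "sizes differ");

// Small runtime check of the same facts.
inline void example_usage() {
    assert(slot_size<String_t*>() == sizeof(void*));
    assert(slot_size<DateTimeLike>() > slot_size<Int32Pair>());
}
```

Code that copies or stores a T only needs to know the slot size, which is why one body can serve every reference type but not every value type.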

Mark Probst, the developer who implemented generic sharing in Mono, has written several interesting articles about it. We will not go deep into the concept itself, but rather talk about how and under what conditions it is used in IL2CPP. I hope this gives you a more complete picture of what determines the size of your project's executable.

What IL2CPP shares

IL2CPP shares method implementations for a generic type SomeGenericType<T> when T is a reference type (string, object, or any user-defined class), an integer type, or an enum. For other value types, implementations are not shared, since the size of a value type depends on the size of its fields.

This means that adding a new use of SomeGenericType<T>, where T is a reference type, has little effect on the size of the executable. If T is a value type, on the other hand, the impact is more noticeable. Mono and IL2CPP behave the same way here. Now let's move on to the implementation details.

Setup

I will use Unity 5.0.2p1 on Windows and build the project for WebGL, with the Development Player option turned on and Enable Exceptions set to None. To begin, we write a driver method that creates instances of the generic types we will examine:

    public void DemonstrateGenericSharing()
    {
        var usesAString = new GenericType<string>();
        var usesAClass = new GenericType<AnyClass>();
        var usesAValueType = new GenericType<DateTime>();
        var interfaceConstrainedType = new InterfaceConstrainedGenericType<ExperimentWithInterface>();
    }

Then we define the types used in this method:

    class GenericType<T>
    {
        public T UsesGenericParameter(T value) { return value; }
        public void DoesNotUseGenericParameter() {}
        public U UsesDifferentGenericParameter<U>(U value) { return value; }
    }

    class AnyClass {}

    interface AnswerFinderInterface
    {
        int ComputeAnswer();
    }

    class ExperimentWithInterface : AnswerFinderInterface
    {
        public int ComputeAnswer() { return 42; }
    }

    class InterfaceConstrainedGenericType<T> where T : AnswerFinderInterface
    {
        public int FindTheAnswer(T experiment) { return experiment.ComputeAnswer(); }
    }

All of this code is nested in a class named HelloWorld, derived from MonoBehaviour. You may also notice that the il2cpp.exe command line no longer contains the -enable-generic-sharing option that appeared in the first article in this series. Generic sharing still happens; it is now enabled automatically.

Generic sharing for reference types

To begin, consider the most common case: reference types. In managed code, these types derive from System.Object, and in generated code, from Object_t. So Object_t * can serve as a placeholder for any of them in the C++ code.

Let's find the generated version of the DemonstrateGenericSharing method. In my project, it is called HelloWorld_DemonstrateGenericSharing_m4. We are looking for the definitions of the four methods in the GenericType class. With Ctags, we can jump to the declaration of GenericType_1__ctor_m8, the constructor of GenericType<string>. Notice that this declaration is a #define statement that maps the method to another method, GenericType_1__ctor_m10447_gshared.

Now let's find the method declarations for the GenericType<AnyClass> type. Interestingly, the declaration of its constructor, GenericType_1__ctor_m9, is also a #define statement mapped to the same function, GenericType_1__ctor_m10447_gshared!

A comment on the definition of GenericType_1__ctor_m10447_gshared indicates that it corresponds to the managed method HelloWorld/GenericType`1<System.Object>::.ctor(). This is the constructor of GenericType<object>, which is called fully shared: for GenericType<T> with any reference type T, the implementation of every method uses the version where T is object.

Just below the constructor in the generated code, you can see the UsesGenericParameter method:

    extern "C" Object_t * GenericType_1_UsesGenericParameter_m10449_gshared (GenericType_1_t2159 * __this, Object_t * ___value, MethodInfo* method)
    {
        {
            Object_t * L_0 = ___value;
            return L_0;
        }
    }

In both places where the generic parameter T appears (the return type and the type of the single argument), the generated code uses Object_t *. Since every reference type in the generated code can be represented by Object_t *, this method implementation can be called for any T that is a reference type.

In the second article in this series, I mentioned that all method definitions in the generated C++ are free functions. The il2cpp.exe utility does not use C++ inheritance to implement overridden C# methods, but it does use inheritance for types. Searching for “AnyClass_t” shows how the C# type AnyClass looks in C++:

 struct AnyClass_t1 : public Object_t { };

Since AnyClass_t1 derives from Object_t, we can simply pass a pointer to it as the argument of the GenericType_1_UsesGenericParameter_m10449_gshared function.

But what about the return value? We cannot return a pointer to the base class where a pointer to the derived class is expected, can we? Take a look at the declaration of the GenericType<AnyClass>::UsesGenericParameter method:

 #define GenericType_1_UsesGenericParameter_m10452(__this, ___value, method) (( AnyClass_t1 * (*) (GenericType_1_t6 *, AnyClass_t1 *, MethodInfo*))GenericType_1_UsesGenericParameter_m10449_gshared)(__this, ___value, method)

In the generated code, the function itself is cast to a signature that returns the derived type, so the return value (declared as Object_t *) is treated as AnyClass_t1 *. In other words, IL2CPP deceives the C++ compiler to get around the C++ type system.
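The idea of one shared body behind a typed entry point can be sketched as follows. This is a simplified illustration with hypothetical names, not the code il2cpp.exe emits. Note one deliberate difference: the generated #define casts the function pointer itself, which standard C++ does not guarantee to work, so this sketch casts the returned pointer instead.

```cpp
#include <cassert>

// Hypothetical stand-ins for the generated types.
struct Object_t { };
struct AnyClass_t : Object_t { int tag = 7; };
struct GenericType_t : Object_t { };

// One shared body for every reference-type T: T is erased to Object_t*.
inline Object_t* UsesGenericParameter_gshared(GenericType_t* self, Object_t* value) {
    (void)self;  // unused, kept to mirror the instance-method shape
    return value;
}

// Typed entry point for T = AnyClass. It adds no logic of its own;
// it only restores the static type that the shared body erased.
inline AnyClass_t* UsesGenericParameter_AnyClass(GenericType_t* self, AnyClass_t* value) {
    return static_cast<AnyClass_t*>(UsesGenericParameter_gshared(self, value));
}

// Small usage example: the same object comes back with its derived type.
inline void example_usage() {
    GenericType_t generic;
    AnyClass_t instance;
    AnyClass_t* result = UsesGenericParameter_AnyClass(&generic, &instance);
    assert(result == &instance);
    assert(result->tag == 7);
}
```

The cast is safe only because the C# compiler has already verified that the value really is an AnyClass; the C++ side simply trusts that guarantee.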

Generic sharing with constraints

Suppose we need to call some methods on an object of type T. Won't the use of Object_t * prevent this? It will, but first we need to convey this intent to the C# compiler with the help of generic constraints.

Take another look at the script code, namely InterfaceConstrainedGenericType. This generic type uses a where clause to require that T implements the AnswerFinderInterface interface, which allows a call to the ComputeAnswer method. In the previous article, we said that calling interface methods requires a lookup in the vtable. Since FindTheAnswer calls ComputeAnswer through the interface on an instance of the constrained type T (represented by Object_t *), the C++ code can still use a fully shared implementation.

If we jump from the implementation of the HelloWorld_DemonstrateGenericSharing_m4 function to the definition of InterfaceConstrainedGenericType_1__ctor_m11, we see that this method is a #define statement mapped to the InterfaceConstrainedGenericType_1__ctor_m10456_gshared function. Below is the implementation of the InterfaceConstrainedGenericType_1_FindTheAnswer_m10458_gshared function, which takes an Object_t * argument and is also fully shared. It calls InterfaceFuncInvoker0::Invoke to make the call to the managed ComputeAnswer method.

    extern "C" int32_t InterfaceConstrainedGenericType_1_FindTheAnswer_m10458_gshared (InterfaceConstrainedGenericType_1_t2160 * __this, Object_t * ___experiment, MethodInfo* method)
    {
        static bool s_Il2CppMethodIntialized;
        if (!s_Il2CppMethodIntialized)
        {
            AnswerFinderInterface_t11_il2cpp_TypeInfo_var = il2cpp_codegen_class_from_type(&AnswerFinderInterface_t11_0_0_0);
            s_Il2CppMethodIntialized = true;
        }
        {
            int32_t L_0 = (int32_t)InterfaceFuncInvoker0<int32_t>::Invoke(0 /* System.Int32 HelloWorld/AnswerFinderInterface::ComputeAnswer() */, AnswerFinderInterface_t11_il2cpp_TypeInfo_var, (Object_t *)(*(&___experiment)));
            return L_0;
        }
    }

It is important to remember that IL2CPP treats every managed interface as System.Object. This rule holds for all code generated by the il2cpp.exe utility.
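To see why a run-time lookup makes sharing possible, here is a minimal sketch of interface dispatch through a per-class slot table. Everything in it (ClassInfo, interface_slots, invoke_interface0) is a hypothetical simplification; the real InterfaceFuncInvoker0<R>::Invoke also takes the interface's type info and is not shown here.

```cpp
#include <cassert>
#include <cstdint>

struct Object_t;

// Hypothetical per-class table of interface method slots.
using InterfaceMethod0 = int32_t (*)(Object_t* self);
struct ClassInfo { const InterfaceMethod0* interface_slots; };

// Every object carries a pointer to the description of its class.
struct Object_t { const ClassInfo* klass; };

// Rough analogue of an interface invoker: resolve the slot at run time.
inline int32_t invoke_interface0(int32_t slot, Object_t* obj) {
    return obj->klass->interface_slots[slot](obj);
}

// A concrete class that implements the interface method in slot 0.
struct ExperimentWithInterface_t : Object_t { };
inline int32_t Experiment_ComputeAnswer(Object_t*) { return 42; }
const InterfaceMethod0 kExperimentSlots[] = { &Experiment_ComputeAnswer };
const ClassInfo kExperimentClass = { kExperimentSlots };

// Shared body: T is known only as Object_t*, yet the call resolves correctly
// because the target is found through the object, not through its static type.
inline int32_t FindTheAnswer_gshared(Object_t* experiment) {
    return invoke_interface0(0 /* ComputeAnswer */, experiment);
}

// Small usage example.
inline void example_usage() {
    ExperimentWithInterface_t experiment;
    experiment.klass = &kExperimentClass;
    assert(FindTheAnswer_gshared(&experiment) == 42);
}
```

Because the target method is found through the object itself, the shared body never needs to know the concrete T.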

Base class constraints

In addition to interface constraints, C# allows base class constraints. But if IL2CPP does not treat base classes as System.Object, how does generic sharing work in this case?

Since base classes are always reference types, IL2CPP uses fully shared methods for them. Any code that needs to use a field or call a method on the constrained type casts to the proper type in C++. Again, the C# compiler enforces the generic constraint, and we deceive the C++ compiler about the type.
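The cast described above might look roughly like this. Animal_t, Dog_t, and CountLegs_gshared are hypothetical names invented for this sketch (imagine a C# constraint `where T : Animal`); they do not come from the example project.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins: a base class used as a constraint and a subclass.
struct Object_t { };
struct Animal_t : Object_t { int32_t legs; };
struct Dog_t : Animal_t { };

// Shared body for every T constrained to Animal. T arrives as Object_t*
// and is cast to the base class before the field is read.
inline int32_t CountLegs_gshared(Object_t* value) {
    Animal_t* animal = static_cast<Animal_t*>(value);
    return animal->legs;
}

// Small usage example.
inline void example_usage() {
    Dog_t dog;
    dog.legs = 4;
    assert(CountLegs_gshared(&dog) == 4);
}
```

The static_cast is unchecked, so its correctness rests entirely on the constraint having been enforced when the C# code was compiled.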

Generic sharing for value types

Let's go back to the HelloWorld_DemonstrateGenericSharing_m4 function and look at the implementation for GenericType<DateTime>. DateTime is a value type, so GenericType<DateTime> is not shared with other instantiations. Let's jump to the declaration of its constructor, GenericType_1__ctor_m10. As in the other cases, it is a #define, but it maps to the GenericType_1__ctor_m10_gshared function, which is used by only one class: GenericType<DateTime>.

Thinking about generic sharing conceptually

Generic sharing can be quite difficult to understand. The problem space is full of pathological cases (such as the curiously recurring template pattern). So it helps to keep a few basic principles in mind:

Every method implementation on a generic type is shared.
Some generic types share method implementations only with themselves (for example, a generic type with a value type parameter, such as GenericType<DateTime> above).
Generic types with a reference type parameter are fully shared: they always use the implementation that takes System.Object for all type parameters.
Generic types with two or more type parameters can be partially shared if at least one of those type parameters is a reference type.

For any generic type, the il2cpp.exe utility always generates the fully shared method implementations. Other implementations are generated only when they are actually used.

Sharing of generic methods

Generic sharing applies not only to generic types, but also to generic methods. Notice that in the original script code, the UsesDifferentGenericParameter method uses a different type parameter (U) than the GenericType class (T). When we looked at the shared implementations for the GenericType class, we did not see this method. Searching for “UsesDifferentGenericParameter” shows that its implementation is in the GenericMethods0.cpp file:

    extern "C" Object_t * GenericType_1_UsesDifferentGenericParameter_TisObject_t_m15243_gshared (GenericType_1_t2159 * __this, Object_t * ___value, MethodInfo* method)
    {
        {
            Object_t * L_0 = ___value;
            return L_0;
        }
    }

This is a fully shared implementation that accepts Object_t *. Although this method belongs to a generic type, the behavior would be the same for a generic method in a non-generic type. In general, il2cpp.exe tries to generate as little code as possible for methods that involve generic parameters.

Conclusion

Generic sharing is one of the most important improvements to IL2CPP since its release. It significantly reduces the size of the C++ code by reusing one implementation for methods whose behavior is identical. We continue to look for ways to reduce binary size and to take further advantage of generic sharing.

In the next article we will talk about how P/Invoke wrappers are generated and how types are marshaled between managed and unmanaged code.

Source: https://habr.com/ru/post/345736/
