
The optimization history of an IoC container

In this post I would like to share a small but, in my opinion, very useful project in which Stefán Jökull Sigurðarson collects all the IoC containers known to him that have been ported to .NET Core and, using BenchmarkDotNet, measures their instance-resolution performance. I did not miss the opportunity to take part in this competition with my own small project, FsContainer.




1.2.0


After the project was migrated to .NET Core (which, I should note, turned out to be not at all difficult), to say that I was discouraged would be to say nothing: my container failed one of the three measurements. Quite literally, the benchmark simply ran for over 20 minutes and never finished.


The culprit was this section of code:


public object Resolve(Type type)
{
    var instance = _bindingResolver.Resolve(this, GetBindings(), type);

    if (!_disposeManager.Contains(instance))
    {
        _disposeManager.Add(instance);
    }

    return instance;
}

If you think about it, the basic principle of a benchmark is to measure the number of operations performed per unit of time (and, optionally, memory consumption), which means the Resolve method is run as many times as possible. You may notice that after resolving, the resulting instance is added to the _disposeManager so it can be destroyed later when container.Dispose() is called. Since the implementation is backed by a List<object> to which instances are added after a Contains check, you can guess that there are two side effects at once:


  1. For every newly created instance, the Contains check performs a linear search for a duplicate among all previously added instances;
  2. Since every newly created instance is unique (the resolve is benchmarked with a TransientLifetimeManager), the List<object> keeps growing: it allocates a new backing array twice the size of the old one and copies the previously added elements into it (to add a million instances, this allocate-and-copy operation is performed at least 20 times). A minimal sketch of this behavior follows the list.
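To make this concrete, here is a minimal sketch of the behavior described above. It is only an illustration, not the actual FsContainer code; the class and method names are invented for the example:

using System.Collections.Generic;

// Illustration only: a List<object>-backed dispose tracker.
public class NaiveDisposeManager
{
    private readonly List<object> _tracked = new List<object>();

    public void Add(object instance)
    {
        // Contains() scans every previously tracked instance: O(n) per call.
        if (!_tracked.Contains(instance))
        {
            // As the list grows, Add() periodically allocates a backing array
            // twice the previous size and copies all existing elements into it.
            _tracked.Add(instance);
        }
    }
}

Run millions of times with unique transient instances, both the scan and the copying grow without bound, which is exactly why the benchmark never finished.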

Frankly, I am not sure which solution is the most correct here, because in real life it is hard for me to imagine a container holding millions of references to previously created instances, so I solved only half of the problem by adding a (quite logical) restriction: only objects that implement IDisposable are added to the _disposeManager.


if (instance is IDisposable && !_disposeManager.Contains(instance))
{
    _disposeManager.Add(instance);
}

As a result, the measurement completed in a perfectly reasonable time and produced the following results:


| Method         | Mean         | Error       | StdDev      | Scaled | ScaledSD | Gen 0  | Gen 1  | Allocated |
|----------------|--------------|-------------|-------------|--------|----------|--------|--------|-----------|
| Direct         | 13.77 ns     | 0.3559 ns   | 0.3655 ns   | 1.00   | 0.00     | 0.0178 | -      | 56 B      |
| LightInject    | 36.95 ns     | 0.1081 ns   | 0.0902 ns   | 2.69   | 0.07     | 0.0178 | -      | 56 B      |
| SimpleInjector | 46.17 ns     | 0.2746 ns   | 0.2434 ns   | 3.35   | 0.09     | 0.0178 | -      | 56 B      |
| AspNetCore     | 71.09 ns     | 0.4592 ns   | 0.4296 ns   | 5.17   | 0.14     | 0.0178 | -      | 56 B      |
| Autofac        | 1,600.67 ns  | 14.4742 ns  | 12.8310 ns  | 116.32 | 3.10     | 0.5741 | -      | 1803 B    |
| StructureMap   | 1,815.87 ns  | 18.2271 ns  | 16.1578 ns  | 131.95 | 3.55     | 0.6294 | -      | 1978 B    |
| FsContainer    | 2,819.01 ns  | 6.0161 ns   | 5.3331 ns   | 204.85 | 5.24     | 0.4845 | -      | 1524 B    |
| Ninject        | 12,812.70 ns | 255.5191 ns | 447.5211 ns | 931.06 | 39.95    | 1.7853 | 0.4425 | 5767 B    |

Naturally, these results did not make me happy, and I began searching for further ways to optimize.


1.2.1


In the current version of the container, the choice of constructor and the arguments it requires never changes for a given type, so this information can be cached to avoid wasting CPU time on it again and again. The result of this optimization is a ConcurrentDictionary whose key is the requested type (Resolve<T>) and whose value is the constructor and the parameters that will be used to create the instance directly.


private readonly IDictionary<Type, Tuple<ConstructorInfo, ParameterInfo[]>> _ctorCache =
    new ConcurrentDictionary<Type, Tuple<ConstructorInfo, ParameterInfo[]>>();
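Below is a rough sketch of how such a cache could be consulted inside Resolve. The helper name, the concrete variable, and the constructor-selection strategy (the constructor with the most parameters) are assumptions made for illustration and may differ from the actual FsContainer code:

// Sketch: look up (or populate) cached constructor metadata for a concrete type.
private Tuple<ConstructorInfo, ParameterInfo[]> GetConstructor(Type concrete)
{
    if (!_ctorCache.TryGetValue(concrete, out var entry))
    {
        // Assumed selection strategy: pick the constructor with the most parameters.
        var ctor = concrete.GetConstructors()
            .OrderByDescending(c => c.GetParameters().Length)
            .First();

        entry = Tuple.Create(ctor, ctor.GetParameters());
        _ctorCache[concrete] = entry;
    }

    return entry;
}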

Judging by the measurements, this simple change improved performance by more than 30%:


| Method         | Mean         | Error       | StdDev      | Scaled | ScaledSD | Gen 0  | Gen 1  | Gen 2  | Allocated |
|----------------|--------------|-------------|-------------|--------|----------|--------|--------|--------|-----------|
| Direct         | 13.50 ns     | 0.2240 ns   | 0.1986 ns   | 1.00   | 0.00     | 0.0178 | -      | -      | 56 B      |
| LightInject    | 36.94 ns     | 0.0999 ns   | 0.0886 ns   | 2.74   | 0.04     | 0.0178 | -      | -      | 56 B      |
| SimpleInjector | 46.40 ns     | 0.3409 ns   | 0.3189 ns   | 3.44   | 0.05     | 0.0178 | -      | -      | 56 B      |
| AspNetCore     | 70.26 ns     | 0.4897 ns   | 0.4581 ns   | 5.21   | 0.08     | 0.0178 | -      | -      | 56 B      |
| Autofac        | 1,634.89 ns  | 15.3160 ns  | 14.3266 ns  | 121.14 | 2.01     | 0.5741 | -      | -      | 1803 B    |
| FsContainer    | 1,779.12 ns  | 18.9507 ns  | 17.7265 ns  | 131.83 | 2.27     | 0.2441 | -      | -      | 774 B     |
| StructureMap   | 1,830.01 ns  | 5.4174 ns   | 4.8024 ns   | 135.60 | 1.97     | 0.6294 | -      | -      | 1978 B    |
| Ninject        | 12,558.59 ns | 268.1920 ns | 490.4042 ns | 930.58 | 38.29    | 1.7858 | 0.4423 | 0.0005 | 5662 B    |

1.2.2


While running measurements, BenchmarkDotNet warns the user when an assembly may be non-optimized (compiled in the Debug configuration). For a long time I could not understand why this warning kept appearing in a project where the container was referenced via a NuGet package, and imagine my surprise when I saw the list of possible parameters for nuget pack:


 nuget pack MyProject.csproj -properties Configuration=Release 
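For reference, the .NET Core CLI has the same pitfall: dotnet pack also builds in the Debug configuration unless a configuration is specified explicitly, for example:

dotnet pack MyProject.csproj -c Release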

It turns out that all this time I had been building the package in the Debug configuration which, judging by the updated measurement results, had been slowing performance down by as much as 25%.


| Method         | Mean         | Error       | StdDev      | Scaled | ScaledSD | Gen 0  | Gen 1  | Gen 2  | Allocated |
|----------------|--------------|-------------|-------------|--------|----------|--------|--------|--------|-----------|
| Direct         | 13.38 ns     | 0.2216 ns   | 0.2073 ns   | 1.00   | 0.00     | 0.0178 | -      | -      | 56 B      |
| LightInject    | 36.85 ns     | 0.0577 ns   | 0.0511 ns   | 2.75   | 0.04     | 0.0178 | -      | -      | 56 B      |
| SimpleInjector | 46.56 ns     | 0.5329 ns   | 0.4724 ns   | 3.48   | 0.06     | 0.0178 | -      | -      | 56 B      |
| AspNetCore     | 70.17 ns     | 0.1403 ns   | 0.1312 ns   | 5.25   | 0.08     | 0.0178 | -      | -      | 56 B      |
| FsContainer    | 1,271.81 ns  | 4.0828 ns   | 3.8190 ns   | 95.09  | 1.44     | 0.2460 | -      | -      | 774 B     |
| Autofac        | 1,648.52 ns  | 2.3197 ns   | 2.0563 ns   | 123.26 | 1.84     | 0.5741 | -      | -      | 1803 B    |
| StructureMap   | 1,829.05 ns  | 17.8238 ns  | 16.6724 ns  | 136.75 | 2.37     | 0.6294 | -      | -      | 1978 B    |
| Ninject        | 12,520.08 ns | 248.2530 ns | 534.3907 ns | 936.10 | 41.98    | 1.7860 | 0.4423 | 0.0008 | 5662 B    |

1.2.3


Another optimization was the caching of the activator function, which is compiled using Expression:


private readonly IDictionary<Type, Func<object[], object>> _activatorCache =
    new ConcurrentDictionary<Type, Func<object[], object>>();

This universal function takes a ConstructorInfo and the array of parameters ParameterInfo[] as arguments, and returns a compiled lambda as the result:


private Func<object[], object> GetActivator(ConstructorInfo ctor, ParameterInfo[] parameters)
{
    var p = Expression.Parameter(typeof(object[]), "args");
    var args = new Expression[parameters.Length];

    for (var i = 0; i < parameters.Length; i++)
    {
        var a = Expression.ArrayAccess(p, Expression.Constant(i));
        args[i] = Expression.Convert(a, parameters[i].ParameterType);
    }

    var b = Expression.New(ctor, args);
    var l = Expression.Lambda<Func<object[], object>>(b, p);

    return l.Compile();
}
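For illustration, this is how the compiled activator could be used on its own. The Foo type and the call site are hypothetical and are not part of FsContainer (System.Linq and System.Reflection are assumed to be imported):

// Hypothetical example type; any class with a public constructor will do.
public class Foo
{
    public Foo(string name, int count) { Name = name; Count = count; }
    public string Name { get; }
    public int Count { get; }
}

// Somewhere inside the container: compile the activator once per constructor...
var ctor = typeof(Foo).GetConstructors().Single();
var activator = GetActivator(ctor, ctor.GetParameters());

// ...then create instances without paying the reflection cost on every call.
var foo = (Foo)activator(new object[] { "example", 42 });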

I agree that the logical continuation of this approach would be to compile the entire Resolve function, not just the activator, but even in its current form this change gave about a 10% speedup, allowing the container to take a confident 5th place:


| Method         | Mean         | Error       | StdDev      | Scaled | ScaledSD | Gen 0  | Gen 1  | Gen 2  | Allocated |
|----------------|--------------|-------------|-------------|--------|----------|--------|--------|--------|-----------|
| Direct         | 13.24 ns     | 0.0836 ns   | 0.0698 ns   | 1.00   | 0.00     | 0.0178 | -      | -      | 56 B      |
| LightInject    | 37.39 ns     | 0.0570 ns   | 0.0533 ns   | 2.82   | 0.01     | 0.0178 | -      | -      | 56 B      |
| SimpleInjector | 46.22 ns     | 0.2327 ns   | 0.2063 ns   | 3.49   | 0.02     | 0.0178 | -      | -      | 56 B      |
| AspNetCore     | 70.53 ns     | 0.2885 ns   | 0.2698 ns   | 5.33   | 0.03     | 0.0178 | -      | -      | 56 B      |
| FsContainer    | 1,038.13 ns  | 17.1037 ns  | 15.9988 ns  | 78.41  | 1.23     | 0.2327 | -      | -      | 734 B     |
| Autofac        | 1,551.33 ns  | 3.6293 ns   | 3.2173 ns   | 117.17 | 0.64     | 0.5741 | -      | -      | 1803 B    |
| StructureMap   | 1,944.35 ns  | 1.8665 ns   | 1.7459 ns   | 146.85 | 0.76     | 0.6294 | -      | -      | 1978 B    |
| Ninject        | 13,139.70 ns | 260.8754 ns | 508.8174 ns | 992.43 | 38.35    | 1.7857 | 0.4425 | 0.0004 | 5682 B    |

1.2.4


After the article was published, @turbanoff pointed out that for a ConcurrentDictionary the GetOrAdd method performs better than the ContainsKey / Add pair, for which I thank him. The measurement results are shown below:


Before:


if (!_activatorCache.ContainsKey(concrete))
{
    _activatorCache[concrete] = GetActivator(ctor, parameters);
}

| Method           | Mean       | Error      | StdDev    | Median     | Gen 0  | Allocated |
|------------------|------------|------------|-----------|------------|--------|-----------|
| ResolveSingleton | 299.0 ns   | 7.239 ns   | 19.45 ns  | 295.7 ns   | 0.1268 | 199 B     |
| ResolveTransient | 686.3 ns   | 32.333 ns  | 86.30 ns  | 668.7 ns   | 0.2079 | 327 B     |
| ResolveCombined  | 1,487.4 ns | 101.057 ns | 273.21 ns | 1,388.7 ns | 0.4673 | 734 B     |

After:


var activator = _activatorCache.GetOrAdd(concrete, x => GetActivator(ctor, parameters));

| Method           | Mean       | Error     | StdDev    | Gen 0  | Allocated |
|------------------|------------|-----------|-----------|--------|-----------|
| ResolveSingleton | 266.6 ns   | 4.955 ns  | 4.393 ns  | 0.1268 | 199 B     |
| ResolveTransient | 512.0 ns   | 16.974 ns | 16.671 ns | 0.3252 | 511 B     |
| ResolveCombined  | 1,119.2 ns | 18.218 ns | 15.213 ns | 0.6943 | 1101 B    |

PS


As an experiment, I decided to measure how long it takes to create objects using different constructs. The project itself is available on GitHub, and you can see the results below. For completeness, the only activation method missing is generating IL instructions, which gets as close as possible to the Direct method. That is the method used by the top-4 containers, which allows them to achieve such impressive results.


| Method                  | Mean       | Error     | StdDev    | Gen 0  | Allocated |
|-------------------------|------------|-----------|-----------|--------|-----------|
| Direct                  | 4.031 ns   | 0.1588 ns | 0.1890 ns | 0.0076 | 24 B      |
| CompiledInvoke          | 85.541 ns  | 0.5319 ns | 0.4715 ns | 0.0178 | 56 B      |
| ConstructorInfoInvoke   | 316.088 ns | 1.8337 ns | 1.6256 ns | 0.0277 | 88 B      |
| ActivatorCreateInstance | 727.547 ns | 2.9228 ns | 2.5910 ns | 0.1316 | 416 B     |
| DynamicInvoke           | 974.699 ns | 5.5867 ns | 5.2258 ns | 0.0515 | 168 B     |
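For context, here is a rough sketch of the kinds of constructs compared in the table above. This is not the actual benchmark code from the GitHub project, just an illustration using an assumed parameterless Foo class:

using System;
using System.Linq;
using System.Linq.Expressions;

public class Foo { }

public static class CreationSketch
{
    public static void Run()
    {
        var ctor = typeof(Foo).GetConstructors().Single();

        // Direct: a plain constructor call, the baseline.
        var direct = new Foo();

        // CompiledInvoke: a delegate compiled once from an expression tree.
        Func<object> compiled = Expression.Lambda<Func<object>>(Expression.New(ctor)).Compile();
        var viaCompiled = compiled();

        // ConstructorInfoInvoke: invoking the constructor through reflection.
        var viaCtorInfo = ctor.Invoke(null);

        // ActivatorCreateInstance: the framework's general-purpose factory.
        var viaActivator = Activator.CreateInstance(typeof(Foo));

        // DynamicInvoke: late-bound invocation of the compiled delegate.
        var viaDynamic = compiled.DynamicInvoke();
    }
}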


Source: https://habr.com/ru/post/331584/

