In this post I would like to share a small but, in my opinion, very useful project in which Stefán Jökull Sigurðarson collects all the IoC containers he knows of that have been migrated to .NET Core and measures their instance-resolving performance with BenchmarkDotNet. I did not miss the opportunity to take part in this competition with my own small project, FsContainer.
After the project was migrated to .NET Core (which, I should note, turned out to be not difficult at all), to say that I was discouraged is to say nothing: my container failed one of the three measurements. Quite literally, the measurement ran for more than 20 minutes and never finished.
The reason was in this section of the code:
    public object Resolve(Type type)
    {
        var instance = _bindingResolver.Resolve(this, GetBindings(), type);
        if (!_disposeManager.Contains(instance))
        {
            _disposeManager.Add(instance);
        }
        return instance;
    }
If you think about it, a benchmark works by measuring the number of operations performed per unit of time (and, optionally, memory consumption), which means that the Resolve method is run as many times as possible. You may notice that after resolving, the resulting instance is added to the _disposeManager so that it can be destroyed later when container.Dispose() is called. Since the implementation is backed by a List<object>, and instances are added to it only after a Contains check, there are two side effects at once:
- every Contains check performs a linear search for a duplicate among all previously added instances, so each call gets slower as the list grows;
- because a new instance is created on every call (the TransientLifetimeManager is what is being tested), the List<object> keeps growing by allocating a new memory area twice as large and copying the previously added elements into it (adding a million instances triggers this allocate-and-copy operation roughly 20 times).
Frankly, I am not sure which solution is the most correct here, because in real life I find it hard to imagine one container holding millions of references to previously created instances. So I solved only half of the problem by adding a (quite logical) restriction: only objects that implement IDisposable are added to the _disposeManager:
    if (instance is IDisposable && !_disposeManager.Contains(instance))
    {
        _disposeManager.Add(instance);
    }
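The other half of the problem, the linear Contains check, could in principle be addressed by switching to a hash-based collection. The following is only a minimal sketch under that assumption; DisposeManager, ReferenceComparer and Track are hypothetical names and this is not FsContainer's actual implementation. It tracks instances by reference identity, so a duplicate check is a hash lookup instead of a scan:

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;

// Hypothetical comparer: two entries are equal only if they are the same object.
public sealed class ReferenceComparer : IEqualityComparer<IDisposable>
{
    bool IEqualityComparer<IDisposable>.Equals(IDisposable x, IDisposable y)
        => ReferenceEquals(x, y);

    int IEqualityComparer<IDisposable>.GetHashCode(IDisposable obj)
        => RuntimeHelpers.GetHashCode(obj);
}

// Hypothetical sketch of a dispose manager backed by a HashSet instead of a List.
public sealed class DisposeManager : IDisposable
{
    private readonly HashSet<IDisposable> _instances =
        new HashSet<IDisposable>(new ReferenceComparer());

    public int Count => _instances.Count;

    // Returns true only when the instance is disposable and was not tracked before.
    public bool Track(object instance)
        => instance is IDisposable disposable && _instances.Add(disposable);

    public void Dispose()
    {
        foreach (var disposable in _instances)
        {
            disposable.Dispose();
        }
        _instances.Clear();
    }
}
```

This only removes the cost of the duplicate search; the set still grows with every tracked transient instance, so the memory side effect remains.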
With the IDisposable restriction in place, the measurement completed in a reasonable time and gave the following results:
Method | Mean | Error | Stddev | Scaled | ScaledSD | Gen 0 | Gen 1 | Allocated |
---|---|---|---|---|---|---|---|---|
Direct | 13.77 ns | 0.3559 ns | 0.3655 ns | 1.00 | 0.00 | 0.0178 | - | 56 B |
Lightinject | 36.95 ns | 0.1081 ns | 0.0902 ns | 2.69 | 0.07 | 0.0178 | - | 56 B |
Simpleinjector | 46.17 ns | 0.2746 ns | 0.2434 ns | 3.35 | 0.09 | 0.0178 | - | 56 B |
AspNetCore | 71.09 ns | 0.4592 ns | 0.4296 ns | 5.17 | 0.14 | 0.0178 | - | 56 B |
Autofac | 1,600.67 ns | 14.4742 ns | 12.8310 ns | 116.32 | 3.10 | 0.5741 | - | 1803 B |
Structuremap | 1,815.87 ns | 18.2271 ns | 16.1578 ns | 131.95 | 3.55 | 0.6294 | - | 1978 B |
Fscontainer | 2,819.01 ns | 6.0161 ns | 5.3331 ns | 204.85 | 5.24 | 0.4845 | - | 1524 B |
Ninject | 12,812.70 ns | 255.5191 ns | 447.5211 ns | 931.06 | 39.95 | 1.7853 | 0.4425 | 5767 B |
I was, of course, not satisfied with these results and started looking for further optimizations.
In the current version of the container, the constructor selected for a type and the arguments it requires never change, so this information can be cached instead of spending processor time on finding it again. The result of this optimization is a ConcurrentDictionary whose key is the requested type (the T in Resolve<T>) and whose value is the constructor and the parameters that will be used to create the instance.
    private readonly IDictionary<Type, Tuple<ConstructorInfo, ParameterInfo[]>> _ctorCache =
        new ConcurrentDictionary<Type, Tuple<ConstructorInfo, ParameterInfo[]>>();
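To make the idea concrete, here is a minimal sketch of how such a cache might be filled and queried. The ConstructorCache class, the Get method, the ReflectionCalls counter and the "pick the constructor with the most parameters" rule are all assumptions made for illustration; the article does not say how FsContainer selects a constructor.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Reflection;
using System.Threading;

// Hypothetical sketch: caches the chosen constructor and its parameters per type.
public sealed class ConstructorCache
{
    private readonly ConcurrentDictionary<Type, Tuple<ConstructorInfo, ParameterInfo[]>> _ctorCache =
        new ConcurrentDictionary<Type, Tuple<ConstructorInfo, ParameterInfo[]>>();

    private int _reflectionCalls;

    // How many times reflection was actually used (for demonstration only).
    public int ReflectionCalls => _reflectionCalls;

    public Tuple<ConstructorInfo, ParameterInfo[]> Get(Type type)
    {
        // Reflection runs only on a cache miss; later calls are dictionary lookups.
        return _ctorCache.GetOrAdd(type, t => Describe(t));
    }

    private Tuple<ConstructorInfo, ParameterInfo[]> Describe(Type type)
    {
        Interlocked.Increment(ref _reflectionCalls);

        // Assumed selection rule: the public constructor with the most parameters.
        var ctor = type.GetConstructors()
            .OrderByDescending(c => c.GetParameters().Length)
            .First();

        return Tuple.Create(ctor, ctor.GetParameters());
    }
}
```

Calling Get twice for the same type returns the same cached tuple, and the reflection lookup is normally performed only once.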
Judging by the measurements, such a simple change improved performance by more than 30%:
Method | Mean | Error | Stddev | Scaled | ScaledSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|
Direct | 13.50 ns | 0.2240 ns | 0.1986 ns | 1.00 | 0.00 | 0.0178 | - | - | 56 B |
Lightinject | 36.94 ns | 0.0999 ns | 0.0886 ns | 2.74 | 0.04 | 0.0178 | - | - | 56 B |
Simpleinjector | 46.40 ns | 0.3409 ns | 0.3189 ns | 3.44 | 0.05 | 0.0178 | - | - | 56 B |
AspNetCore | 70.26 ns | 0.4897 ns | 0.4581 ns | 5.21 | 0.08 | 0.0178 | - | - | 56 B |
Autofac | 1,634.89 ns | 15.3160 ns | 14.3266 ns | 121.14 | 2.01 | 0.5741 | - | - | 1803 B |
Fscontainer | 1,779.12 ns | 18.9507 ns | 17.7265 ns | 131.83 | 2.27 | 0.2441 | - | - | 774 B |
Structuremap | 1,830.01 ns | 5.4174 ns | 4.8024 ns | 135.60 | 1.97 | 0.6294 | - | - | 1978 B |
Ninject | 12,558.59 ns | 268.1920 ns | 490.4042 ns | 930.58 | 38.29 | 1.7858 | 0.4423 | 0.0005 | 5662 B |
While taking measurements, BenchmarkDotNet warns the user when an assembly may not be optimized (that is, compiled in the Debug configuration). For a long time I could not understand why this warning appeared in a project where the container was referenced as a NuGet package, and imagine my surprise when I saw the list of possible parameters for nuget pack:
nuget pack MyProject.csproj -properties Configuration=Release
It turns out that all this time I had been building the package in the Debug configuration, which, judging by the updated measurements, cost as much as 25% in performance.
Method | Mean | Error | Stddev | Scaled | ScaledSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|
Direct | 13.38 ns | 0.2216 ns | 0.2073 ns | 1.00 | 0.00 | 0.0178 | - | - | 56 B |
Lightinject | 36.85 ns | 0.0577 ns | 0.0511 ns | 2.75 | 0.04 | 0.0178 | - | - | 56 B |
Simpleinjector | 46.56 ns | 0.5329 ns | 0.4724 ns | 3.48 | 0.06 | 0.0178 | - | - | 56 B |
AspNetCore | 70.17 ns | 0.1403 ns | 0.1312 ns | 5.25 | 0.08 | 0.0178 | - | - | 56 B |
Fscontainer | 1,271.81 ns | 4.0828 ns | 3.8190 ns | 95.09 | 1.44 | 0.2460 | - | - | 774 B |
Autofac | 1,648.52 ns | 2.3197 ns | 2.0563 ns | 123.26 | 1.84 | 0.5741 | - | - | 1803 B |
Structuremap | 1,829.05 ns | 17.8238 ns | 16.6724 ns | 136.75 | 2.37 | 0.6294 | - | - | 1978 B |
Ninject | 12,520.08 ns | 248.2530 ns | 534.3907 ns | 936.10 | 41.98 | 1.7860 | 0.4423 | 0.0008 | 5662 B |
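For a quick check of whether a referenced assembly was built without JIT optimizations, one common approach is to inspect its DebuggableAttribute. The sketch below is an illustration only: BuildInfo and IsJitOptimizerDisabled are hypothetical names, and whether BenchmarkDotNet performs its check in exactly this way is an assumption.

```csharp
using System;
using System.Diagnostics;
using System.Reflection;

// Hypothetical helper for spotting assemblies compiled in a Debug-like configuration.
public static class BuildInfo
{
    public static bool IsJitOptimizerDisabled(Assembly assembly)
    {
        // Debug builds are normally stamped with a DebuggableAttribute
        // that disables JIT optimizations.
        var attribute = assembly.GetCustomAttribute<DebuggableAttribute>();
        return attribute != null && attribute.IsJITOptimizerDisabled;
    }
}
```

For example, BuildInfo.IsJitOptimizerDisabled(typeof(SomeContainerType).Assembly), where SomeContainerType stands for any type from the package in question, would be expected to return true for a package built in Debug.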
Another optimization was the caching of the activator function, which is compiled using Expression:
    private readonly IDictionary<Type, Func<object[], object>> _activatorCache =
        new ConcurrentDictionary<Type, Func<object[], object>>();
The universal function takes the ConstructorInfo and the ParameterInfo[] array as arguments and returns a compiled lambda of type Func<object[], object>:
    private Func<object[], object> GetActivator(ConstructorInfo ctor, ParameterInfo[] parameters)
    {
        // The lambda's single parameter: object[] args
        var p = Expression.Parameter(typeof(object[]), "args");
        var args = new Expression[parameters.Length];
        for (var i = 0; i < parameters.Length; i++)
        {
            // (TParam)args[i]
            var a = Expression.ArrayAccess(p, Expression.Constant(i));
            args[i] = Expression.Convert(a, parameters[i].ParameterType);
        }
        // new T((T0)args[0], (T1)args[1], ...)
        var b = Expression.New(ctor, args);
        var l = Expression.Lambda<Func<object[], object>>(b, p);
        return l.Compile();
    }
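As a usage illustration, the sketch below builds such an activator for a sample type and invokes it. Greeter and ActivatorDemo are hypothetical names introduced for the example; the builder follows the same idea as the function above, with the small differences noted in the comments.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical sample type with a two-argument constructor.
public sealed class Greeter
{
    public string Text { get; }

    public Greeter(string name, int times)
    {
        Text = name + ":" + times;
    }
}

public static class ActivatorDemo
{
    public static Func<object[], object> GetActivator(ConstructorInfo ctor)
    {
        var parameters = ctor.GetParameters();
        var p = Expression.Parameter(typeof(object[]), "args");
        var args = new Expression[parameters.Length];
        for (var i = 0; i < parameters.Length; i++)
        {
            // (TParam)args[i]; ArrayIndex is the read-only form of array access.
            var element = Expression.ArrayIndex(p, Expression.Constant(i));
            args[i] = Expression.Convert(element, parameters[i].ParameterType);
        }

        // The extra Convert to object is an addition so that value types also work.
        var body = Expression.Convert(Expression.New(ctor, args), typeof(object));
        return Expression.Lambda<Func<object[], object>>(body, p).Compile();
    }
}
```

With this in place, ActivatorDemo.GetActivator(typeof(Greeter).GetConstructor(new[] { typeof(string), typeof(int) })) returns a delegate that can be called as activator(new object[] { "hi", 3 }). The expensive Compile call happens once, which is why caching the delegate per type pays off.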
I agree that the logical continuation of this solution would be to compile the entire Resolve function rather than just the activator, but even the current implementation gave a speedup of over 10%, which was enough to take a confident 5th place:
Method | Mean | Error | Stddev | Scaled | ScaledSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|
Direct | 13.24 ns | 0.0836 ns | 0.0698 ns | 1.00 | 0.00 | 0.0178 | - | - | 56 B |
Lightinject | 37.39 ns | 0.0570 ns | 0.0533 ns | 2.82 | 0.01 | 0.0178 | - | - | 56 B |
Simpleinjector | 46.22 ns | 0.2327 ns | 0.2063 ns | 3.49 | 0.02 | 0.0178 | - | - | 56 B |
AspNetCore | 70.53 ns | 0.2885 ns | 0.2698 ns | 5.33 | 0.03 | 0.0178 | - | - | 56 B |
Fscontainer | 1,038.13 ns | 17.1037 ns | 15.9988 ns | 78.41 | 1.23 | 0.2327 | - | - | 734 B |
Autofac | 1,551.33 ns | 3.6293 ns | 3.2173 ns | 117.17 | 0.64 | 0.5741 | - | - | 1803 B |
Structuremap | 1,944.35 ns | 1.8665 ns | 1.7459 ns | 146.85 | 0.76 | 0.6294 | - | - | 1978 B |
Ninject | 13,139.70 ns | 260.8754 ns | 508.8174 ns | 992.43 | 38.35 | 1.7857 | 0.4425 | 0.0004 | 5682 B |
After the article was published, @turbanoff pointed out that for a ConcurrentDictionary the GetOrAdd method performs better than the ContainsKey / Add combination, for which many thanks to him. The results of the measurements are presented below:
Before:
    if (!_activatorCache.ContainsKey(concrete))
    {
        _activatorCache[concrete] = GetActivator(ctor, parameters);
    }
Method | Mean | Error | Stddev | Median | Gen 0 | Allocated |
---|---|---|---|---|---|---|
ResolveSingleton | 299.0 ns | 7.239 ns | 19.45 ns | 295.7 ns | 0.1268 | 199 B |
ResolveTransient | 686.3 ns | 32.333 ns | 86.30 ns | 668.7 ns | 0.2079 | 327 B |
ResolveCombined | 1,487.4 ns | 101.057 ns | 273.21 ns | 1,388.7 ns | 0.4673 | 734 B |
After:
    var activator = _activatorCache.GetOrAdd(concrete, x => GetActivator(ctor, parameters));
Method | Mean | Error | Stddev | Gen 0 | Allocated |
---|---|---|---|---|---|
ResolveSingleton | 266.6 ns | 4.955 ns | 4.393 ns | 0.1268 | 199 B |
ResolveTransient | 512.0 ns | 16.974 ns | 16.671 ns | 0.3252 | 511 B |
ResolveCombined | 1,119.2 ns | 18.218 ns | 15.213 ns | 0.6943 | 1101 B |
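Two practical details about GetOrAdd are worth a sketch. First, it is a member of ConcurrentDictionary, not of IDictionary, so a field declared as IDictionary<Type, Func<object[], object>> (as shown earlier) has to be typed as ConcurrentDictionary to call it. Second, newer frameworks offer an overload that takes a factory argument, which lets the lambda avoid capturing local variables. The example below assumes that overload is available on your target framework; ActivatorCache and GetOrBuild are hypothetical names, and ctor.Invoke stands in for the compiled activator.

```csharp
using System;
using System.Collections.Concurrent;
using System.Reflection;

// Hypothetical sketch of an activator cache built around GetOrAdd.
public sealed class ActivatorCache
{
    // Typed as ConcurrentDictionary so that GetOrAdd is accessible.
    private readonly ConcurrentDictionary<Type, Func<object[], object>> _activatorCache =
        new ConcurrentDictionary<Type, Func<object[], object>>();

    public Func<object[], object> GetOrBuild(Type concrete, ConstructorInfo ctor)
    {
        // The constructor is passed as the factory argument instead of being
        // captured, so the lambda itself needs no closure over local variables.
        return _activatorCache.GetOrAdd(concrete, (key, c) => Build(c), ctor);
    }

    private static Func<object[], object> Build(ConstructorInfo ctor)
    {
        // Stand-in for the compiled expression activator.
        return args => ctor.Invoke(args);
    }
}
```

Note that GetOrAdd may invoke the factory more than once when several threads miss the cache at the same time; only one result is stored, so the factory should be free of side effects.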
As an experiment, I decided to measure how long it takes to create objects using different approaches. The project itself is available on GitHub, and you can see the results below. For the sake of completeness, the only activation method missing is the generation of IL instructions that are as close as possible to the Direct method. It is this method that the top 4 containers use, which allows them to achieve such impressive results.
Method | Mean | Error | Stddev | Gen 0 | Allocated |
---|---|---|---|---|---|
Direct | 4.031 ns | 0.1588 ns | 0.1890 ns | 0.0076 | 24 B |
Compiledinvoke | 85.541 ns | 0.5319 ns | 0.4715 ns | 0.0178 | 56 B |
Constructorinfoinvoke | 316.088 ns | 1.8337 ns | 1.6256 ns | 0.0277 | 88 B |
ActivatorCreateInstance | 727.547 ns | 2.9228 ns | 2.5910 ns | 0.1316 | 416 B |
Dynamic invoke | 974.699 ns | 5.5867 ns | 5.2258 ns | 0.0515 | 168 B |
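For readers unfamiliar with these approaches, the following sketch shows what each row of the table might correspond to in code. It is an illustration under assumptions: the Sample type and the CreationWays class are hypothetical, and the actual benchmark project may implement each case differently.

```csharp
using System;
using System.Linq.Expressions;
using System.Reflection;

// Hypothetical sample type.
public sealed class Sample
{
    public int Value { get; }

    public Sample(int value)
    {
        Value = value;
    }
}

public static class CreationWays
{
    private static readonly ConstructorInfo Ctor =
        typeof(Sample).GetConstructor(new[] { typeof(int) });

    // Compiled once from an expression tree, then reused.
    private static readonly Func<int, Sample> Compiled = BuildCompiled();

    // The same delegate, but invoked late-bound through Delegate.DynamicInvoke.
    private static readonly Delegate Dynamic = Compiled;

    private static Func<int, Sample> BuildCompiled()
    {
        var value = Expression.Parameter(typeof(int), "value");
        return Expression.Lambda<Func<int, Sample>>(Expression.New(Ctor, value), value).Compile();
    }

    public static Sample Direct(int value) => new Sample(value);

    public static Sample CompiledInvoke(int value) => Compiled(value);

    public static Sample ConstructorInfoInvoke(int value)
        => (Sample)Ctor.Invoke(new object[] { value });

    public static Sample ActivatorCreateInstance(int value)
        => (Sample)Activator.CreateInstance(typeof(Sample), value);

    public static Sample DynamicInvoke(int value)
        => (Sample)Dynamic.DynamicInvoke(value);
}
```

All five methods produce an equivalent object; they differ only in how much reflection and boxing work happens on each call.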
Source: https://habr.com/ru/post/331584/