In this article, Jon Skeet describes how even the simplest language constructs can slow down your program, and how to speed things up. As with anything related to application performance, results may vary depending on conditions (for example, the 64-bit JIT may behave a little differently), and in most cases this should not worry you. Relatively few developers write production code built around large numbers of micro-optimizations, so please do not take this post as a call to complicate your code for the sake of premature optimization. Apply it only where it is actually needed.
The new() constraint
Suppose, for example, we have the type SteppedPattern (the author discusses the optimization using the example of his library, Noda Time — translator's note), which has a generic type parameter TBucket. What matters here is that before parsing a value I want to create a new TBucket instance: pieces of information are added to the bucket as they are parsed, and when the operation finishes they are combined into a ParseResult. So every string parsing operation requires creating a TBucket instance. How can we create one when the type is generic?
We can do this by calling the type's parameterless constructor. I don't want to worry about whether the supplied type actually has such a constructor, so I'll simply add a new() constraint and call new TBucket().
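In code, a stripped-down sketch of this first approach might look like the following (the names are illustrative, not the actual Noda Time types):

internal sealed class SteppedPattern<TBucket> where TBucket : new()
{
    public TBucket Parse(string text)
    {
        // This innocuous line is where the cost hides: for a generic type
        // parameter, "new TBucket()" is compiled as a call to
        // Activator.CreateInstance<TBucket>().
        TBucket bucket = new TBucket();
        // ... feed the parsed pieces of 'text' into the bucket ...
        return bucket;
    }
}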
Great! Quite simple. Unfortunately, I lost sight of the fact that this single line of code accounts for about 75% of the time it takes to parse a string. And that is just creating an empty bucket, the simplest bucket parsing the simplest string! When I realized it, I was shocked.
Fixing it with a provider
The fix is very simple: we just need to tell the type how to create an instance of the bucket. We do this with a delegate:
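A minimal sketch of the changed type (again with illustrative names, not the real Noda Time code):

using System;

internal sealed class SteppedPattern<TBucket>
{
    private readonly Func<TBucket> bucketProvider;

    // The caller tells the pattern how to create buckets; the bucket's own
    // constructor can stay internal.
    internal SteppedPattern(Func<TBucket> bucketProvider)
    {
        this.bucketProvider = bucketProvider;
    }

    public TBucket Parse(string text)
    {
        // A plain delegate invocation; no reflection involved.
        TBucket bucket = bucketProvider();
        // ... feed the parsed pieces of 'text' into the bucket ...
        return bucket;
    }
}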
Now I can call new SteppedPattern(() => new OffsetBucket()), or something along those lines. It also means I can keep the constructor internal and never worry about it again. And, to simplify later code even further, I could even reuse an old bucket when parsing subsequent strings.
I want tables!
Not everyone wants to run the tests themselves; most people would rather look at ready-made results. So here are the benchmarks I wrote, measuring only the time spent creating instances through generic type parameters. To show how small these numbers really are, note that the values in the tables are in milliseconds, and each covers 100 million operations. So unless your code relies heavily on creating instances via generic type parameters, these results should not make you rush to rewrite anything. Still, keep them in mind for the future.
Anyway, the benchmark works with four types: two classes and two structs, each in a small and a large version (that is, versions differing in size), on 32-bit and 64-bit machines, for CLR v2 and v4. My 64-bit machine is simply faster, so results should only be compared within a single machine.
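To give a feel for what is being measured, a stripped-down version of the timing loop might look like this (a sketch with made-up names, not the exact code from the downloadable benchmark):

using System;
using System.Diagnostics;

internal static class CreationBenchmark
{
    private const int Iterations = 100000000; // 100 million, as in the tables below

    // Times the new()-constraint path.
    private static long MeasureNewConstraint<T>() where T : new()
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; i++)
        {
            new T(); // compiled as a call to Activator.CreateInstance<T>()
        }
        return stopwatch.ElapsedMilliseconds;
    }

    // Times the provider-delegate path.
    private static long MeasureProvider<T>(Func<T> provider)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; i++)
        {
            provider(); // plain delegate invocation
        }
        return stopwatch.ElapsedMilliseconds;
    }
}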
CLR v4: 32-bit results (ms per 100 million iterations)

| Test type | new() constraint | Provider delegate |
| --- | --- | --- |
| Small struct | 689 | 1225 |
| Large struct | 11188 | 7273 |
| Small class | 16307 | 1690 |
| Large class | 17471 | 3017 |
CLR v4: 64-bit results (ms per 100 million iterations)

| Test type | new() constraint | Provider delegate |
| --- | --- | --- |
| Small struct | 473 | 868 |
| Large struct | 2670 | 2396 |
| Small class | 8366 | 1189 |
| Large class | 8805 | 1529 |
CLR v2: 32-bit results (ms per 100 million iterations)

| Test type | new() constraint | Provider delegate |
| --- | --- | --- |
| Small struct | 703 | 1246 |
| Large struct | 11411 | 7392 |
| Small class | 143967 | 1791 |
| Large class | 143107 | 2581 |
CLR v2: 64-bit results (ms per 100 million iterations)

| Test type | new() constraint | Provider delegate |
| --- | --- | --- |
| Small struct | 510 | 686 |
| Large struct | 2334 | 1731 |
| Small class | 81801 | 1539 |
| Large class | 83293 | 1896 |
Look at the results for the classes. These are real figures: about two minutes on my laptop when using the new() constraint, and only a couple of seconds when using the provider. And it is important to note that those numbers are for .NET 2.0, i.e. CLR v2 (everything up to .NET 3.5 runs on CLR v2 — translator's note).
And, of course, you can download the benchmark and see how it behaves on your machine.
What happens under the hood?
As far as I understand, there is no IL instruction to support the new() constraint. Instead, the compiler inserts a call to Activator.CreateInstance<T>(). This is obviously slower than invoking a delegate, since it has to find a suitable constructor via reflection and then invoke it. I was genuinely surprised that this is not optimized; the obvious approach would be to create the delegates and cache them for later use. I won't argue with that design decision, though: at least it does not consume the extra memory such a cache would occupy.
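To make this concrete, the two paths compared above are roughly equivalent to the following sketch. The cached-delegate variant illustrates the kind of caching described above; the names are hypothetical, and it assumes T has an accessible parameterless constructor:

using System;
using System.Linq.Expressions;

internal static class InstantiationPaths
{
    // Roughly what "new T()" under a new() constraint turns into: a call
    // that locates the parameterless constructor and invokes it.
    private static T CreateViaConstraint<T>() where T : new()
    {
        return Activator.CreateInstance<T>();
    }

    // A hand-rolled cache: the expression is built and compiled once per
    // closed generic type, and every later call is just a delegate invocation.
    private static class ConstructorCache<T>
    {
        internal static readonly Func<T> Create =
            Expression.Lambda<Func<T>>(Expression.New(typeof(T))).Compile();
    }

    private static T CreateViaCachedDelegate<T>()
    {
        return ConstructorCache<T>.Create();
    }
}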
I want more benchmarks!
(taken from the second part of the article)
Here we look at the performance of working with delegates, and try to speed them up.
You can download the full source code for the performance tests from my site. Essentially, every test does the same thing: I create an Action delegate that does nothing and check that the reference to it is not null, purely to keep the JIT from optimizing the work away. Each test is implemented as a generic method with a single type parameter, and each method is called twice: first with Int32 as the type argument, then with String. The test cases are:
- Using a lambda expression: Action foo = () => {};
private static void Lambda<T>()
{
    Action foo = () => {};
    if (foo == null)
    {
        throw new Exception();
    }
}
- What I would like the compiler to do for me: a separate generic cache class that stores the delegate in a static field.
private static void FakeCachedLambda<T>()
{
    if (FakeLambdaCache<T>.CachedAction == null)
    {
        FakeLambdaCache<T>.CachedAction = FakeLambdaCache<T>.NoOp;
    }
    Action foo = FakeLambdaCache<T>.CachedAction;
    if (foo == null)
    {
        throw new Exception();
    }
}

private static class FakeLambdaCache<T>
{
    internal static Action CachedAction;
    internal static void NoOp() {}
}
- What the compiler actually does with the lambda expression: write a separate generic method and use a method group conversion.
private static void GenericMethodGroup<T>()
{
    Action foo = NoOp<T>;
    if (foo == null)
    {
        throw new Exception();
    }
}
- What the compiler could do instead: use a separate non-generic method and apply the same method group conversion.
private static void NonGenericMethodGroup<T>()
{
    Action foo = NoOp;
    if (foo == null)
    {
        throw new Exception();
    }
}
- Using a method group conversion on a static non-generic method of a generic type;
private static void StaticMethodOnGenericType<T>()
{
    Action foo = SampleGenericClass<T>.NoOpStatic;
    if (foo == null)
    {
        throw new Exception();
    }
}
- Using a method group conversion on a non-static, non-generic method of a generic type, via a cache class with a single static field that holds an instance of the generic class.
Yes, that description sounds convoluted, but the code is much simpler:
private static void InstanceMethodOnGenericType<T>()
{
    Action foo = ClassHolder<T>.SampleInstance.NoOpInstance;
    if (foo == null)
    {
        throw new Exception();
    }
}
For completeness, here are all the remaining definitions:
private static void NoOp() {}
private static void NoOp<T>() {}

private class ClassHolder<T>
{
    internal static SampleGenericClass<T> SampleInstance = new SampleGenericClass<T>();
}

private class SampleGenericClass<T>
{
    internal static void NoOpStatic() { }
    internal void NoOpInstance() { }
}
Note that all of this happens inside a generic method, which is called once for each of the two type arguments: Int32 and String. It is also important that we do not capture any variables and that the generic type parameter is not used anywhere in the method body.
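For reference, each test method is driven by a simple timing loop along these lines (a sketch of the harness, not the exact downloadable code):

using System;
using System.Diagnostics;

internal static class DelegateCreationHarness
{
    private const int Iterations = 10000000; // 10 million, matching the table below

    // Each test method above is wrapped in an Action and timed once per type
    // argument, e.g. Time("Lambda expression [int]", Lambda<int>) and
    // Time("Lambda expression [string]", Lambda<string>).
    internal static void Time(string name, Action test)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; i++)
        {
            test();
        }
        Console.WriteLine("{0}: {1} ms", name, stopwatch.ElapsedMilliseconds);
    }
}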
Test results
Again, the results are in milliseconds, this time per 10 million operations; I didn't want to run 100 million because some cases would take far too long. Testing was performed with the x64 JIT.
| Test | TestCase [int] | TestCase [string] |
| --- | --- | --- |
| Lambda expression | 180 | 29684 |
| Generic cache class | 90 | 288 |
| Generic method group conversion | 184 | 30017 |
| Non-generic method group conversion | 178 | 189 |
| Static method on generic type | 180 | 29276 |
| Instance method on generic type | 202 | 299 |
Yes, creating a delegate inside a generic method is roughly 150 times slower when the generic type argument is a reference type than when it is a value type. And it seems I am the first to have noticed this. It would, of course, be very interesting to hear an explanation from the CLR team...
Conclusions
I would never have found this pitfall if I hadn't had performance tests. The lesson to take away from this post: never use the new() constraint if application performance matters to you and your code performs a large number of allocations through generic type parameters.
One of the hardest questions to answer precisely is what the compiler will do with a lambda expression. As it turns out, the compiler does not do much for performance here, so we have to take care of it ourselves.
