
Optimization and Generics in the CLR

In this article, Jon Skeet describes how even the simplest language constructs can slow down your program, and how to speed it up.

As with anything related to application performance, your results may vary depending on conditions (for example, a 64-bit JIT may behave slightly differently), and in most cases this should not worry you. That said, relatively few developers write production code consisting of a large number of micro-optimizations. So please do not take this post as a call to complicate your code for the sake of pointless optimization that supposedly speeds up your program. Apply it only where it is actually needed.


The new() constraint


Suppose, for example, we have the type SteppedPattern (the author discusses optimization using his own library, Noda Time, as an example; translator's note), which has a generic type parameter TBucket. The important part is that before I parse a value, I want to create a new object of the TBucket type. The idea is that fragments of information are added to the bucket as they are parsed, and at the end of the operation they are combined into a ParseResult. So every string parsing operation requires creating a TBucket instance. How can we create one when the type is generic?
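To make the later snippets self-contained, here is a hypothetical, heavily simplified sketch of the surrounding types; the real Noda Time types are considerably richer, and these stand-ins are mine, not the library's:

// Simplified, illustrative stand-ins for the types referenced below.
internal sealed class ParseResult<TResult>
{
    internal readonly TResult Value;
    internal ParseResult(TResult value) { Value = value; }
}

internal interface IParsePattern<TResult>
{
    ParseResult<TResult> Parse(string value);
}

// A "bucket" accumulates parsed fragments (e.g. hours and minutes) and is
// turned into a ParseResult at the end of the parse.
internal sealed class OffsetBucket
{
    internal int Hours;
    internal int Minutes;
}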
We can do this by calling the type's parameterless constructor. I don't want to worry about whether the supplied type actually has such a constructor, so I'll simply add a new() constraint and call new TBucket().

// Somewhat simplified...
internal sealed class SteppedPattern<TResult, TBucket> : IParsePattern<TResult>
    where TBucket : new()
{
    public ParseResult<TResult> Parse(string value)
    {
        TBucket bucket = new TBucket();
        // Rest of parsing goes here
    }
}


Splendid! Quite simple. Unfortunately, I had lost sight of the fact that this single line of code accounts for 75% of the time it takes to parse a string. And that is just the creation of an empty bucket: the simplest bucket class, parsing the simplest string! When I realized this, it shocked me.

Fixing it with a provider

Our fix is very simple. We just need to tell the pattern type how to create an instance of the bucket. We do this with a delegate:
// Somewhat simplified...
internal sealed class SteppedPattern<TResult, TBucket> : IParsePattern<TResult>
{
    private readonly Func<TBucket> bucketProvider;

    internal SteppedPattern(Func<TBucket> bucketProvider)
    {
        this.bucketProvider = bucketProvider;
    }

    public ParseResult<TResult> Parse(string value)
    {
        TBucket bucket = bucketProvider();
        // Rest of parsing goes here
    }
}


Now I can call new SteppedPattern(() => new OffsetBucket()), or something like that. It also means I can leave the constructor internal and never worry about it again. And, which simplifies later code even further, I could even reuse an old bucket when parsing subsequent strings.
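For clarity, here is a hedged usage sketch with the generic type arguments spelled out; the result type, the bucket shape and the input string are illustrative stand-ins matching the simplified types above, not the library's real call site:

// The provider delegate is the only thing the pattern needs to know
// about bucket creation.
var pattern = new SteppedPattern<int, OffsetBucket>(() => new OffsetBucket());
ParseResult<int> result = pattern.Parse("+01:30");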

I want tables!

I suspect not everyone wants to run the tests themselves; most would rather look at the finished results. So here are the results of the benchmarks I wrote to measure just the time spent creating instances of generic types. To show how small these numbers really are, note that the values in the tables are in milliseconds, and each figure covers 100 million of the operations under test. So unless your code creates instances of generic types very frequently, this should not make you want to rewrite it. Still, keep it in mind for the future.

The code works with four types: two classes and two structs, with a small and a large version of each (presumably referring to the size of the type; translator's note), on 32-bit and 64-bit machines, for CLR v2 and v4. My 64-bit machine is simply faster, so compare results only within a single machine.
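Before the numbers, here is a minimal sketch of the kind of benchmark being described, assuming a simple Stopwatch loop; the author's actual downloadable benchmark is structured differently, and SmallStruct is a placeholder type:

using System;
using System.Diagnostics;

internal struct SmallStruct { public int X; }

internal static class GenericCreationBenchmark
{
    private const int Iterations = 100000000; // 100 million, as in the tables below

    // Creation via the new() constraint: the compiler turns new T()
    // into a call to Activator.CreateInstance<T>().
    private static long ViaNewConstraint<T>() where T : new()
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; i++)
        {
            T item = new T();
        }
        return sw.ElapsedMilliseconds;
    }

    // Creation via a provider delegate, as in the fixed SteppedPattern.
    private static long ViaProvider<T>(Func<T> provider)
    {
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; i++)
        {
            T item = provider();
        }
        return sw.ElapsedMilliseconds;
    }

    private static void Main()
    {
        Console.WriteLine("new() constraint:  {0} ms", ViaNewConstraint<SmallStruct>());
        Console.WriteLine("provider delegate: {0} ms", ViaProvider(() => new SmallStruct()));
    }
}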

CLR v4: 32-bit results (ms per 100 million iterations)
Test type       new() constraint    Provider delegate
Small struct    689                 1225
Large struct    11188               7273
Small class     16307               1690
Large class     17471               3017


CLR v4: 64-bit results (ms per 100 million iterations)
Test type       new() constraint    Provider delegate
Small struct    473                 868
Large struct    2670                2396
Small class     8366                1189
Large class     8805                1529


CLR v2: 32-bit results (ms per 100 million iterations)
Test type       new() constraint    Provider delegate
Small struct    703                 1246
Large struct    11411               7392
Small class     143967              1791
Large class     143107              2581


CLR v2: 64-bit results (ms per 100 million iterations)
Test type       new() constraint    Provider delegate
Small struct    510                 686
Large struct    2334                1731
Small class     81801               1539
Large class     83293               1896


Look at the results for the classes. These are real numbers: about 2 minutes on my laptop with the new() constraint and only a couple of seconds with the provider. And, importantly, these results apply to .NET 2.0 (meaning the CLR: right up to .NET 3.5, everything still runs on CLR v2; translator's note).

And, of course, you can download the benchmark and see how it behaves on your machine.

What happens under the hood?

As far as I understand, there is no IL instruction to support the new() constraint. Instead, the compiler emits a call to Activator.CreateInstance<T>(). That is obviously slower than calling a delegate, since it has to find a suitable constructor via reflection and invoke it. I was genuinely surprised that this is not optimized; the obvious approach would be to use delegates and cache them for reuse. I won't argue with their decision, though: at least it doesn't consume the extra memory that such a cache would occupy.
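If you (or the CLR) did want a cache, one possible shape is a delegate compiled once per closed type from an expression tree. This is a hypothetical sketch of that idea, not something the runtime or the C# compiler actually does for the new() constraint:

using System;
using System.Linq.Expressions;

// Compiles "() => new T()" once per closed type T; after that, creating an
// instance costs a delegate call instead of Activator.CreateInstance<T>().
internal static class CachedFactory<T> where T : new()
{
    internal static readonly Func<T> Create =
        Expression.Lambda<Func<T>>(Expression.New(typeof(T))).Compile();
}

Calling CachedFactory<TBucket>.Create() then costs roughly a delegate invocation, which is essentially what the provider-based fix above achieves by hand.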

I want more benchmarks!


(taken from the second part of the article)

Here we look at the performance of working with delegates, and try to speed them up.
You can download the full source code of the performance tests from my site. Essentially, every test does the same thing: it creates an Action delegate that does nothing and then checks that the reference is not null, purely to defeat JIT optimizations. Each test is implemented as a generic method that takes a single generic type parameter, and I call each method twice: first with Int32 as the type argument, then with String. The individual cases I included are the ones listed in the results table below.


I'll also give all the missing definitions:
private static void NoOp() {}
private static void NoOp<T>() {}

private class ClassHolder<T>
{
    internal static SampleGenericClass<T> SampleInstance = new SampleGenericClass<T>();
}

private class SampleGenericClass<T>
{
    internal static void NoOpStatic() { }
    internal void NoOpInstance() { }
}


Note that all of this happens inside a generic method, which is called for each type argument: Int32 and String. Importantly, we do not capture any variables, and the generic type parameter is not used anywhere in the method body.
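For concreteness, here is a hedged sketch of what the individual cases could look like inside such a generic test method (it assumes it lives in the same class as the definitions above; the author's real benchmark is structured differently):

private static void RunDelegateCreationTests<T>()
{
    // One creation of each kind is shown; the real benchmark repeats each
    // case 10 million times and times each case separately.
    Action lambda        = () => { };                                   // lambda expression
    Action genericGroup  = NoOp<T>;                                     // generic method group conversion
    Action plainGroup    = NoOp;                                        // non-generic method group conversion
    Action staticOnGen   = SampleGenericClass<T>.NoOpStatic;            // static method on generic type
    Action instanceOnGen = ClassHolder<T>.SampleInstance.NoOpInstance;  // instance method on generic type

    // The "generic cache class" case presumably reads a delegate created once
    // and stored in a static field of a generic type (see the sketch after the
    // results table).

    // Null checks purely to stop the JIT from optimizing the creations away.
    if (lambda == null || genericGroup == null || plainGroup == null ||
        staticOnGen == null || instanceOnGen == null)
    {
        throw new InvalidOperationException("Delegate creation unexpectedly returned null");
    }
}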

Test results

Again, the results are in milliseconds, this time per 10 million operations; I didn't want to run 100 million because it would take too long. Testing was performed with the x64 JIT.

Test                                   TestCase<int>    TestCase<string>
Lambda expression                      180              29684
Generic cache class                    90               288
Generic method group conversion        184              30017
Non-generic method group conversion    178              189
Static method on generic type          180              29276
Instance method on generic type        202              299

Yes, creating a delegate for a generic method with a reference type as the type argument is about 150 times slower than with a value type as the type argument. And it seems I'm the first to have noticed this. It would, of course, be very interesting to hear an answer from the CLR team about it...
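The fast "generic cache class" row also hints at a workaround when you genuinely need such a delegate per closed generic type: create it once and keep it in a static field of a generic type, so the expensive creation happens only on first use. A hedged sketch of that idea, relying on the NoOp<T> definition above and only my guess at what that test case does:

private static class ActionCache<T>
{
    // The delegate is created once per closed type (once for int, once for
    // string); every later access just reads the cached field.
    internal static readonly Action CachedNoOp = NoOp<T>;
}

// Usage inside a generic method:
// Action action = ActionCache<T>.CachedNoOp;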

Conclusions


I would never have found this pitfall if I hadn't had benchmarks. The lesson to take away from this post: never use the new() constraint if performance matters to you and your code creates a lot of instances of generic types.

One of the hardest questions to answer precisely is what the compiler will do with a lambda expression. In this case the compiler doesn't do much for performance, and we have to take care of it ourselves.

Source: https://habr.com/ru/post/144193/

