In this article, Jon Skeet describes how even the simplest language constructs can slow down your program, and how to speed things up. As with anything related to application performance, results may vary depending on conditions (for example, the 64-bit JIT may behave a little differently), and in most cases this should not worry you. Relatively few developers write production code built around large numbers of micro-optimizations, so please do not take this post as a call to complicate your code for the sake of premature optimization. Apply it only where it is actually needed.
The new() constraint
Suppose, for example, we have the type SteppedPattern (the author discusses the optimization using the example of his library, Noda Time — translator's note), which has a generic type parameter TBucket. What matters here is that before parsing a value I want to create a new TBucket instance: pieces of information are added to the bucket as they are parsed, and when the operation finishes they are combined into a ParseResult. So every string parsing operation requires creating a TBucket instance. How can we create one when the type is generic?
We can do this by calling the type's parameterless constructor. I don't want to worry about whether the supplied type actually has such a constructor, so I'll simply add a new() constraint and call new TBucket().
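In code, a stripped-down sketch of this first approach might look like the following (the names are illustrative, not the actual Noda Time types):

internal sealed class SteppedPattern<TBucket> where TBucket : new()
{
    public TBucket Parse(string text)
    {
        // This innocuous line is where the cost hides: for a generic type
        // parameter, "new TBucket()" is compiled as a call to
        // Activator.CreateInstance<TBucket>().
        TBucket bucket = new TBucket();
        // ... feed the parsed pieces of 'text' into the bucket ...
        return bucket;
    }
}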
Great! Quite simple. Unfortunately, I lost sight of the fact that this single line of code accounts for about 75% of the time it takes to parse a string. And that is just creating an empty bucket, the simplest bucket parsing the simplest string! When I realized it, I was shocked.
Fixing it with a provider
The fix is very simple: we just need to tell the type how to create an instance of the bucket. We do this with a delegate:
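A minimal sketch of the changed type (again with illustrative names, not the real Noda Time code):

using System;

internal sealed class SteppedPattern<TBucket>
{
    private readonly Func<TBucket> bucketProvider;

    // The caller tells the pattern how to create buckets; the bucket's own
    // constructor can stay internal.
    internal SteppedPattern(Func<TBucket> bucketProvider)
    {
        this.bucketProvider = bucketProvider;
    }

    public TBucket Parse(string text)
    {
        // A plain delegate invocation; no reflection involved.
        TBucket bucket = bucketProvider();
        // ... feed the parsed pieces of 'text' into the bucket ...
        return bucket;
    }
}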
Now I can call new SteppedPattern(() => new OffsetBucket()), or something along those lines. It also means I can keep the constructor internal and never worry about it again. And, to simplify later code even further, I could even reuse an old bucket when parsing subsequent strings.
I want tables!
Not everyone wants to run the tests themselves; most people would rather look at ready-made results. So here are the benchmarks I wrote, measuring only the time spent creating instances through generic type parameters. To show how small these numbers really are, note that the values in the tables are in milliseconds, and each covers 100 million operations. So unless your code relies heavily on creating instances via generic type parameters, these results should not make you rush to rewrite anything. Still, keep them in mind for the future.
Anyway, the benchmark works with four types: two classes and two structs, each in a small and a large version (that is, versions differing in size), on 32-bit and 64-bit machines, for CLR v2 and v4. My 64-bit machine is simply faster, so results should only be compared within a single machine.
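To give a feel for what is being measured, a stripped-down version of the timing loop might look like this (a sketch with made-up names, not the exact code from the downloadable benchmark):

using System;
using System.Diagnostics;

internal static class CreationBenchmark
{
    private const int Iterations = 100000000; // 100 million, as in the tables below

    // Times the new()-constraint path.
    private static long MeasureNewConstraint<T>() where T : new()
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; i++)
        {
            new T(); // compiled as a call to Activator.CreateInstance<T>()
        }
        return stopwatch.ElapsedMilliseconds;
    }

    // Times the provider-delegate path.
    private static long MeasureProvider<T>(Func<T> provider)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; i++)
        {
            provider(); // plain delegate invocation
        }
        return stopwatch.ElapsedMilliseconds;
    }
}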
CLR v4: 32-bit results (ms per 100 million iterations)

| Test type | new() constraint | Provider delegate |
| --- | --- | --- |
| Small struct | 689 | 1225 |
| Large struct | 11188 | 7273 |
| Small class | 16307 | 1690 |
| Large class | 17471 | 3017 |
CLR v4: 64-bit results (ms per 100 million iterations)

| Test type | new() constraint | Provider delegate |
| --- | --- | --- |
| Small struct | 473 | 868 |
| Large struct | 2670 | 2396 |
| Small class | 8366 | 1189 |
| Large class | 8805 | 1529 |
CLR v2: 32-bit results (ms per 100 million iterations)

| Test type | new() constraint | Provider delegate |
| --- | --- | --- |
| Small struct | 703 | 1246 |
| Large struct | 11411 | 7392 |
| Small class | 143967 | 1791 |
| Large class | 143107 | 2581 |
CLR v2: 64-bit results (ms per 100 million iterations)

| Test type | new() constraint | Provider delegate |
| --- | --- | --- |
| Small struct | 510 | 686 |
| Large struct | 2334 | 1731 |
| Small class | 81801 | 1539 |
| Large class | 83293 | 1896 |
Look at the results for the classes. These are real figures: about two minutes on my laptop when using the new() constraint, and only a couple of seconds when using the provider. And it is important to note that those numbers are for .NET 2.0, i.e. CLR v2 (everything up to .NET 3.5 runs on CLR v2 — translator's note).
And, of course, you can download the benchmark and see how it behaves on your machine.
What happens under the hood?
As far as I understand, there is no IL instruction to support the new() constraint. Instead, the compiler inserts a call to Activator.CreateInstance<T>(). This is obviously slower than invoking a delegate, since it has to find a suitable constructor via reflection and then invoke it. I was genuinely surprised that this is not optimized; the obvious approach would be to create the delegates and cache them for later use. I won't argue with that design decision, though: at least it does not consume the extra memory such a cache would occupy.
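To make this concrete, the two paths compared above are roughly equivalent to the following sketch. The cached-delegate variant illustrates the kind of caching described above; the names are hypothetical, and it assumes T has an accessible parameterless constructor:

using System;
using System.Linq.Expressions;

internal static class InstantiationPaths
{
    // Roughly what "new T()" under a new() constraint turns into: a call
    // that locates the parameterless constructor and invokes it.
    private static T CreateViaConstraint<T>() where T : new()
    {
        return Activator.CreateInstance<T>();
    }

    // A hand-rolled cache: the expression is built and compiled once per
    // closed generic type, and every later call is just a delegate invocation.
    private static class ConstructorCache<T>
    {
        internal static readonly Func<T> Create =
            Expression.Lambda<Func<T>>(Expression.New(typeof(T))).Compile();
    }

    private static T CreateViaCachedDelegate<T>()
    {
        return ConstructorCache<T>.Create();
    }
}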
I want more benchmarks!
(taken from the second part of the article)
Here we look at the performance of working with delegates, and try to speed them up.
You can download the full source code for the performance tests from my site. Essentially, every test does the same thing: I create an Action delegate that does nothing and check that the reference to it is not null, purely to keep the JIT from optimizing the work away. Each test is implemented as a generic method with a single type parameter, and each method is called twice: first with Int32 as the type argument, then with String. The test cases are:
- Using a lambda expression: Action foo = () => {};
private static void Lambda<T>()
{
    Action foo = () => {};
    if (foo == null)
    {
        throw new Exception();
    }
}
- What I would like the compiler to do for me: a separate generic cache class that stores the delegate in a static field.
private static void FakeCachedLambda<T>()
{
    if (FakeLambdaCache<T>.CachedAction == null)
    {
        FakeLambdaCache<T>.CachedAction = FakeLambdaCache<T>.NoOp;
    }
    Action foo = FakeLambdaCache<T>.CachedAction;
    if (foo == null)
    {
        throw new Exception();
    }
}

private static class FakeLambdaCache<T>
{
    internal static Action CachedAction;
    internal static void NoOp() {}
}
- What the compiler actually does with the lambda expression: write a separate generic method and use a method group conversion.
private static void GenericMethodGroup<T>()
{
    Action foo = NoOp<T>;
    if (foo == null)
    {
        throw new Exception();
    }
}
- What the compiler could do instead: use a separate non-generic method and apply the same method group conversion.
private static void NonGenericMethodGroup<T>()
{
    Action foo = NoOp;
    if (foo == null)
    {
        throw new Exception();
    }
}
- Using a method group conversion on a static non-generic method of a generic type;
private static void StaticMethodOnGenericType<T>()
{
    Action foo = SampleGenericClass<T>.NoOpStatic;
    if (foo == null)
    {
        throw new Exception();
    }
}
- Using a method group conversion on a non-static, non-generic method of a generic type, via a cache class with a single static field that holds an instance of the generic class.
Yes, that description sounds convoluted, but the code is much simpler:
private static void InstanceMethodOnGenericType<T>()
{
    Action foo = ClassHolder<T>.SampleInstance.NoOpInstance;
    if (foo == null)
    {
        throw new Exception();
    }
}
For completeness, here are all the remaining definitions:
private static void NoOp() {}
private static void NoOp<T>() {}

private class ClassHolder<T>
{
    internal static SampleGenericClass<T> SampleInstance = new SampleGenericClass<T>();
}

private class SampleGenericClass<T>
{
    internal static void NoOpStatic() { }
    internal void NoOpInstance() { }
}
Note that all of this happens inside a generic method, which is called once for each of the two type arguments: Int32 and String. It is also important that we do not capture any variables and that the generic type parameter is not used anywhere in the method body.
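For reference, each test method is driven by a simple timing loop along these lines (a sketch of the harness, not the exact downloadable code):

using System;
using System.Diagnostics;

internal static class DelegateCreationHarness
{
    private const int Iterations = 10000000; // 10 million, matching the table below

    // Each test method above is wrapped in an Action and timed once per type
    // argument, e.g. Time("Lambda expression [int]", Lambda<int>) and
    // Time("Lambda expression [string]", Lambda<string>).
    internal static void Time(string name, Action test)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < Iterations; i++)
        {
            test();
        }
        Console.WriteLine("{0}: {1} ms", name, stopwatch.ElapsedMilliseconds);
    }
}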
Test results
Again, the results are in milliseconds, this time per 10 million operations; I didn't want to run 100 million because some cases would take far too long. Testing was performed with the x64 JIT.
| Test | TestCase [int] | TestCase [string] |
| --- | --- | --- |
| Lambda expression | 180 | 29684 |
| Generic cache class | 90 | 288 |
| Generic method group conversion | 184 | 30017 |
| Non-generic method group conversion | 178 | 189 |
| Static method on generic type | 180 | 29276 |
| Instance method on generic type | 202 | 299 |
Yes, creating a delegate inside a generic method is roughly 150 times slower when the generic type argument is a reference type than when it is a value type. And it seems I am the first to have noticed this. It would, of course, be very interesting to hear an explanation from the CLR team...
Conclusions
I would never have found this pitfall if I hadn't had performance tests. The lesson to take away from this post: never use the new() constraint if application performance matters to you and your code performs a large number of allocations through generic type parameters.
One of the hardest questions to answer precisely is what the compiler will do with a lambda expression. As it turns out, the compiler does not do much for performance here, so we have to take care of it ourselves.
