Ref locals and ref returns in C#: performance pitfalls

C# has supported passing arguments by value or by reference from the start. But before version 7, the C# compiler supported only one way to return a value from a method (or property): return by value. C# 7 changed this with the introduction of two new features: ref returns and ref locals. This article looks at how they work and how they perform.

Motivation

From the CLR's point of view, there are many differences between arrays and other collections. The CLR has supported arrays from the start, and they can be regarded as built-in functionality. The CLR and the JIT compiler know how to work with arrays, and arrays have one more special feature: the array indexer returns elements by reference, not by value.

To demonstrate this, we have to resort to a forbidden technique and use a mutable value type:

    public struct Mutable
    {
        private int _x;
        public Mutable(int x) => _x = x;
        public int X => _x;
        public void IncrementX() { _x++; }
    }

    [Test]
    public void CheckMutability()
    {
        var ma = new[] {new Mutable(1)};
        ma[0].IncrementX();
        // X has been changed!
        Assert.That(ma[0].X, Is.EqualTo(2));

        var ml = new List<Mutable> {new Mutable(1)};
        ml[0].IncrementX();
        // X hasn't been changed!
        Assert.That(ml[0].X, Is.EqualTo(1));
    }

The test passes because the array indexer differs significantly from the List<T> indexer.

For the array indexer the C# compiler emits a special instruction, ldelema, which returns a managed reference to an element of the given array. In essence, the array indexer returns an element by reference. List<T> cannot behave the same way, because before C# 7 it was impossible* to return an alias to internal state. Therefore the List<T> indexer returns an item by value, that is, it returns a copy of the item.

* As we will see soon, the List<T> indexer still cannot return an item by reference.

This means that ma[0].IncrementX() calls a method that changes the first element of the array, while ml[0].IncrementX() calls a method that changes a copy of the element without affecting the original list.
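
A minimal sketch of the practical consequence: to update a struct stored in a List<T>, the element has to be copied out, modified, and written back. The Counter and IndexerDemo names are hypothetical and exist only for this illustration.

```csharp
using System.Collections.Generic;

// Hypothetical mutable struct, used only for illustration.
public struct Counter
{
    public int Value;
    public void Increment() { Value++; }
}

public static class IndexerDemo
{
    // The array indexer yields a reference, so the stored element changes.
    public static int IncrementInArray()
    {
        var items = new[] { new Counter() };
        items[0].Increment();
        return items[0].Value;
    }

    // The List<T> indexer yields a copy, so the stored element is untouched.
    public static int IncrementInList()
    {
        var items = new List<Counter> { new Counter() };
        items[0].Increment();
        return items[0].Value;
    }

    // Copy out, modify, write back: one way to update a struct held in a list.
    public static int IncrementInListWithWriteBack()
    {
        var items = new List<Counter> { new Counter() };
        Counter copy = items[0];
        copy.Increment();
        items[0] = copy;
        return items[0].Value;
    }
}
```

Under these assumptions, IncrementInArray() and IncrementInListWithWriteBack() return 1, while IncrementInList() returns 0.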

Ref returns and ref locals: the basics

The idea behind these features is simple: declaring a ref return lets a method return an alias to an existing variable, and a ref local can store such an alias.

1. A simple example:

    [Test]
    public void RefLocalsAndRefReturnsBasics()
    {
        int[] array = { 1, 2 };

        // Capture an alias to the first element into a local
        ref int first = ref array[0];
        first = 42;
        Assert.That(array[0], Is.EqualTo(42));

        // Local function that returns the first element by ref
        ref int GetByRef(int[] a) => ref a[0];
        // Weird syntax: the result of a function call is assignable
        GetByRef(array) = -1;
        Assert.That(array[0], Is.EqualTo(-1));
    }

2. Ref returns and the readonly modifier

A ref return can return an alias to an instance field, and starting with C# 7.2 you can return an alias that does not allow writing to the underlying object by using the ref readonly modifier:

    class EncapsulationWentWrong
    {
        private readonly Guid _guid;
        private int _x;

        public EncapsulationWentWrong(int x) => _x = x;

        // Return an alias to the private field. No encapsulation any more.
        public ref int X => ref _x;

        // Return a readonly alias to the private field.
        public ref readonly Guid Guid => ref _guid;
    }

    [Test]
    public void NoEncapsulation()
    {
        var instance = new EncapsulationWentWrong(42);
        instance.X++;

        Assert.That(instance.X, Is.EqualTo(43));

        // Cannot assign to property 'EncapsulationWentWrong.Guid' because it is a readonly variable
        // instance.Guid = Guid.Empty;
    }

Methods and properties can return an alias to internal state. For a property, the setter must not be defined in this case.
Returning by reference breaks encapsulation, because the client gets full control over the object's internal state.
Returning by readonly reference avoids unnecessary copying of value types while preventing the client from changing the internal state.
Readonly references can also be used with reference types, although this rarely makes sense.

3. Existing restrictions. Returning an alias can be dangerous: using an alias to a stack-allocated variable after the method has returned would crash the application. To make the feature safe, the C# compiler enforces several restrictions:

You cannot return a reference to a local variable.
You cannot return a reference to this in structs.
You can return a reference to a variable located on the heap (for example, a class member).
You can return a reference to ref/out parameters.
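
The rules above can be sketched as follows. The Holder, Point, and SafeToReturnDemo types are hypothetical, and the rejected cases are left as comments because they do not compile.

```csharp
public class Holder
{
    private int _field;

    // Allowed: the field lives on the heap as part of a class instance.
    public ref int FieldRef() => ref _field;

    // Allowed: a ref parameter refers to storage owned by the caller.
    public static ref int Same(ref int value) => ref value;

    // Rejected by the compiler: the local is gone once the method returns.
    // public static ref int Local() { int x = 0; return ref x; }
}

public struct Point
{
    private int _x;
    public Point(int x) => _x = x;
    public int X => _x;

    // Rejected by the compiler: 'this' of a struct may live on the stack.
    // public ref int XRef() => ref _x;
}

public static class SafeToReturnDemo
{
    public static int Run()
    {
        var holder = new Holder();
        holder.FieldRef() = 5;       // writes to the private field through the alias

        int local = 1;
        Holder.Same(ref local) = 7;  // writes to 'local' through the returned alias

        return holder.FieldRef() + local;
    }
}
```

With these definitions, SafeToReturnDemo.Run() returns 12, because both writes go through aliases to the original storage.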

For more information, we recommend the excellent post "Safe to return rules for ref returns". Its author, Vladimir Sadov, is the creator of the ref returns feature in the C# compiler.

Now that we have a general idea of ref returns and ref locals, let's look at how they can be used.

Using ref returns in indexers

To test the performance impact of these features, we will create a custom immutable collection called NaiveImmutableList<T> and compare it with T[] and List<T> for structs of different sizes (4, 16, 32 and 48 bytes).

    public class NaiveImmutableList<T>
    {
        private readonly int _length;
        private readonly T[] _data;

        public NaiveImmutableList(params T[] data)
            => (_data, _length) = (data, data.Length);

        public ref readonly T this[int idx]
            // R# 2017.3.2 is completely confused with this syntax!
            // => ref (idx >= _length ? ref Throw() : ref _data[idx]);
        {
            get
            {
                // Extracting 'throw' statement into a different
                // method helps the jitter to inline a property access.
                if ((uint)idx >= (uint)_length)
                    ThrowIndexOutOfRangeException();

                return ref _data[idx];
            }
        }

        private static void ThrowIndexOutOfRangeException() =>
            throw new IndexOutOfRangeException();
    }

    struct LargeStruct_48
    {
        public int N { get; }
        private readonly long l1, l2, l3, l4, l5;

        public LargeStruct_48(int n) : this() => N = n;
    }

    // Other structs like LargeStruct_16, LargeStruct_32 etc

The benchmark iterates over each collection and reads the N property of every element:

    private const int elementsCount = 100_000;

    private static LargeStruct_48[] CreateArray_48() =>
        Enumerable.Range(1, elementsCount).Select(v => new LargeStruct_48(v)).ToArray();

    private readonly LargeStruct_48[] _array48 = CreateArray_48();

    [BenchmarkCategory("BigStruct_48")]
    [Benchmark(Baseline = true)]
    public int TestArray_48()
    {
        int result = 0;
        // Using elementsCount but not array.Length to force the bounds check
        // on each iteration.
        for (int i = 0; i < elementsCount; i++)
        {
            result = _array48[i].N;
        }

        return result;
    }

The results are as follows:

Apparently, something is wrong! The performance of our NaiveImmutableList<T> collection is the same as that of List<T>. What happened?

Readonly ref returns: how they work

As you can see, the NaiveImmutableList<T> indexer returns a readonly reference by means of the ref readonly modifier. This is fully justified, since we want to prevent clients from changing the underlying state of the immutable collection. However, the structs used in the benchmark are not readonly.

This test helps explain the underlying behavior:

    [Test]
    public void CheckMutabilityForNaiveImmutableList()
    {
        var ml = new NaiveImmutableList<Mutable>(new Mutable(1));
        ml[0].IncrementX();
        // X has been changed, right?
        Assert.That(ml[0].X, Is.EqualTo(2));
    }

The test fails! But why? Because readonly references behave like in parameters and readonly fields with respect to structs: the compiler emits a defensive copy each time a struct member is used. This means that ml[0].IncrementX() still creates a copy of the first element, but the copy is not made by the indexer: it is made at the call site.
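
A minimal sketch of the defensive copy, assuming a hypothetical MutableCounter struct exposed through both a readonly and a writable reference. The type and member names are illustrative, not taken from the benchmark code.

```csharp
// Hypothetical non-readonly struct with a mutating method.
public struct MutableCounter
{
    private int _value;
    public int Value => _value;
    public void Increment() { _value++; }
}

// Hypothetical holder that exposes the same field in two ways.
public class CounterBox
{
    private MutableCounter _counter;

    public ref readonly MutableCounter ReadOnlyRef => ref _counter;
    public ref MutableCounter WritableRef => ref _counter;
}

public static class DefensiveCopyDemo
{
    public static int ThroughReadOnlyRef()
    {
        var box = new CounterBox();
        // The compiler copies the struct at the call site and
        // runs Increment on the copy, so the field stays at 0.
        box.ReadOnlyRef.Increment();
        return box.ReadOnlyRef.Value;
    }

    public static int ThroughWritableRef()
    {
        var box = new CounterBox();
        // No copy: Increment runs directly on the field.
        box.WritableRef.Increment();
        return box.WritableRef.Value;
    }
}
```

Here ThroughReadOnlyRef() returns 0 and ThroughWritableRef() returns 1, which mirrors the failing test above.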

This behavior actually makes sense. The C# compiler supports passing arguments by value, by reference, and by readonly reference using the in modifier (for details, see the post "The 'in'-modifier and the readonly structs in C#"). Now the compiler also supports three ways of returning a value from a method: by value, by reference, and by readonly reference.

Readonly references are so similar to in parameters that the compiler reuses the same InAttribute to mark readonly return values:

    private int _n;

    public ref readonly int ByReadonlyRef() => ref _n;

In this case, the ByReadonlyRef method is effectively compiled into:

    [InAttribute]
    [return: IsReadOnly]
    public int* ByReadonlyRef()
    {
        return ref this._n;
    }

The similarity between the in modifier and readonly references means that these features are a poor fit for regular (non-readonly) structs and can cause performance problems. Consider an example:

    public struct BigStruct
    {
        // Other fields
        public int X { get; }
        public int Y { get; }
    }

    private BigStruct _bigStruct;

    public ref readonly BigStruct GetBigStructByRef() => ref _bigStruct;

    ref readonly var bigStruct = ref GetBigStructByRef();
    int result = bigStruct.X + bigStruct.Y;

Apart from the unusual syntax of the bigStruct declaration, the code looks fine. The goal is clear: BigStruct is returned by reference for performance reasons. Unfortunately, because BigStruct is not a readonly struct, a defensive copy is created each time a member is accessed.

Using ref returns in indexers: attempt number 2

Let's run the same benchmarks for readonly structs of different sizes:

Now the results make much more sense. Processing time still grows for larger structs, but that is expected, because processing 100 thousand larger structs takes longer. But now the running time for NaiveImmutableList<T> is very close to that of T[] and significantly better than that of List<T>.
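
A minimal sketch of why readonly structs behave better here: the readonly modifier on the struct guarantees that no member mutates this, so the compiler has no reason to copy the value before a member access. ReadOnlyPoint, PointStore, and ReadOnlyStructDemo are hypothetical names used only for this illustration.

```csharp
// Hypothetical readonly struct: no member can modify its state.
public readonly struct ReadOnlyPoint
{
    public ReadOnlyPoint(int x, int y) { X = x; Y = y; }

    public int X { get; }
    public int Y { get; }
}

// Hypothetical collection that hands out elements by readonly reference.
public class PointStore
{
    private readonly ReadOnlyPoint[] _points;

    public PointStore(params ReadOnlyPoint[] points) => _points = points;

    public ref readonly ReadOnlyPoint this[int index] => ref _points[index];
}

public static class ReadOnlyStructDemo
{
    public static int SumFirst()
    {
        var store = new PointStore(new ReadOnlyPoint(3, 4));

        // Both property reads go through the alias; because the struct
        // is readonly, no defensive copy is needed for either of them.
        ref readonly ReadOnlyPoint p = ref store[0];
        return p.X + p.Y;
    }
}
```

With these definitions, SumFirst() returns 7.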

Conclusion

Ref returns should be used with care because they can break encapsulation.
Readonly ref returns are efficient only for readonly structs. With regular structs they can cause performance problems.
With non-readonly structs, a readonly ref return causes a defensive copy each time the variable is used, which can hurt performance.

Ref returns and ref locals are useful features for library authors and developers of infrastructure code. However, using them in library code is quite dangerous: to use a collection that efficiently returns elements by readonly reference, every user of the library has to remember that a readonly reference to a non-readonly struct causes a defensive copy at the call site. At best this cancels out the possible performance gain, and at worst it leads to a serious degradation if a single ref readonly local is accessed many times.

P.S. Readonly references are coming to the BCL. Readonly ref methods for accessing the items of immutable collections were introduced in a pull request to the corefx repo (Implementing ItemRef API Proposal). It is therefore important that everyone understands these features and how and when they should be used.

Source: https://habr.com/ru/post/423061/
