Yield: what, where and why

The .NET developer community is eagerly awaiting the release of C# 7.0 and the new features it brings. Each version of the language, which turns 15 next year, has brought something new and useful. Although each feature deserves a separate mention, today I want to talk about the yield keyword. I have noticed that novice developers (and not only they) tend to avoid it. In this article I will try to lay out its advantages and disadvantages and highlight the cases where using yield is appropriate.

yield creates an iterator and spares us from writing a separate class when we implement IEnumerable. C# has two statements that use yield: yield return <expression> and yield break. yield can be used in methods, operators, and property get accessors. I will talk about methods, since yield works the same way everywhere.

By using yield return, we declare that the method returns an IEnumerable sequence whose elements are the values of the expressions in each yield return. Along with the value, yield return hands control back to the caller, and the method resumes from that point when the next element is requested. The values of local variables inside the iterator method are preserved between requests. yield break, in turn, plays a role similar to the familiar break used inside loops: it ends the sequence. The example below returns the numbers from 0 to 10:

GetNumbers

    private static IEnumerable<int> GetNumbers()
    {
        var number = 0;
        while (true)
        {
            if (number > 10)
                yield break;

            yield return number++;
        }
    }
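The hand-off of control between the iterator and its caller is easier to see when both sides record what they do. The sketch below is only an illustration: the YieldTrace class, its Log list and the Produce method are hypothetical names that do not come from the article. It shows that the iterator body does not start until the first element is requested, and that producer and consumer then alternate.

```csharp
using System;
using System.Collections.Generic;

public static class YieldTrace
{
    // Hypothetical log that records the order of events on both sides.
    public static readonly List<string> Log = new List<string>();

    // Hypothetical iterator: writes to the log before every yield return.
    public static IEnumerable<int> Produce()
    {
        Log.Add("iterator started");
        for (var i = 0; i < 2; i++)
        {
            Log.Add("producing " + i);
            yield return i;
        }
        Log.Add("iterator finished");
    }

    public static void Run()
    {
        Log.Clear();
        var sequence = Produce();   // the iterator body has not started yet
        Log.Add("sequence created");
        foreach (var item in sequence)
            Log.Add("consumed " + item);
    }
}
```

After calling YieldTrace.Run(), the log starts with "sequence created", and only then come the entries written by the iterator, interleaved with the "consumed" entries written by the caller.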

It is important to mention that yield has several limitations you need to be aware of. Calling Reset on the iterator throws a NotSupportedException. yield cannot be used in anonymous methods or in methods containing unsafe code. Also, yield return cannot be placed inside a try-catch, although nothing prevents you from placing it in the try section of a try-finally block. yield break can be placed in the try section of both try-catch and try-finally. I will not go into the reasons for these restrictions, since Eric Lippert has described them in detail.
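The Reset limitation can be checked with a few lines of code. This is a minimal sketch; the YieldLimits class and its methods are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class YieldLimits
{
    // A trivial iterator with two elements.
    public static IEnumerable<int> OneTwo()
    {
        yield return 1;
        yield return 2;
    }

    // Returns true if Reset on the compiler-generated enumerator throws.
    public static bool ResetThrows()
    {
        using (var enumerator = OneTwo().GetEnumerator())
        {
            enumerator.MoveNext();
            try
            {
                enumerator.Reset();
                return false;
            }
            catch (NotSupportedException)
            {
                return true;
            }
        }
    }
}
```

If the sequence has to be traversed again, the usual approach is to call GetEnumerator once more (for example, with a second foreach) rather than to reset an existing enumerator.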

Let's see what yield turns into after compilation. Each method that contains yield return is compiled into a state machine that moves from one state to another as the iterator advances. Below is a simple application that prints an infinite sequence of odd numbers to the console:

Sample program

    internal class Program
    {
        private static void Main()
        {
            foreach (var number in GetOddNumbers())
                Console.WriteLine(number);
        }

        private static IEnumerable<int> GetOddNumbers()
        {
            var previous = 0;
            while (true)
                if (++previous%2 != 0)
                    yield return previous;
        }
    }

The compiler will generate roughly the following code (the names are simplified for readability):

Generated code

    internal class Program
    {
        private static void Main()
        {
            IEnumerator<int> enumerator = null;
            try
            {
                enumerator = GetOddNumbers().GetEnumerator();
                while (enumerator.MoveNext())
                    Console.WriteLine(enumerator.Current);
            }
            finally
            {
                if (enumerator != null)
                    enumerator.Dispose();
            }
        }

        [IteratorStateMachine(typeof(CompilerGeneratedYield))]
        private static IEnumerable<int> GetOddNumbers()
        {
            return new CompilerGeneratedYield(-2);
        }

        [CompilerGenerated]
        private sealed class CompilerGeneratedYield : IEnumerable<int>, IEnumerable, IEnumerator<int>, IDisposable, IEnumerator
        {
            private readonly int _initialThreadId;
            private int _current;
            private int _previous;
            private int _state;

            [DebuggerHidden]
            public CompilerGeneratedYield(int state)
            {
                _state = state;
                _initialThreadId = Environment.CurrentManagedThreadId;
            }

            [DebuggerHidden]
            IEnumerator<int> IEnumerable<int>.GetEnumerator()
            {
                CompilerGeneratedYield getOddNumbers;
                if ((_state == -2) && (_initialThreadId == Environment.CurrentManagedThreadId))
                {
                    _state = 0;
                    getOddNumbers = this;
                }
                else
                {
                    getOddNumbers = new CompilerGeneratedYield(0);
                }

                return getOddNumbers;
            }

            [DebuggerHidden]
            IEnumerator IEnumerable.GetEnumerator()
            {
                return ((IEnumerable<int>)this).GetEnumerator();
            }

            int IEnumerator<int>.Current
            {
                [DebuggerHidden]
                get { return _current; }
            }

            object IEnumerator.Current
            {
                [DebuggerHidden]
                get { return _current; }
            }

            [DebuggerHidden]
            void IDisposable.Dispose()
            {
            }

            bool IEnumerator.MoveNext()
            {
                switch (_state)
                {
                    case 0:
                        _state = -1;
                        _previous = 0;
                        break;
                    case 1:
                        _state = -1;
                        break;
                    default:
                        return false;
                }

                int num;
                do
                {
                    num = _previous + 1;
                    _previous = num;
                } while (num%2 == 0);

                _current = _previous;
                _state = 1;
                return true;
            }

            [DebuggerHidden]
            void IEnumerator.Reset()
            {
                throw new NotSupportedException();
            }
        }
    }

From the example you can see that the body of the method containing yield has been replaced by a generated class. The method's local variables have become fields of that class. The class itself implements both IEnumerable and IEnumerator. The MoveNext method contains the logic of the original method, with the only difference that it is expressed as a state machine. Depending on the implementation of the original method, the generated class may also contain a non-empty implementation of the Dispose method.
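One case in which Dispose has real work to do is an iterator with a try-finally block: if the caller stops enumerating early, disposing the enumerator is what runs the finally section. The sketch below illustrates this; the YieldDispose class, its Log list and the WithCleanup method are hypothetical names, not part of the article's code.

```csharp
using System;
using System.Collections.Generic;

public static class YieldDispose
{
    // Hypothetical log that records the order of events.
    public static readonly List<string> Log = new List<string>();

    // Iterator with a finally section that stands in for resource cleanup.
    public static IEnumerable<int> WithCleanup()
    {
        try
        {
            yield return 1;
            yield return 2;
            yield return 3;
        }
        finally
        {
            Log.Add("cleanup");
        }
    }

    public static void Run()
    {
        Log.Clear();
        foreach (var number in WithCleanup())
        {
            Log.Add("got " + number);
            break;  // foreach disposes the enumerator when the loop is left
        }
        Log.Add("after loop");
    }
}
```

After YieldDispose.Run(), the log contains "got 1", "cleanup" and "after loop", in that order: the finally section ran even though the iterator never reached its end. When an enumerator is driven by hand with MoveNext, it should be wrapped in using for the same reason.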

Let's run two tests and measure performance and memory consumption. I'll note right away that these tests are synthetic and serve only to show how yield behaves compared with a straightforward implementation. The measurements are made with BenchmarkDotNet with the BenchmarkDotNet.Diagnostics.Windows diagnostic module enabled. The first test compares the speed of a method that produces a sequence of numbers (an analogue of Enumerable.Range(start, count)). The first implementation does not use an iterator, the second one does:

Test 1

    public int[] Array(int start, int count)
    {
        var numbers = new int[count];
        for (var i = 0; i < count; ++i)
            numbers[i] = start + i;

        return numbers;
    }

    public int[] Iterator(int start, int count)
    {
        return IteratorInternal(start, count).ToArray();
    }

    private IEnumerable<int> IteratorInternal(int start, int count)
    {
        for (var i = 0; i < count; ++i)
            yield return start + i;
    }

Method	Count	Start	Median	Stddev	Gen 0	Gen 1	Gen 2	Bytes Allocated / Op
Array	100	10	91.19 ns	1.25 ns	385.01	-	-	169.18
Iterator	100	10	1,173.26 ns	10.94 ns	1,593.00	-	-	700.37

As the results show, the Array implementation is an order of magnitude faster and allocates about 4 times less memory. The iterator object and the separate ToArray call took their toll.

The second test is more involved. We will emulate working with a stream of data: first we select the records with an odd key, and then those whose key is a multiple of 3. As in the previous test, the first implementation does not use an iterator, the second one does:

Test 2

    public List<Tuple<int, string>> List(int start, int count)
    {
        var odds = new List<Tuple<int, string>>();
        foreach (var record in OddsArray(ReadFromDb(start, count)))
            if (record.Item1%3 == 0)
                odds.Add(record);

        return odds;
    }

    public List<Tuple<int, string>> Iterator(int start, int count)
    {
        return IteratorInternal(start, count).ToList();
    }

    private IEnumerable<Tuple<int, string>> IteratorInternal(int start, int count)
    {
        foreach (var record in OddsIterator(ReadFromDb(start, count)))
            if (record.Item1%3 == 0)
                yield return record;
    }

    // Lazy filter: passes records with an odd key through one at a time.
    private IEnumerable<Tuple<int, string>> OddsIterator(IEnumerable<Tuple<int, string>> records)
    {
        foreach (var record in records)
            if (record.Item1%2 != 0)
                yield return record;
    }

    // Eager filter: collects records with an odd key into an intermediate list.
    private List<Tuple<int, string>> OddsArray(IEnumerable<Tuple<int, string>> records)
    {
        var odds = new List<Tuple<int, string>>();
        foreach (var record in records)
            if (record.Item1%2 != 0)
                odds.Add(record);

        return odds;
    }

    private IEnumerable<Tuple<int, string>> ReadFromDb(int start, int count)
    {
        // The element type must match the declared Tuple<int, string>.
        for (var i = start; i < count; ++i)
            yield return Tuple.Create(start + i, RandomString());
    }

    private static string RandomString()
    {
        return Guid.NewGuid().ToString("n");
    }

Method	Count	Start	Median	Stddev	Gen 0	Gen 1	Gen 2	Bytes Allocated / Op
List	100	10	43.14 us	0.14 us	279.04	-	-	4,444.14
Iterator	100	10	43.22 us	0.76 us	231.00	-	-	3,760.96

In this case the execution speed turned out to be the same, and the memory consumption of the yield version was even lower. This is because the iterator implementation materialized the collection only once, saving the allocation of one intermediate List<Tuple<int, string>>.

Taking all of the above and the test results into account, we can draw a brief conclusion: the main disadvantage of yield is the additional iterator class. If the sequence is finite and the caller does not perform complex manipulations on the elements, the iterator will be slower and will put unwanted pressure on the GC. However, it makes sense to use yield when processing long sequences, where every materialization of a collection means allocating large blocks of memory. Because yield is lazy, elements of the sequence that may be filtered out later are never computed. This can drastically reduce memory consumption and the load on the processor.
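The laziness mentioned above can be illustrated with an infinite iterator from which the caller takes only a few elements. This is a sketch under stated assumptions: the YieldLazy class, its Evaluated counter and the Squares method are hypothetical names, and Take and ToArray are the standard LINQ methods from System.Linq.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class YieldLazy
{
    // Hypothetical counter of how many elements were actually computed.
    public static int Evaluated;

    // Infinite sequence of squares; each element is computed only on request.
    public static IEnumerable<int> Squares()
    {
        for (var i = 1; ; i++)
        {
            Evaluated++;
            yield return i * i;
        }
    }

    // Takes the first three squares from the infinite sequence.
    public static int[] FirstThree()
    {
        Evaluated = 0;
        return Squares().Take(3).ToArray();
    }
}
```

YieldLazy.FirstThree() returns the squares 1, 4 and 9 and terminates even though Squares never ends, because the iterator computes only as many elements as the caller asks for. An eager version that fills a list first could not return at all.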

Source: https://habr.com/ru/post/311094/
