Implementing Iterators in C# (Part 1)

From the translator:
Not long ago a less experienced colleague asked me why we use yield return in C#. I do not write my own iterators very often, so I was not entirely sure of my answer. After checking MSDN I was reassured that what I had said was right, but a question remained: "What does this statement actually compile into?" The article is old, but I think it can be useful to developers who prefer reading articles and documentation in Russian.

Continued in: Implementing Iterators in C# (Part 2)

Like anonymous methods, iterators in C# are elaborate syntactic sugar. You could implement them entirely by hand (after all, in earlier versions of C# you had to), but letting the compiler do it is much more convenient. The idea behind iterators is that the compiler takes a function containing yield return statements (and possibly yield break statements) and converts it into a state machine. When a yield return executes, the state of the function is saved, and when the iterator is asked for the next object, that state is restored. The key point is that all local variables of the iterator (including its parameters, which are treated as pre-initialized locals, and including the hidden this parameter) become member variables (hereinafter, fields) of a helper class. In addition, the helper class contains a state field, which tracks where execution was suspended, and a current field, which holds the most recently enumerated object.
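The save-and-restore behaviour described above can be observed directly. The following is a minimal sketch, not part of the original article; the names LazyDemo, TwoValues, Log, and Run are hypothetical. It records the order in which the iterator body and the consuming loop execute.

```csharp
using System.Collections.Generic;

static class LazyDemo
{
    // Hypothetical log used only to observe the order of execution.
    public static readonly List<string> Log = new List<string>();

    static IEnumerable<int> TwoValues()
    {
        Log.Add("start");      // runs on the first MoveNext, not when TwoValues is called
        yield return 1;        // execution is suspended here
        Log.Add("between");    // runs on the second MoveNext
        yield return 2;
        Log.Add("end");        // runs on the third MoveNext, which returns false
    }

    public static void Run()
    {
        Log.Clear();
        IEnumerable<int> seq = TwoValues();   // no iterator code has executed yet
        Log.Add("created");
        using (IEnumerator<int> e = seq.GetEnumerator())
        {
            while (e.MoveNext())
                Log.Add("got " + e.Current);
        }
    }
}
```

After LazyDemo.Run(), the log reads: created, start, got 1, between, got 2, end. The body of TwoValues advances only when MoveNext is called, and each call picks up where the previous one stopped.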

Consider the following class:

    class MyClass {
        int limit = 0;

        public MyClass(int limit) { this.limit = limit; }

        public IEnumerable<int> CountFrom(int start) {
            for (int i = start; i <= limit; i++)
                yield return i;
        }
    }

The CountFrom method produces an enumerator of integers that yields the values from start to limit inclusive, in steps of 1. The compiler implicitly converts it into something like this:

    class MyClass_Enumerator : IEnumerable<int> {
        int state$0 = 0;    // internal field
        int current$0;      // internal field
        MyClass this$0;     // implicit parameter of CountFrom
        int start;          // explicit parameter of CountFrom
        int i;              // local variable of CountFrom

        public int Current {
            get { return current$0; }
        }

        public bool MoveNext() {
            switch (state$0) {
            case 0: goto resume$0;
            case 1: goto resume$1;
            case 2: return false;
            }
        resume$0:;
            for (i = start; i <= this$0.limit; i++) {
                current$0 = i;
                state$0 = 1;
                return true;
        resume$1:;
            }
            state$0 = 2;
            return false;
        }
        // ... other bookkeeping, not important here ...
    }

    public IEnumerable<int> CountFrom(int start) {
        MyClass_Enumerator e = new MyClass_Enumerator();
        e.this$0 = this;
        e.start = start;
        return e;
    }

The enumerator class is generated automatically by the compiler and, as promised, it contains fields for the state and the current object, plus one field for each parameter and local variable. The Current property simply returns the current object. All the real work takes place in the MoveNext method.
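The generated code shown in this article is illustrative: names containing $ are not legal C# identifiers, and C# does not allow goto to jump into the body of a loop. As a rough sketch of the same idea in compilable form, the class below implements CountFrom by hand. The nested class CountFromIterator and its owner field (standing in for this$0) are hypothetical names, and the loop is unrolled into "initialize" and "increment" steps rather than resumed with goto, so this is an approximation of the technique and not what the compiler actually emits.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

class MyClass
{
    private readonly int limit;
    public MyClass(int limit) { this.limit = limit; }

    // Same contract as the yield-based CountFrom, written by hand.
    public IEnumerable<int> CountFrom(int start)
    {
        return new CountFromIterator(this, start);
    }

    // Nested so that it can read the private field limit.
    private sealed class CountFromIterator : IEnumerable<int>, IEnumerator<int>
    {
        private int state;               // 0 = not started, 1 = suspended at the yield, 2 = finished
        private int current;             // most recently produced value
        private readonly MyClass owner;  // plays the role of this$0
        private readonly int start;      // parameter of CountFrom
        private int i;                   // local variable of CountFrom

        public CountFromIterator(MyClass owner, int start)
        {
            this.owner = owner;
            this.start = start;
        }

        public int Current { get { return current; } }
        object IEnumerator.Current { get { return current; } }

        public bool MoveNext()
        {
            switch (state)
            {
                case 0: i = start; break;   // first call: run the loop initializer
                case 1: i++; break;         // resumed after the yield: run the loop increment
                default: return false;      // finished: keep returning false
            }
            if (i <= owner.limit)           // loop condition
            {
                current = i;
                state = 1;
                return true;
            }
            state = 2;
            return false;
        }

        // Each enumeration gets a fresh state machine.
        public IEnumerator<int> GetEnumerator() { return new CountFromIterator(owner, start); }
        IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
        public void Reset() { throw new NotSupportedException(); }
        public void Dispose() { }
    }
}
```

With this sketch, foreach (int x in new MyClass(5).CountFrom(3)) visits 3, 4, and 5, the same sequence the yield-based version produces.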

To generate the MoveNext method, the compiler takes the code you wrote and performs several transformations:

First, all references to variables must be adjusted, because the code has moved into the helper class:
this becomes this$0, because inside the generated function this refers to the automatically generated class rather than the original one.
m becomes this$0.m if m is a member of the original class (a non-static field, property, or method). This rule is actually redundant with the previous one, since writing a member name m without a prefix is just shorthand for this.m.
v becomes this.v if v is a parameter or local variable. This rule is also redundant, since writing v is equivalent to this.v, but I call it out explicitly so that you notice that the variable's storage has changed.
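One observable consequence of these rewriting rules is sketched below; the names Counter, Limit, and CaptureDemo are hypothetical. Because a member reference becomes this$0.m, the iterator reads the member from the original object each time it runs, whereas a parameter is copied into a field of the helper class when the iterator is created.

```csharp
using System.Collections.Generic;

class Counter
{
    public int Limit;

    public IEnumerable<int> CountFrom(int start)
    {
        // Limit is read through the captured this on every loop test;
        // start and i live in fields of the compiler-generated helper class.
        for (int i = start; i <= Limit; i++)
            yield return i;
    }
}

static class CaptureDemo
{
    public static List<int> Run()
    {
        var counter = new Counter { Limit = 3 };
        int start = 1;
        IEnumerable<int> seq = counter.CountFrom(start);

        start = 100;        // no effect: the argument was already copied into the iterator
        counter.Limit = 5;  // visible: the iterator reads Limit from the original object

        return new List<int>(seq);
    }
}
```

CaptureDemo.Run() returns 1, 2, 3, 4, 5: the later change to Limit is seen by the iterator, while the later change to the caller's start variable is not.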

In addition, the compiler has to deal with every yield return statement.
Each yield return x statement is converted to:

    current$0 = x;
    state$0 = n;
    return true;
    resume$n:;

where n is an increasing number starting at 1.

Then there are the yield break statements.
Each yield break statement is converted to:

    state$0 = n2;
    return false;

where n2 is one greater than the highest state number used by any yield return statement. Remember that a yield break is implied at the end of every iterator function.
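From the caller's point of view, yield break simply ends the sequence. A small sketch (the names YieldBreakDemo and UntilNegative are hypothetical) that stops at the first negative value:

```csharp
using System.Collections.Generic;

static class YieldBreakDemo
{
    // Yields elements of source until the first negative one is seen.
    public static IEnumerable<int> UntilNegative(IEnumerable<int> source)
    {
        foreach (int x in source)
        {
            if (x < 0)
                yield break;    // MoveNext returns false from here on
            yield return x;
        }
        // Falling off the end behaves like an implicit yield break.
    }
}
```

For the input 1, 2, -1, 3 this produces 1 and 2; the element 3 is never reached.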

Finally, the compiler inserts a big state dispatcher at the very beginning of the function:

    switch (state$0) {
    case 0: goto resume$0;
    case 1: goto resume$1;
    case 2: goto resume$2;
    // ...
    case n: goto resume$n;
    case n2: return false;
    }

One case label is generated for each state, plus the initial state 0 and the final state n2.
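For straight-line code, the three pieces (the dispatcher, the yield return rewrite, and the yield break rewrite) can be written in legal C#, because every label sits at the top level of the method. The sketch below hand-translates an iterator body consisting of yield return 10; yield return 20; and nothing else. The class name TwoValuesEnumerator is hypothetical, and plain identifiers replace the $ names used in this article.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

sealed class TwoValuesEnumerator : IEnumerator<int>
{
    private int state;     // 0 = initial, 1 and 2 = after each yield return, 3 = finished (n2)
    private int current;   // most recently produced value

    public int Current { get { return current; } }
    object IEnumerator.Current { get { return current; } }

    public bool MoveNext()
    {
        // State dispatcher: jump to wherever execution was suspended.
        switch (state)
        {
            case 0: goto resume0;
            case 1: goto resume1;
            case 2: goto resume2;
            default: return false;   // state 3 (n2): already finished
        }
    resume0:
        current = 10; state = 1; return true;   // yield return 10
    resume1:
        current = 20; state = 2; return true;   // yield return 20
    resume2:
        state = 3; return false;                // implied yield break at the end
    }

    public void Reset() { throw new NotSupportedException(); }
    public void Dispose() { }
}
```

Calling MoveNext repeatedly returns true (Current is 10), true (Current is 20), and then false on every later call.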

Source: https://habr.com/ru/post/136828/
