Duck typing, or “is good old foreach really that simple?”

I think many developers know that the foreach loop in C# is not as simple as it seems at first glance. To begin with, let's answer the question: “What does it take for a foreach statement to compile successfully?” The intuitive answer seems to be something like: “The class must implement the IEnumerable or IEnumerable<T> interface.” However, that is not so, or at least not quite.

The full answer to this question is: “For a foreach statement to compile, the object must have a GetEnumerator() method that returns an object with a MoveNext() method and a Current property; if there is no such method, the compiler then looks for the IEnumerable and IEnumerable<T> interfaces.”
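To make the rule concrete, here is a minimal sketch. The Countdown and Cursor types are hypothetical and exist only for this illustration; neither implements any interface, yet foreach accepts them because the required members are present.

```csharp
using System;

// Hypothetical type: implements neither IEnumerable nor IEnumerable<T>.
public class Countdown
{
    private readonly int _start;
    public Countdown(int start) { _start = start; }

    // foreach only needs an accessible GetEnumerator()...
    public Cursor GetEnumerator() { return new Cursor(_start); }

    // ...whose return type has MoveNext() and Current; no IEnumerator either.
    public class Cursor
    {
        private int _next;
        public Cursor(int start) { _next = start + 1; }
        public int Current { get { return _next; } }
        public bool MoveNext() { return --_next > 0; }
    }
}

public static class CountdownDemo
{
    public static string Run()
    {
        var text = "";
        foreach (int n in new Countdown(3)) // compiles without any interface
            text += n + " ";
        return text.Trim();
    }

    public static void Main()
    {
        Console.WriteLine(Run()); // prints "3 2 1"
    }
}
```

Remove either MoveNext() or Current from Cursor and the loop stops compiling, which is exactly the "duck" check described above.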

There are two reasons for this "duck" behavior.

Let's remember the old days of C# 1.0, when the language was simple and clear and had no generics, LINQ, or closures. Since there were no generics, generalization and reuse relied on polymorphism and the object type, which is exactly how the collection classes and their iterators were built.

Those iterators were a pair of interfaces, IEnumerable and IEnumerator, and the Current property of the latter returned object. Consequently, iterating over a strongly typed collection of value types through IEnumerator would box and unbox the value on every iteration, which, you'll agree, can be quite expensive for an operation as common as enumerating elements.

To solve this problem, it was decided to resort to a duck-typing hack and give up a little on OOP principles in favor of performance. A class could implement the IEnumerable interface explicitly and provide an additional GetEnumerator() method returning a strongly typed enumerator, whose Current property returned a specific type, for example DateTime, without any boxing.
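A rough sketch of what such a pre-generics collection might have looked like is shown here. DateList and DateEnumerator are hypothetical names invented for this illustration, not types from the BCL.

```csharp
using System;
using System.Collections;

// Hypothetical collection written in the pre-generics style.
public class DateList : IEnumerable
{
    private readonly DateTime[] _items;
    public DateList(params DateTime[] items) { _items = items; }

    // Picked by foreach: Current is typed as DateTime, so no boxing.
    public DateEnumerator GetEnumerator() { return new DateEnumerator(_items); }

    // Explicit implementation for callers that only know IEnumerable.
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }

    public class DateEnumerator : IEnumerator
    {
        private readonly DateTime[] _items;
        private int _index = -1;
        public DateEnumerator(DateTime[] items) { _items = items; }

        public DateTime Current { get { return _items[_index]; } }
        object IEnumerator.Current { get { return Current; } } // this path boxes
        public bool MoveNext() { return ++_index < _items.Length; }
        public void Reset() { _index = -1; }
    }
}

public static class DateListDemo
{
    public static int SumYears(DateList dates)
    {
        int sum = 0;
        foreach (DateTime d in dates) // uses DateEnumerator.Current directly
            sum += d.Year;
        return sum;
    }

    public static void Main()
    {
        var dates = new DateList(new DateTime(2001, 1, 1), new DateTime(2002, 1, 1));
        Console.WriteLine(SumYears(dates)); // prints 4003
    }
}
```

Code that sees the collection only as IEnumerable still works, but it goes through the object-typed Current and pays for boxing on every element.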

OK. We've dealt with the dinosaurs, but what about the real world? After all, this is not the Stone Age anymore: COM has long since kicked the bucket, Don Box no longer writes books, and the neighborhood is full of geeks pushing all sorts of functional goodies on us. Does this behavior bring any benefit today?

You might think that once the generic IEnumerable<T> and IEnumerator<T> interfaces appeared, the duck-typing trick was no longer needed, but this is not quite true. If you look closely at the collection classes, such as List<T>, you will notice that this class (like most other BCL collections) implements the IEnumerable<T> interface explicitly, while providing an additional GetEnumerator() method:

    // Simplified!
    public class List<T> : IEnumerable<T>
    {
        // Enumerator for List<T>
        // A struct, and a mutable one!!!
        public struct Enumerator : IEnumerator<T>, IDisposable
        { }

        public List<T>.Enumerator GetEnumerator()
        {
            return new Enumerator(this);
        }

        // Explicit interface implementation
        IEnumerator<T> IEnumerable<T>.GetEnumerator()
        {
            return GetEnumerator();
        }
    }

Yes, you read that right. The GetEnumerator() method returns an iterator instance that is a mutable struct (after all, the iterator holds a "pointer" to the current list element). And mutable value types are, in the opinion of many, the sharpest saw on the .NET platform, capable of maiming even very experienced developers.

NOTE
Yes, yes. I know I have already talked everyone's ears off about mutable enumerators in general and mutable value types in particular, but here we are, among other things, trying to explain the reasons for this behavior. So bear with me a little longer :)

The point is that using a struct as the iterator, combined with the “duck” nature of the foreach loop, avoids a heap allocation when this construct is used:

    var list = new List<int> {1, 2, 3};

    // Calls List<T>.Enumerator GetEnumerator()
    foreach (var i in list) {}

    // Calls IEnumerable<T>.GetEnumerator()
    foreach (var i in (IEnumerable<int>)list) {}

In the first case, thanks to the "duck" nature, the GetEnumerator() method of the List<T> class is called, returning a value-type object that lives quietly on the stack without any additional allocation on the managed heap. In the second case, we cast the list variable to the interface, so the interface method is called and, consequently, the iterator is boxed. Yes, the C# language designers sacrificed polymorphism and a number of other OOP principles purely for the sake of efficiency.
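The difference between the two paths can be observed through the types involved. The sketch below is only an illustration: EnumeratorKinds is a hypothetical helper, and what the interface call returns at run time is an implementation detail of List<T> (checked here for a non-empty list), not something the language guarantees.

```csharp
using System;
using System.Collections.Generic;

public static class EnumeratorKinds
{
    // The type foreach works with when it iterates a List<int> directly.
    public static bool DirectIsValueType()
    {
        return typeof(List<int>.Enumerator).IsValueType;
    }

    // Through the interface the same struct arrives as a reference.
    public static bool InterfaceReturnsBoxedStruct()
    {
        IEnumerable<int> seq = new List<int> { 1, 2, 3 };
        IEnumerator<int> e = seq.GetEnumerator(); // interface-typed result lives on the heap
        return e.GetType() == typeof(List<int>.Enumerator);
    }

    public static void Main()
    {
        Console.WriteLine(DirectIsValueType());
        Console.WriteLine(InterfaceReturnsBoxedStruct());
    }
}
```

Both checks should come out true: the enumerator is a struct, and the interface call hands back that same struct wrapped in a heap object.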

    var list = new List<int> {1, 2, 3};

    var x1 = new { Items = ((IEnumerable<int>)list).GetEnumerator() };
    while (x1.Items.MoveNext())
    {
        Console.WriteLine(x1.Items.Current);
    }

    Console.ReadLine();

    var x2 = new { Items = list.GetEnumerator() };
    while (x2.Items.MoveNext())
    {
        Console.WriteLine(x2.Items.Current);
    }

Because the enumerator is a mutable struct that gets boxed only when it is obtained through the interface, the first while loop prints the expected 1, 2, 3, and the second while loop ... well, check for yourself.
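Spoiler for those who would rather not run the second loop: the following bounded sketch shows the underlying effect. StructCopyDemo is a hypothetical helper, and the value read at the end relies on the current List<T>.Enumerator implementation returning the default value before the first successful MoveNext().

```csharp
using System;
using System.Collections.Generic;

public static class StructCopyDemo
{
    // Calls MoveNext() through the property a fixed number of times
    // and reports what Current shows afterwards.
    public static int CurrentAfterThreeMoves()
    {
        var list = new List<int> { 1, 2, 3 };
        var holder = new { Items = list.GetEnumerator() }; // stores a struct

        for (int i = 0; i < 3; i++)
            holder.Items.MoveNext(); // advances a temporary copy, then discards it

        return holder.Items.Current; // the stored enumerator never moved
    }

    public static void Main()
    {
        Console.WriteLine(CurrentAfterThreeMoves()); // prints 0
    }
}
```

Each access to the Items property returns a fresh copy of the struct, so MoveNext() keeps succeeding on the copy while the stored enumerator stays at its starting position.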

Such a decision (using a mutable struct) may look like micro-optimization taken too far, but don't forget that foreach loops can be nested and that not everyone works on multi-core processors with gigabytes of memory. Before making this decision, the BCL team conducted serious research showing that using structs really pays off.

NOTE
Just don't rush to follow this example when implementing your own iterators or other helper classes. Using structs is an optimization in itself, and using mutable structs is a very serious decision, so you should understand very clearly what benefits you gain in exchange for sacrificing so much safety.

A small addition: why call Dispose?

Another feature of the foreach implementation is that it calls the iterator's Dispose method. Below is a simplified version of the code the compiler generates when iterating over the list variable in a foreach loop:

    {
        var enumerator = list.GetEnumerator();
        try
        {
            while (enumerator.MoveNext())
            {
                int current = enumerator.Current;
                Console.WriteLine(current);
            }
        }
        finally
        {
            enumerator.Dispose();
        }
    }

A reasonable question may arise: where would an iterator get any resources that need releasing? When iterating over an in-memory collection there are indeed none, but don't forget that enumerators in C# are not limited to in-memory collections. Nothing stops us from writing an iterator that returns the contents of a file line by line:

    public static class FileEx
    {
        public static IEnumerable<string> ReadByLine(string path)
        {
            if (path == null)
                throw new ArgumentNullException("path");

            return ReadByLineImpl(path);
        }

        private static IEnumerable<string> ReadByLineImpl(string path)
        {
            using (var sr = new StreamReader(path))
            {
                string s;
                while ((s = sr.ReadLine()) != null)
                    yield return s;
            }
        }
    }

    foreach (var line in FileEx.ReadByLine("D:\\1.txt"))
    {
        Console.WriteLine(line);
    }

So, we have the ReadByLine method, in which we open a file, which is definitely a resource, and close it ... when? Obviously not every time control leaves the ReadByLineImpl method, because then it would be closed as many times as there are lines in the file.
In fact, the file is closed exactly once: either when the iteration runs to completion or, if the loop is left early, when the iterator's Dispose method is called in the finally block of the foreach loop. This is one of those rare cases on the .NET platform where a finally block (the one generated by using inside the iterator) may not run automatically and has to be triggered by hand. So if you ever iterate over a sequence manually, don't forget that the iterator may still hold resources, and it would be very nice to release them with an explicit call to the iterator's Dispose method.
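Here is a minimal sketch of such manual iteration. ManualIteration, Numbers, and the Released flag are hypothetical stand-ins for an iterator that owns a real resource such as a file.

```csharp
using System;
using System.Collections.Generic;

public static class ManualIteration
{
    public static bool Released; // set when the iterator's finally block runs

    // Hypothetical stand-in for an iterator that owns a resource.
    public static IEnumerable<int> Numbers()
    {
        try
        {
            yield return 1;
            yield return 2;
            yield return 3;
        }
        finally
        {
            Released = true; // where a real iterator would close its file
        }
    }

    // Reads only the first element by hand, without foreach.
    public static int FirstOnly()
    {
        Released = false;
        using (IEnumerator<int> e = Numbers().GetEnumerator())
        {
            e.MoveNext();
            return e.Current;
        } // using calls e.Dispose(), which runs the pending finally block
    }

    public static void Main()
    {
        Console.WriteLine(FirstOnly()); // prints 1
        Console.WriteLine(Released);    // prints True
    }
}
```

Without the using statement (or an explicit Dispose call) the iterator would stay suspended at the first yield return and its finally block would never run.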

NOTE
You can read more about iterators in C# in the article ... Iterators in C#.

P.S. And who can answer this question offhand: why are two methods, ReadByLine and ReadByLineImpl, needed here rather than just one?

P.P.S. By the way, the foreach loop is not the only example of duck typing in C#. How many more examples can you recall?

Source: https://habr.com/ru/post/148905/
