I think many developers know that the foreach loop in C# is not as simple as it seems at first glance. To begin with, let's answer a question: "What does it take for a foreach statement to compile?" The intuitive answer seems to be something like: "The class must implement the IEnumerable or IEnumerable<T> interface." However, that is not the case, or at least not quite.
The full answer to this question is: "For a foreach statement to compile, the object must have a GetEnumerator() method that returns an object with a MoveNext() method and a Current property; if there is no such method, the compiler looks for the IEnumerable and IEnumerable<T> interfaces."
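To make the rule concrete, here is a minimal sketch in which neither class implements IEnumerable, yet foreach compiles. The Countdown and CountdownEnumerator types are hypothetical names invented for this illustration.

```csharp
using System;

// Hypothetical types for illustration: neither implements IEnumerable.
public class Countdown
{
    private readonly int _start;
    public Countdown(int start) { _start = start; }

    // foreach only needs a public GetEnumerator() method...
    public CountdownEnumerator GetEnumerator() { return new CountdownEnumerator(_start); }
}

public class CountdownEnumerator
{
    private int _next;
    private int _current;
    public CountdownEnumerator(int start) { _next = start; }

    // ...whose result has a public Current property and a bool MoveNext() method.
    public int Current { get { return _current; } }

    public bool MoveNext()
    {
        if (_next <= 0) return false;
        _current = _next--;
        return true;
    }
}

public static class DuckDemo
{
    public static int Sum(Countdown countdown)
    {
        int sum = 0;
        foreach (int value in countdown) // compiles without any interface
            sum += value;
        return sum;
    }

    public static void Main()
    {
        Console.WriteLine(Sum(new Countdown(3))); // adds 3 + 2 + 1
    }
}
```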
There are two reasons for this "duck" behavior.
Let's remember the old days of C# 1.0, when the language was simple and clear, and there were no generics, LINQ, or closures. Since there were no generics, generalization and reuse relied on polymorphism and the object type, which is exactly how the collection classes and their iterators were built.
Those iterators were the pair of interfaces IEnumerable and IEnumerator, the latter returning object from its Current property. As a result, using the IEnumerator interface to iterate over a strongly typed collection of value types would box and unbox the value on every iteration, which, you will agree, can be quite expensive for an operation as common as enumerating elements.
To solve this problem, the language designers resorted to a duck-typing hack, giving up a little on OOP principles in favor of performance. A class could implement the IEnumerable interface explicitly and provide an additional GetEnumerator() method returning a strongly typed enumerator whose Current property returns a specific type, for example DateTime, without any boxing.
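A rough sketch of what such a pre-generics collection could look like is shown below. DateCollection and DateEnumerator are hypothetical names made up for this example, not types from the BCL.

```csharp
using System;
using System.Collections;

// Hypothetical C# 1.0-style collection, for illustration only.
public class DateCollection : IEnumerable
{
    private readonly DateTime[] _items;
    public DateCollection(params DateTime[] items) { _items = items; }

    // Strongly typed method picked up by foreach: Current is a DateTime, no boxing.
    public DateEnumerator GetEnumerator() { return new DateEnumerator(_items); }

    // Explicit implementation for callers that only know about IEnumerable.
    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public class DateEnumerator : IEnumerator
{
    private readonly DateTime[] _items;
    private int _index = -1;
    public DateEnumerator(DateTime[] items) { _items = items; }

    public DateTime Current { get { return _items[_index]; } }

    // Boxing happens only on this interface path.
    object IEnumerator.Current { get { return Current; } }

    public bool MoveNext() { return ++_index < _items.Length; }
    public void Reset() { _index = -1; }
}

public static class DateDemo
{
    public static int CountInYear(DateCollection dates, int year)
    {
        int count = 0;
        foreach (DateTime d in dates) // binds to DateEnumerator.Current
            if (d.Year == year) count++;
        return count;
    }

    public static void Main()
    {
        var dates = new DateCollection(new DateTime(2020, 1, 1), new DateTime(2021, 1, 1));
        Console.WriteLine(CountInYear(dates, 2020));
    }
}
```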
OK. We have dealt with the dinosaurs, but what about the real world? After all, this is not the Stone Age anymore: COM has kicked the bucket, Don Box no longer writes books, and the place is swarming with geeks pushing all sorts of functional goodies on us. Is there any benefit from this behavior today?
You might think that once the generic IEnumerable<T> and IEnumerator<T> interfaces appeared, the duck-typing trick was no longer needed, but this is not quite true. If you look closely at collection classes such as List<T>, you will notice that this class (like many other BCL collections) implements the IEnumerable<T> interface explicitly, while providing an additional GetEnumerator() method whose return type, List<T>.Enumerator, is declared as a public struct.
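One way to check this without reading the BCL source is a small reflection probe. The EnumeratorShape class name is just a placeholder for this sketch.

```csharp
using System;
using System.Collections.Generic;

public static class EnumeratorShape
{
    // Return type of the public GetEnumerator() method that foreach binds to.
    public static Type PublicEnumeratorType()
    {
        return typeof(List<int>).GetMethod("GetEnumerator").ReturnType;
    }

    public static void Main()
    {
        Type t = PublicEnumeratorType();
        Console.WriteLine(t.IsValueType);                                // is it a struct?
        Console.WriteLine(typeof(IEnumerator<int>).IsAssignableFrom(t)); // does it implement the interface?
    }
}
```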
Yes, that's right. The GetEnumerator() method returns an iterator instance that is a mutable struct (after all, the iterator contains a "pointer" to the current list item). And mutable value types, in the opinion of many, are the sharpest saw on the .NET platform, capable of taking a leg off even a very experienced developer.
NOTE Yes, yes, I know I have already talked your ears off about mutable enumerators in general and mutable value types in particular, but here we are, among other things, trying to explain the reasons for this behavior. So bear with me a little longer :)
The point is that using a struct as the iterator, combined with the "duck" nature of the foreach loop, avoids allocating memory on the heap when this construct is used:
var list = new List<int> {1, 2, 3};
If foreach iterates over list directly, then thanks to the "duck" nature the GetEnumerator() method of the List<T> class is called, which returns a value-type object that lives quietly on the stack with no additional allocations in the managed heap. If we first cast the list variable to the interface, the interface method is called instead and, accordingly, the iterator is boxed. Yes, the C# designers sacrificed polymorphism and a number of other OOP principles purely for the sake of efficiency.
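The two cases can be sketched side by side as follows. The ForeachBinding class and its method names are hypothetical, and whether the second loop really allocates may depend on the runtime and JIT version, so treat the comments as a description of what the compiler binds to rather than a measurement.

```csharp
using System;
using System.Collections.Generic;

public static class ForeachBinding
{
    public static int SumDirect(List<int> list)
    {
        int sum = 0;
        // 1: binds to the public List<int>.GetEnumerator(), which returns a struct.
        foreach (int i in list)
            sum += i;
        return sum;
    }

    public static int SumViaInterface(List<int> list)
    {
        int sum = 0;
        // 2: binds to IEnumerable<int>.GetEnumerator(); the struct is boxed
        //    so it can be returned as an IEnumerator<int> reference.
        foreach (int i in (IEnumerable<int>)list)
            sum += i;
        return sum;
    }

    public static void Main()
    {
        var list = new List<int> { 1, 2, 3 };
        Console.WriteLine(SumDirect(list));
        Console.WriteLine(SumViaInterface(list));
    }
}
```

Both loops produce the same result; only the enumerator they use differs.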
var list = new List<int> {1, 2, 3};

// Enumerator obtained through the interface (a boxed reference)
var x1 = new { Items = ((IEnumerable<int>)list).GetEnumerator() };
while (x1.Items.MoveNext())
{
    Console.WriteLine(x1.Items.Current);
}

Console.ReadLine();

// Struct enumerator stored in a read-only property
var x2 = new { Items = list.GetEnumerator() };
while (x2.Items.MoveNext())
{
    Console.WriteLine(x2.Items.Current);
}
Mutability is exactly why the first while loop prints the expected 1, 2, 3, while the second while loop... well, check for yourself (hint: every read of x2.Items returns a copy of the struct).
Such a decision (using a mutable struct) may look like micro-optimization taken too far, but don't forget that foreach loops can be nested and that not everyone works on multi-core processors with gigabytes of memory. Before making this decision, the BCL team did serious research showing that using structs is really worth it.
NOTE Just don't rush to follow this example when implementing your own iterators or other helper classes. Using structs is already an optimization in itself, and using mutable structs is a very serious decision, so you should be very clear about what benefits you gain in exchange for sacrificing so much safety.
A small addition: why do I need to call Dispose?
Another feature of the foreach loop implementation is that it calls the iterator's Dispose method. Below is a simplified version of the code the compiler generates when iterating over the list variable in a foreach loop:
{
    var enumerator = list.GetEnumerator();
    try
    {
        while (enumerator.MoveNext())
        {
            int current = enumerator.Current;
            Console.WriteLine(current);
        }
    }
    finally
    {
        enumerator.Dispose();
    }
}
A reasonable question may arise: where would an iterator get resources that need releasing? Indeed, when iterating over an in-memory collection there are none, but don't forget that iterators in C# are not limited to in-memory collections; nothing prevents us from writing an iterator that returns the contents of a file line by line:
public static class FileEx
{
    public static IEnumerable<string> ReadByLine(string path)
    {
        if (path == null)
            throw new ArgumentNullException("path");
        return ReadByLineImpl(path);
    }

    private static IEnumerable<string> ReadByLineImpl(string path)
    {
        using (var sr = new StreamReader(path))
        {
            string s;
            while ((s = sr.ReadLine()) != null)
                yield return s;
        }
    }
}

foreach (var line in FileEx.ReadByLine("D:\\1.txt"))
{
    Console.WriteLine(line);
}
So, we have the ReadByLine method, in which we open a file, which is definitely a resource, and close it... when? Obviously not every time control leaves the ReadByLineImpl method, because then the file would be closed as many times as there are lines in it.
In fact, the file is closed exactly once, when the iterator's Dispose method is called, which happens in the finally block of the foreach loop. This is one of those rare cases on the .NET platform where a finally block (here, the one behind the using statement inside the iterator) is not invoked automatically but only by hand. So if you ever iterate over a sequence manually, don't forget that the iterator may still hold resources, and it would be very nice to release them with an explicit call to the iterator's Dispose method.
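As a small sketch of manual iteration done safely, the example below stops after the first element and still runs the iterator's cleanup code, because the using statement calls Dispose. The ManualIteration class, the Numbers iterator, and the CleanedUp flag are hypothetical stand-ins for an iterator that owns a real resource.

```csharp
using System;
using System.Collections.Generic;

public static class ManualIteration
{
    public static bool CleanedUp; // set by the iterator's finally block

    // Hypothetical iterator standing in for one that owns a resource.
    public static IEnumerable<int> Numbers()
    {
        try
        {
            yield return 1;
            yield return 2;
            yield return 3;
        }
        finally
        {
            CleanedUp = true; // runs on normal completion or when Dispose() is called
        }
    }

    public static int FirstOnly()
    {
        CleanedUp = false;
        // using guarantees Dispose(), just as the finally block of foreach does.
        using (IEnumerator<int> e = Numbers().GetEnumerator())
        {
            return e.MoveNext() ? e.Current : -1;
        }
    }

    public static void Main()
    {
        Console.WriteLine(FirstOnly());
        Console.WriteLine(CleanedUp);
    }
}
```

Without the using statement (or an explicit Dispose call), the finally block in Numbers would never run for a sequence that is abandoned halfway.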
NOTE You can read more about iterators in C# in the article ... Iterators in C#.
P.S. And who can answer this question right away: why do I need two methods, ReadByLine and ReadByLineImpl, instead of just one?
P.P.S. By the way, the foreach statement is not the only example of duck typing in C#. How many more examples can you recall?