
Iterable objects, iterators, and generators

Hello, dear readers of Habr. In this article we will try to figure out what an iterable object, an iterator, and a generator are, and look at how they are implemented and used. The examples are written in Python, but iterators and generators are, in my opinion, fundamental concepts: they were relevant 20 years ago, are even more relevant now, and have remained essentially unchanged.







Iterators



To begin, recall what the Iterator pattern is.

Purpose: to provide a way to access the elements of a composite (aggregate) object sequentially without exposing its underlying representation.

As a result we get a separation of responsibilities: clients can work with different collections in a uniform way, and the collections themselves become simpler, because they delegate the enumeration of their elements to a separate entity.


There are two kinds of iterators: external and internal.

An external iterator is the classic (pull-based) iterator: the traversal is explicitly controlled by the client, which calls the Next method.

An internal iterator is a push-based iterator: it receives a callback function and itself notifies the client as each element arrives.
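To make the distinction concrete, here is a minimal sketch of an internal (push-based) iterator. The class and method names (NumberBag, each) are illustrative, not from any library: the collection drives the loop itself and "pushes" every element into a client-supplied callback.

```python
class NumberBag:
    def __init__(self, items):
        self._items = list(items)

    def each(self, callback):
        # The collection controls the traversal and invokes
        # the callback for every element (push-based).
        for item in self._items:
            callback(item)


bag = NumberBag([1, 2, 3])
result = []
bag.each(result.append)
print(result)  # [1, 2, 3]
```

With an external iterator the client would pull elements one by one instead; here the client only supplies the callback and never touches a cursor.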



The classic diagram of the Iterator pattern, as described in the well-known book by the "gang of four":

[UML class diagram of the Iterator pattern]



Aggregate - the composite object that an iterator can traverse;

Iterator - defines the iterator interface;

ConcreteAggregate - a concrete implementation of the aggregate;

ConcreteIterator - a concrete iterator implementation for a specific aggregate;

Client - uses an Aggregate object and an iterator to traverse it.



Let's try to implement a classic iterator in Python.



Abstract classes:



```python
import abc


class Aggregate(abc.ABC):
    @abc.abstractmethod
    def iterator(self):
        """Return an iterator over the collection."""
        pass


class Iterator(abc.ABC):
    def __init__(self, collection, cursor):
        self._collection = collection
        self._cursor = cursor

    @abc.abstractmethod
    def first(self):
        """Go back to the first element; a kind of reset."""
        pass

    @abc.abstractmethod
    def next(self):
        """Advance to the next element.
        Raises StopIteration if there are no elements left."""
        pass

    @abc.abstractmethod
    def current(self):
        """Return the current element."""
        pass
```


A concrete iterator implementation for a list:



```python
class ListIterator(Iterator):
    def __init__(self, collection, cursor):
        """
        :param collection: the list to traverse
        :param cursor: initial cursor position;
                       must satisfy -1 <= cursor < len(collection)
        """
        super().__init__(collection, cursor)

    def first(self):
        """Reset the cursor to -1, i.e. before the first element,
        so that the first call to next() moves it to position 0."""
        self._cursor = -1

    def next(self):
        """Advance the cursor by 1, or raise StopIteration
        if there is no next element."""
        if self._cursor + 1 >= len(self._collection):
            raise StopIteration()
        self._cursor += 1

    def current(self):
        """Return the current element."""
        return self._collection[self._cursor]
```


A concrete implementation of the aggregate:



```python
class ListCollection(Aggregate):
    def __init__(self, collection):
        self._collection = list(collection)

    def iterator(self):
        return ListIterator(self._collection, -1)
```


Now we can create a collection object and traverse all of its elements using the iterator:



```python
collection = (1, 2, 5, 6, 8)
aggregate = ListCollection(collection)

itr = aggregate.iterator()
# traverse the collection
while True:
    try:
        itr.next()
    except StopIteration:
        break
    print(itr.current())
```


And since we implemented the first method, which resets the iterator to its initial state, we can reuse the same iterator:



```python
# reset the iterator and traverse the collection again
itr.first()
while True:
    try:
        itr.next()
    except StopIteration:
        break
    print(itr.current())
```


Implementations may differ, but the basic idea is the same: an iterator can traverse all sorts of structures (vectors, trees, hash tables, and much more) while presenting the same interface to the outside world.



Python iteration protocol



Here is what the "gang of four" book says about implementing an iterator:



The minimal interface of the Iterator class consists of the First, Next, IsDone, and CurrentItem operations. If desired, this interface can be simplified by merging Next, IsDone, and CurrentItem into a single operation that advances to the next object and returns it. When the traversal is complete, this operation returns a special value (for example, 0) that marks the end of the iteration.



This is exactly how it is implemented in Python, except that instead of a special value, a StopIteration exception signals the end of the iteration. As the saying goes, it's easier to ask forgiveness than permission.



First, it is important to define the terms.



Let's start with the iterable object. In the standard library it is declared as the abstract class collections.abc.Iterable:



```python
class Iterable(metaclass=ABCMeta):

    __slots__ = ()

    @abstractmethod
    def __iter__(self):
        while False:
            yield None

    @classmethod
    def __subclasshook__(cls, C):
        if cls is Iterable:
            return _check_methods(C, "__iter__")
        return NotImplemented
```


It has an abstract __iter__ method that must return an iterator object, and a __subclasshook__ method that checks whether a class has an __iter__ method. It follows that an iterable object is any object that implements the __iter__ method.



```python
import collections.abc


class SomeIterable1(collections.abc.Iterable):
    def __iter__(self):
        pass


class SomeIterable2:
    def __iter__(self):
        pass


print(isinstance(SomeIterable1(), collections.abc.Iterable))  # True
print(isinstance(SomeIterable2(), collections.abc.Iterable))  # True


But there is one more thing: the iter() function. It is this function that, for example, a for loop uses to obtain an iterator. To get an iterator from an object, iter() first calls its __iter__ method. If that method is not implemented, it checks for a __getitem__ method, and if that one is implemented, an iterator is created on top of it; __getitem__ must accept indexes starting from zero. If neither method is implemented, a TypeError exception is raised.



```python
from string import ascii_letters


class SomeIterable3:
    def __getitem__(self, key):
        return ascii_letters[key]


for item in SomeIterable3():
    print(item)
```


In short, an iterable object is any object from which the built-in iter() function can obtain an iterator. Sequences (abc.Sequence) are always iterable, because they implement the __getitem__ method.



Now let's see what iterators look like in Python. They are represented by the abstract class collections.abc.Iterator:



```python
class Iterator(Iterable):

    __slots__ = ()

    @abstractmethod
    def __next__(self):
        'Return the next item from the iterator. When exhausted, raise StopIteration'
        raise StopIteration

    def __iter__(self):
        return self

    @classmethod
    def __subclasshook__(cls, C):
        if cls is Iterator:
            return _check_methods(C, '__iter__', '__next__')
        return NotImplemented
```


__next__ returns the next available element and raises a StopIteration exception when no elements are left.

__iter__ returns self. This lets an iterator be used wherever an iterable object is expected, for example in a for loop.

__subclasshook__ checks whether a class has the __iter__ and __next__ methods.



So, an iterator in Python is any object that implements a __next__ method taking no arguments, which must return the next element or raise StopIteration. It also implements the __iter__ method and is therefore itself an iterable object.



With that, we can implement an iterable object based on a list, together with its iterator:



```python
import collections.abc


class ListIterator(collections.abc.Iterator):
    def __init__(self, collection, cursor):
        self._collection = collection
        self._cursor = cursor

    def __next__(self):
        if self._cursor + 1 >= len(self._collection):
            raise StopIteration()
        self._cursor += 1
        return self._collection[self._cursor]


class ListCollection(collections.abc.Iterable):
    def __init__(self, collection):
        self._collection = collection

    def __iter__(self):
        return ListIterator(self._collection, -1)
```


Ways to use it:



```python
collection = [1, 2, 5, 6, 8]
aggregate = ListCollection(collection)

for item in aggregate:
    print(item)

print("*" * 50)

itr = iter(aggregate)
while True:
    try:
        print(next(itr))
    except StopIteration:
        break
```


The next() function calls the __next__ method. You can pass it a second argument, which it will return at the end of the iteration instead of raising StopIteration.



```python
itr = iter(aggregate)
while True:
    item = next(itr, None)
    if item is None:
        break
    print(item)
```


Before moving on to generators, let's look at one more feature of the built-in iter() function. It can be called with two arguments, which lets you build an iterator from a callable object (a function, or a class that implements __call__). The first argument must be the callable, and the second a sentinel value. The callable is invoked on each iteration, and the iteration ends either when a StopIteration exception is raised or when the sentinel value is returned.



For example, from a function that returns a random number from 1 to 6, we can make an iterator that returns values until a 6 is rolled:



```python
from random import randint


def d6():
    return randint(1, 6)


for roll in iter(d6, 6):
    print(roll)
```


Other examples

A small class ProgrammingLanguages holds a tuple of programming language names; its constructor sets the starting index from a language name, and its __call__ method steps through the tuple.



```python
class ProgrammingLanguages:
    _name = ("Python", "Golang", "C#", "C", "C++", "Java", "SQL", "JS")

    def __init__(self, first=None):
        self.index = (-1 if first is None
                      else ProgrammingLanguages._name.index(first) - 1)

    def __call__(self):
        self.index += 1
        if self.index < len(ProgrammingLanguages._name):
            return ProgrammingLanguages._name[self.index]
        raise StopIteration
```


We can iterate over all languages starting from C# up to the last one:



```python
for lang in iter(ProgrammingLanguages("C#"), None):
    print(lang)
```


Or from the first language up to (but not including) C:



```python
pl = ProgrammingLanguages()
for lang in iter(pl, "C"):
    print(lang)
```


One more example:



```python
# read the file line by line until an empty string, i.e. the end of the file
with open('mydata.txt') as fp:
    for line in iter(fp.readline, ''):
        print(line)
```




Generators



From an implementation point of view, a generator in Python is a language construct that can be created in two ways: as a function with the yield keyword, or as a generator expression. Calling the function or evaluating the expression yields a generator object of type types.GeneratorType.



A generator object defines the __next__ and __iter__ methods, i.e. it implements the iterator protocol; from this point of view, any generator in Python is an iterator.

Conceptually, an iterator is a mechanism for traversing data element by element, whereas a generator allows results to be produced lazily during iteration. A generator can compute its results from an algorithm, or take elements from a data source (a collection, a file, a network connection, etc.) and modify them.
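This relationship is easy to verify in the interpreter (a small check sketch; the function count_up is just an example name):

```python
import collections.abc
import types


def count_up():
    yield 1
    yield 2


gen = count_up()

# A generator object is both a GeneratorType instance and an Iterator:
print(isinstance(gen, types.GeneratorType))       # True
print(isinstance(gen, collections.abc.Iterator))  # True
print(iter(gen) is gen)                           # True: __iter__ returns self
```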



Vivid examples are the range and enumerate functions:



range generates a bounded arithmetic progression of integers without using any data source.

enumerate generates two-element tuples of an index and an element taken from the iterable it was given.
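Both behaviors can be seen in a couple of lines:

```python
# range produces an arithmetic progression lazily, from no data source:
print(list(range(2, 10, 3)))            # [2, 5, 8]

# enumerate pairs an index with each element of an existing iterable:
print(list(enumerate("abc", start=1)))  # [(1, 'a'), (2, 'b'), (3, 'c')]
```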



Yield



To begin with, let's write a simple generator without using the yield keyword: a Fibonacci number generator.



```python
class FibonacciGenerator:
    def __init__(self):
        self.prev = 0
        self.cur = 1

    def __next__(self):
        result = self.prev
        self.prev, self.cur = self.cur, self.prev + self.cur
        return result

    def __iter__(self):
        return self


for i in FibonacciGenerator():
    print(i)
    if i > 100:
        break
```


But the yield keyword lets us simplify the implementation considerably:



```python
def fibonacci():
    prev, cur = 0, 1
    while True:
        yield prev
        prev, cur = cur, prev + cur


for i in fibonacci():
    print(i)
    if i > 100:
        break
```


Any Python function that has the yield keyword in its body is called a generator function; calling it returns a generator object.

The generator object implements the iterator protocol, so it can be used like any other iterable object.



```python
fib = fibonacci()
print(next(fib))  # 0
print(next(fib))  # 1

for num, value in enumerate(fibonacci()):
    print('{0}: {1}'.format(num, value))
    if num > 9:
        break
# 0: 0
# 1: 1
# 2: 1
# ...
```


Let's look at how yield works:



```python
def gen_fun():
    print('block 1')
    yield 1
    print('block 2')
    yield 2
    print('end')


for i in gen_fun():
    print(i)

# block 1
# 1
# block 2
# 2
# end
```


1. When the gen_fun function is called, a generator object is created.
2. for calls iter() with this object and obtains an iterator for this generator.
3. The loop calls the next() function with this iterator until a StopIteration exception is raised.
4. On each call to next, execution inside the function resumes from where it last stopped and continues until the next yield.


Roughly, the following happens: the generator function is split into parts:



```python
def gen_fun_1():
    print('block 1')
    return 1


def gen_fun_2():
    print('block 2')
    return 2


def gen_fun_3():
    print('end')


def gen_fun_end():
    raise StopIteration
```


A state machine is created: on every call to __next__ the state changes and, depending on the state, one or another piece of code runs. If yield sits inside a loop in the function, the state machine loops accordingly until the loop condition is no longer met.
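A very rough hand-written sketch of such a state machine for gen_fun (illustrative only: CPython actually suspends and resumes a frame object rather than generating a class like this):

```python
class GenFunStateMachine:
    """Hand-rolled equivalent of gen_fun(): each __next__ call
    advances the state and runs the next chunk of code."""

    def __init__(self):
        self._state = 0

    def __iter__(self):
        return self

    def __next__(self):
        self._state += 1
        if self._state == 1:
            print('block 1')
            return 1
        if self._state == 2:
            print('block 2')
            return 2
        print('end')
        raise StopIteration


for i in GenFunStateMachine():
    print(i)
# block 1
# 1
# block 2
# 2
# end
```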



Our own version of range:



```python
def cool_range(start, stop, inc):
    x = start
    while x < stop:
        yield x
        x += inc


for n in cool_range(1, 5, 0.5):
    print(n)
# 1
# 1.5
# ...
# 4.5

print(list(cool_range(0, 2, 0.5)))
# [0, 0.5, 1.0, 1.5]
```


Generator expression



In short, this is a syntactically more compact way to create a generator without defining and calling a function. Being an expression, it comes with a number of limitations. It is mainly convenient for generating collections, applying simple transformations to them, and filtering them with conditions.



Programming languages distinguish between lazy and eager evaluation. Generators can be viewed as deferred computation; in this sense a list comprehension looks very similar to a generator expression, but the two are different approaches.



```python
(i for i in range(10000000))
[i for i in range(10000000)]
```


The first option works much like our cool_range function and can produce a range of any size without problems. The second one immediately builds the entire list in memory, with all the problems that follow from that.
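The difference is easy to observe with sys.getsizeof. Note that getsizeof reports only the size of the object itself (the generator's state, or the list's array of references), and the exact numbers vary across Python builds, so this is a sketch of the comparison rather than precise figures:

```python
import sys

gen = (i for i in range(1000000))
lst = [i for i in range(1000000)]

# The generator object is tiny and constant-size; the list holds a
# million references (several megabytes on a typical CPython build).
print(sys.getsizeof(gen))
print(sys.getsizeof(lst))

# Both produce the same values:
print(next(gen))  # 0
print(lst[0])     # 0
```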



Yield from



The traditional approach to traversing nested data structures is nested loops. The same approach can be used when a generator function has to yield values produced by another generator.



A function similar to itertools.chain:



```python
def chain(*iterables):
    for it in iterables:
        for i in it:
            yield i


g = chain([1, 2, 3], {'A', 'B', 'C'}, '...')
print(list(g))
# [1, 2, 3, 'A', 'B', 'C', '.', '.', '.'] (set order may vary)
```


But the nested loop can be removed with the yield from construct:



```python
def chain(*iterables):
    for it in iterables:
        yield from it


g = chain([1, 2, 3], {'A', 'B', 'C'}, '...')
print(list(g))
# [1, 2, 3, 'A', 'B', 'C', '.', '.', '.'] (set order may vary)
```


The main benefit of yield from is that it creates a direct channel between the inner generator and the consumer of the outer generator. But that is more the territory of coroutines, which deserve a separate article, along with the generator methods close(), throw(), and send().
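To give a taste of that territory, here is a minimal sketch of send(): a coroutine-style generator that accumulates the values sent into it. The function name accumulator is illustrative; the send/close methods themselves are standard generator methods.

```python
def accumulator():
    """send() delivers a value into the generator at the point of the
    yield expression; the generator yields the running total back."""
    total = 0
    while True:
        value = yield total
        total += value


acc = accumulator()
next(acc)            # prime the generator: run it up to the first yield
print(acc.send(10))  # 10
print(acc.send(5))   # 15
acc.close()          # raises GeneratorExit inside the generator
```

The first next() call is required: send() can only deliver a value to a generator that is already suspended at a yield.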



And finally, one more example. The function accepts an iterable object, nested to any depth with other iterable objects, and produces a flat sequence:



```python
from collections.abc import Iterable


def flatten(items, ignore_types=(str, bytes)):
    """str and bytes are iterable themselves,
    but here we treat them as atomic values"""
    for x in items:
        if isinstance(x, Iterable) and not isinstance(x, ignore_types):
            yield from flatten(x)
        else:
            yield x


items = [1, 2, [3, 4, [5, 6], 7], 8, ('A', {'B', 'C'})]
for x in flatten(items):
    print(x)
# 1
# 2
# 3
# 4
# 5
# 6
# 7
# 8
# A
# C  (set order may vary)
# B
```


That's all for today. Thanks for reading!

Source: https://habr.com/ru/post/337314/


