Foreword
I became very fond of Python after reading Mark Lutz's book, Learning Python. The language is beautiful: it is pleasant to write in and expresses ideas well. The large number of interpreters and compilers, extensions, modules, and frameworks shows that the community is very active and the language keeps developing. While learning the language I ran into many questions, which I carefully googled, trying to understand every construct that puzzled me. That is what we will talk about in this article, which is aimed at novice Python developers.
A little about terms
Perhaps I'll start with terms that often confuse beginner Python programmers.
List comprehensions return a list. I always confused list comprehensions with generator expressions, and both with generators themselves; the Russian terms for them sound very similar. Generator expressions are special expressions that return an iterator, not a list. Let's compare:
f = (x for x in xrange(100))
f = [x for x in xrange(100)]

These are two completely different constructs. The first returns a generator (that is, an iterator), the second a regular list.
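The difference matters in practice: a list is built eagerly and can be traversed many times, while a generator expression produces its values lazily and is exhausted after one pass. A quick sketch in Python 3 syntax (where range itself is lazy):

```python
# A generator expression yields values lazily and can be consumed only once.
gen = (x * x for x in range(5))
print(next(gen))   # 0
print(list(gen))   # [1, 4, 9, 16] -- the 0 was already consumed
print(list(gen))   # [] -- the generator is exhausted

# A list comprehension builds the whole list up front and is reusable.
squares = [x * x for x in range(5)]
print(squares)     # [0, 1, 4, 9, 16]
print(squares)     # still [0, 1, 4, 9, 16]
```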
Generators are special functions that return an iterator. To turn a function into a generator, yield values from it with the yield statement:
def prime(lst):
    for i in lst:
        if i % 2 == 0:
            yield i

>>> f = prime([1, 2, 3, 4, 5, 6, 7])
>>> list(f)
[2, 4, 6]
>>> next(f)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
By the way, Python 3.3 introduced a new construct, yield from. The combination of yield and for is used so often that the two were merged into a single statement.
def generator_range(first, last):
    for i in xrange(first, last):
        yield i

def generator_range(first, last):
    yield from range(first, last)
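yield from is not limited to ranges: it delegates to any iterable or sub-generator, forwarding its values one by one. A small sketch (Python 3; the flatten helper is our own example, not from the original article):

```python
def flatten(nested):
    # Delegate iteration over each inner list to yield from.
    for sub in nested:
        yield from sub

print(list(flatten([[1, 2], [3], [4, 5]])))  # [1, 2, 3, 4, 5]
```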
What are context managers and what are they for?
Context managers are special constructs that wrap a block of code in a with statement. The with statement builds the block using the context manager protocol, which we will discuss later in this article. The simplest function that supports this protocol is open(). Every time we open a file, we need to close it to flush the output to disk (in fact, Python will eventually close the file automatically, but closing it explicitly is good practice). For example:
fp = open("./file.txt", "w")
fp.write("Hello, World")
fp.close()
To avoid calling the close() method by hand every time, we can use open() as a context manager, which closes the file automatically when the block is exited:
with open("./file.txt", "w") as fp:
    fp.write("Hello, World")
Here we no longer need to call the close method to flush the data to the file. It follows that a context manager is used to perform actions before entering a block and after leaving it. But that is not all context managers can do. In many programming languages destructors are used for such tasks, but in Python, if an object is referenced somewhere else, there is no guarantee when the destructor will be called: the __del__ method runs only once all references to the object are gone:
In [4]: class Hello:
   ...:     def __del__(self):
   ...:         print 'destructor'
   ...:

In [5]: f = Hello()

In [6]: c = Hello()

In [7]: e = Hello()

In [8]: del e
destructor

In [9]: del c
destructor

In [10]: c = f

In [11]: e = f

In [12]: del f
We will solve this problem through context managers:
In [1]: class Hello:
   ...:     def __del__(self):
   ...:         print 'destructor'
   ...:     def __enter__(self):
   ...:         print 'entering the block'
   ...:     def __exit__(self, exp_type, exp_value, traceback):
   ...:         print 'exiting the block'
   ...:

In [2]: f = Hello()

In [3]: c = f

In [4]: e = f

In [5]: d = f

In [6]: del d

In [7]: del e

In [8]: del c

In [9]: del f
destructor
Now let's try calling the context manager:
In [10]: with Hello():
   ....:     print 'inside the block'
   ....:
entering the block
inside the block
exiting the block
destructor
As you can see, the entry and exit actions are guaranteed to run around our code.
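The same guarantee holds when the block raises: __exit__ still runs, much like a finally clause. A minimal sketch (Python 3 print syntax; the Guard class name is our own):

```python
class Guard:
    def __enter__(self):
        print('entering the block')
        return self

    def __exit__(self, exp_type, exp_value, traceback):
        # Runs whether or not the block raised an exception.
        print('exiting the block')
        return False  # False means: do not suppress the exception

try:
    with Guard():
        raise ValueError('boom')
except ValueError:
    print('the exception escaped the block, but __exit__ had already run')
```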
Context manager protocol
We have already briefly seen the context manager protocol when we wrote the small Hello class. Let's now look at it in more detail. For an object to become a context manager, its class must implement two methods: __enter__ and __exit__. The first is executed before entering the block; it may return the current instance of the class so that it can be bound to a name with the as keyword.
The __exit__ method is executed after leaving the with block, and it takes three parameters: exp_type, exp_value, and exp_tr (the traceback). A context manager can intercept exceptions raised inside the with block: we can handle only the exceptions we need, or suppress the unwanted ones.
class Open(object):
    def __init__(self, file, flag):
        self.file = file
        self.flag = flag

    def __enter__(self):
        try:
            self.fp = open(self.file, self.flag)
        except IOError:
            self.fp = open(self.file, "w")
        return self.fp

    def __exit__(self, exp_type, exp_value, exp_tr):
        """Close the file and suppress IOError."""
        self.fp.close()
        if exp_type is IOError:
            return True
The variable exp_type contains the class of the raised exception, and exp_value its message. In the example we close the file and suppress the IOError exception by returning True from the __exit__ method; all other exceptions are propagated out of the block. As soon as our code finishes and the block ends, the self.fp.close() method is called no matter which exception was raised. By the way, exceptions such as NameError or SyntaxError can also be suppressed inside the with block, but this should not be done.
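To see the class in action, here is a usage sketch; the Open class is repeated so the snippet is self-contained, and the file name demo.txt is arbitrary. (In Python 3, IOError is an alias of OSError, so the same code works there.)

```python
class Open(object):
    """Context manager that closes the file on exit and suppresses IOError."""
    def __init__(self, file, flag):
        self.file = file
        self.flag = flag

    def __enter__(self):
        try:
            self.fp = open(self.file, self.flag)
        except IOError:
            # Fall back to write mode if the file cannot be opened as requested.
            self.fp = open(self.file, "w")
        return self.fp

    def __exit__(self, exp_type, exp_value, exp_tr):
        self.fp.close()          # always close the file
        if exp_type is IOError:
            return True          # suppress only IOError

with Open("./demo.txt", "w") as fp:
    fp.write("Hello, World")
# At this point the file is closed, even if the block had raised.
```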
The context manager protocol is very easy to use, but for common tasks there is an even simpler way that ships with the Python standard library. Next we look at the contextlib package.
Package contextlib
Creating context managers the traditional way, that is, writing classes with __enter__ and __exit__ methods, is not a complicated task. But for trivial code, writing such classes takes more work than it should. For these cases the contextmanager() decorator, part of the contextlib package, was invented. With the contextmanager() decorator we can turn an ordinary generator function into a context manager:
import contextlib

@contextlib.contextmanager
def context():
    print 'entering the block'
    try:
        yield {}
    except RuntimeError, err:
        print 'error: ', err
    finally:
        print 'exiting the block'
Let's check that the code works:
In [8]: with context() as fp:
   ...:     print 'inside the block'
   ...:
entering the block
inside the block
exiting the block
Let's try to raise an exception inside the block.
In [14]: with context() as value:
   ....:     raise RuntimeError, 'Error'
   ....:
entering the block
error: Error
exiting the block
As you can see from the example, the class-based implementation is practically no different in functionality from the one using the contextmanager() decorator, but the decorator greatly simplifies our code.
Another interesting example of using the contextmanager() decorator:
import contextlib
import sys

@contextlib.contextmanager
def bold_text():
    sys.stdout.write('<b>')
    yield
    sys.stdout.write('</b>\n')

with bold_text():
    sys.stdout.write('Hello, World')
Result:
<b>Hello, World</b>
Doesn't it look like blocks in Ruby?
And finally, let's talk about nested contexts. Nested contexts allow you to manage multiple contexts simultaneously. For example:
import contextlib

@contextlib.contextmanager
def context(name):
    print 'enter context %s' % name
    yield name
    print 'out of context %s' % name

with contextlib.nested(context('first'), context('second')) as (first, second):
    print 'inside block %s %s' % (first, second)
Result:
enter context first
enter context second
inside block first second
out of context second
out of context first
Similar code without using the nested function:
first, second = context('first'), context('second')

with first as first:
    with second as second:
        print 'inside block %s %s' % (first, second)
Although this code looks similar to the previous one, in some situations it will not work the way we would like. The context('first') and context('second') objects are created before entering the block, so we cannot catch exceptions raised while they are being constructed. Agree, the first variant is much more compact and reads better. However, in Python 2.7 and 3.1 the nested function was deprecated and new syntax for nested contexts was added:
with context('first') as first, context('second') as second:
    print 'inside block %s %s' % (first, second)
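When the number of contexts is not known in advance, Python 3.3 also added contextlib.ExitStack, which enters any number of contexts and unwinds them in reverse order. A sketch in Python 3 syntax, reusing a context() function like the one above:

```python
from contextlib import ExitStack, contextmanager

@contextmanager
def context(name):
    print('enter context %s' % name)
    yield name
    print('out of context %s' % name)

with ExitStack() as stack:
    # enter_context() registers each context for cleanup on block exit.
    names = [stack.enter_context(context(n)) for n in ('first', 'second', 'third')]
    print('inside block %s' % ' '.join(names))
```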
range and xrange in Python 2.7 and Python 3
It is well known that in Python 2.7 range returns a list. I think everyone will agree that keeping large amounts of data in memory is impractical, which is why we use the xrange function: it returns an xrange object that behaves almost like a list but does not keep all of the produced items in memory. Still, I was a little surprised by xrange's behavior in Python 2.x when large values are passed to it. Let's look at an example:
>>> f = xrange(1000000000000000000000)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OverflowError: Python int too large to convert to C long
Python tells us that the int is too large and cannot be converted to a C long. It turns out that Python 2.x limits the size of a plain integer, which we can confirm by looking at the sys.maxsize constant:
>>> import sys
>>> sys.maxsize
9223372036854775807
And here is what happens just past the maximum integer value:
>>> import sys
>>> sys.maxsize + 1
9223372036854775808L
Python quietly converted our number to a long. So don't be surprised that xrange in Python 2.x behaves differently for large values: it is limited by the C long type, while plain integers silently overflow into longs.
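If you genuinely need to count past sys.maxsize, itertools.count is one workaround: it is not bound by the C long limit, since it works with Python's arbitrary-precision integers (shown here in Python 3 print syntax):

```python
import sys
from itertools import count, islice

# count() accepts any integer, even one beyond the C long range.
start = sys.maxsize + 1
for n in islice(count(start), 3):
    print(n)
```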
In Python 3.3 an integer can be arbitrarily large; let's check:
>>> import sys
>>> sys.maxsize
9223372036854775807
>>> range(sys.maxsize + 1)
range(0, 9223372036854775808)
No conversion to a long happened. Here is another example:
>>> import sys
>>> sys.maxsize + 1
9223372036854775808
>>> f = sys.maxsize + 1
>>> type(f)
<class 'int'>
And in Python 2.7:
>>> import sys
>>> type(sys.maxsize + 1)
<type 'long'>
Non-obvious behavior of some constructs
I think everyone will agree that the simplicity of Python lies not only in how easy it is to learn but in the simplicity of the language itself. Python is beautiful and flexible, and you can write in it not only in an object-oriented style but also in a functional one. Still, you need to know about the behavior of certain constructs that seem strange at first glance. Let's start with the first example.
>>> f = [[]] * 3
>>> f[0].append('a')
>>> f[1].append('b')
>>> f[2].append('c')
What will this construct produce? An unsuspecting developer would answer: [['a'], ['b'], ['c']]. But in fact we get:
>>> print f
[['a', 'b', 'c'], ['a', 'b', 'c'], ['a', 'b', 'c']]
Why is the same result duplicated in every list? Because the multiplication operator fills our outer list with three references to one and the same inner list. It is easy to verify this by extending our example a little:
>>> c = [[], [], []]
>>> hex(id(c[0])), hex(id(c[1])), hex(id(c[2]))
('0x104ede7e8', '0x104ede7a0', '0x104ede908')
>>> hex(id(f[0])), hex(id(f[1])), hex(id(f[2]))
('0x104ede710', '0x104ede710', '0x104ede710')
In the first case everything is fine: the references point to different lists. In the second example they all point to the same object, so a change through one element is visible through all the others. Be careful.
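The safe way to build a list of independent inner lists is a comprehension, which creates a fresh list on every iteration:

```python
f = [[] for _ in range(3)]   # three distinct empty lists
f[0].append('a')
f[1].append('b')
f[2].append('c')
print(f)  # [['a'], ['b'], ['c']]
```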
The second example has already been discussed on Habr, but I wanted to include it in the article anyway. Let's create a lambda function inside a for loop and put each function into a dictionary:
>>> tmp = {}
>>> for i in range(10):
...     tmp[i] = lambda: i
...
>>> tmp[0]()
9
>>> tmp[1]()
9
The lambda does not make its own copy of i: it closes over the variable i from the enclosing scope, storing a reference to the variable rather than its value. By the time we call the functions, the loop has finished and i holds its final value, so every lambda returns 9. It is easy to fix this by binding the current value of i as a default argument of the lambda:
>>> tmp = {}
>>> for i in range(10):
...     tmp[i] = lambda i=i: i
...
>>> tmp[0]()
0
>>> tmp[1]()
1
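Another way to freeze the current loop value, if you prefer not to rely on default arguments, is functools.partial, which stores the argument at creation time (Python 3 syntax; the identity helper is our own):

```python
from functools import partial

def identity(x):
    return x

tmp = {}
for i in range(10):
    # partial captures the current value of i, not the variable itself.
    tmp[i] = partial(identity, i)

print(tmp[0](), tmp[1]())  # 0 1
```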