Code Like a Pythonista: Idiomatic Python (part2)

After a short break, I present the final part of the translation of David Goodger’s article “Code Like a Pythonista: Idiomatic Python”.

References to the first and second parts.

Once again, the author does not break new ground in this article, and most Pythonistas will not find any "special magic" in it. But it does describe in some detail how to choose among and use various Python constructs, with an eye to readability and the spirit of PEP 8.
In some places the author's article has no source code examples. I left those places as they are and did not invent my own; it should be clear in principle what the author meant.

List Comprehensions

List comprehensions (“listcomps” for short) are syntactic shorthand for the following pattern.
The traditional way, with for and if statements:

new_list = []
for item in a_list:
    if condition(item):
        new_list.append(fn(item))

And as a list comprehension:

new_list = [fn(item) for item in a_list if condition(item)] 

Listcomps are clear and concise, up to a point. You can have multiple for loops and if conditions in a listcomp, but beyond two or three in total, or if the conditions are complex, I suggest using regular for loops. In keeping with the Zen of Python, choose the more readable way.
For example, a list of the squares of the numbers 0–9:

>>> [n ** 2 for n in range(10)]
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

A list of the squares of the odd numbers in 0–9:

>>> [n ** 2 for n in range(10) if n % 2]
[1, 9, 25, 49, 81]
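To illustrate the readability point, here is a small sketch that flattens and filters a nested list twice: once as a comprehension with two for clauses, and once as regular loops. The sample data and variable names are made up for illustration and are not from the original article.

```python
# Hypothetical sample data: a small matrix of integers.
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Listcomp with two for clauses and one if clause:
# still readable, but close to the limit.
even_squares = [n ** 2 for row in matrix for n in row if n % 2 == 0]

# The same logic written as regular nested loops.
even_squares_loop = []
for row in matrix:
    for n in row:
        if n % 2 == 0:
            even_squares_loop.append(n ** 2)

print(even_squares)        # [4, 16, 36, 64]
print(even_squares_loop)   # [4, 16, 36, 64]
```

Note that the for clauses in the comprehension appear in the same order as the nested loops. Once a third loop or a complicated condition is needed, the loop version is usually easier to read.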

Generator Expressions (1)

Let's sum the squares of the numbers from 1 to 100.
As a loop:

total = 0
for num in range(1, 101):
    total += num * num

We can use the sum function to do the work for us by building the appropriate sequence.
As a list comprehension:

total = sum([num * num for num in range(1, 101)])

As a generator expression:

total = sum(num * num for num in xrange(1, 101))

Generator expressions (“genexps”) are just like list comprehensions, except that listcomps are “greedy” and generator expressions are “lazy.” A listcomp computes the entire result list all at once. A generator expression computes one value at a time, as needed. This is especially useful for long sequences where the computed list is just an intermediate step and not the final result.
In this case, we are only interested in the sum; we do not need the intermediate list of squares. We use xrange for the same reason: it produces values lazily, one per iteration.
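The laziness can be observed directly by pulling values out of a generator expression by hand. This is only a sketch, written for Python 3, where range is already lazy and takes the place of xrange; the variable names are illustrative.

```python
# Greedy: the whole list of 100 squares exists as soon as this line runs.
squares_list = [num * num for num in range(1, 101)]

# Lazy: nothing has been computed yet, we only hold a generator object.
squares_gen = (num * num for num in range(1, 101))

# Values are produced one at a time, on demand.
first = next(squares_gen)    # 1
second = next(squares_gen)   # 4

# sum() consumes whatever is left: 9 + 16 + ... + 10000.
rest = sum(squares_gen)

total = first + second + rest
print(total == sum(squares_list))   # True
```

A generator can be consumed only once. After the sum() call above, squares_gen is exhausted, whereas squares_list can be iterated again.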

Generator Expressions (2)

For example, if we were summing the squares of several billion integers, a list comprehension would run out of memory, but a generator expression has no such problem. It still takes time, though!

total = sum(num * num for num in xrange(1, 1000000000))

The difference in syntax is that listcomps have square brackets and generator expressions do not. Generator expressions do sometimes require enclosing parentheses, though, so you should always use them.
Rule of thumb:

Use a list comprehension when a computed list is the desired end result.
Use a generator expression when the computed list is just an intermediate step.

Here is an example that came up recently at work.
? (for some reason there is no example code here - transl. note)
We needed a dictionary mapping month numbers (both as strings and as integers) to month codes for futures contracts. It can be built in a single logical line of code.
? (for some reason there is no example code here - transl. note)
This works as follows:

dict() takes a list of key/value pairs (2-tuples).
We have a list of month codes (each month code is a single letter, and a string is also just a sequence of characters). We call enumerate on this list to get the codes numbered in order.
Month numbers start at 1, but Python starts indexing at 0, so the month number is one more than the corresponding index.
We want to look up months both as strings and as integers. We can use the int() and str() functions to do this for us, and loop over them.

A recent example:

month_codes = dict((fn(i + 1), code)
                   for i, code in enumerate('FGHJKMNQUVXZ')
                   for fn in (int, str))

month_codes result:

{ 1:  'F',  2:  'G',  3:  'H',  4:  'J', ...
 '1': 'F', '2': 'G', '3': 'H', '4': 'J', ...}
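As a quick check, the snippet below builds the dictionary and looks up a month by both key types. The second form is an equivalent dict comprehension (available in Python 2.7+ and 3) that uses the optional start argument of enumerate; it is a variation added here for illustration and is not part of the original article.

```python
month_codes = dict((fn(i + 1), code)
                   for i, code in enumerate('FGHJKMNQUVXZ')
                   for fn in (int, str))

# Lookups work with either key type.
print(month_codes[3])      # H
print(month_codes['12'])   # Z

# Equivalent dict comprehension; enumerate(..., 1) starts counting at 1,
# so the "+ 1" is no longer needed.
month_codes_alt = {fn(number): code
                   for number, code in enumerate('FGHJKMNQUVXZ', 1)
                   for fn in (int, str)}

print(month_codes == month_codes_alt)   # True
```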

Sorting

Sorting lists in Python is easy:

a_list.sort()

(Note that the list is sorted in place: the original list itself is sorted, and the sort method does not return the list or a copy of it.)
But what if you have a list of data that you need to sort in some order other than the natural one (i.e., by the first column, then the second, and so on)? You may need to sort first by the second column, then by the fourth.

We can use the list's built-in sort method with a custom comparison function:

def custom_cmp(item1, item2):
    return cmp((item1[1], item1[3]),
               (item2[1], item2[3]))

a_list.sort(custom_cmp)

It works, but it is extremely slow with large lists.

Sorting with DSU *

DSU = Decorate-Sort-Undecorate
* Note: DSU is often no longer necessary. See the next section, “Key sorting”, for a description of another method.
Instead of writing a custom comparison function, we build an auxiliary list that sorts correctly with the default ordering:

# Decorate:
to_sort = [(item[1], item[3], item)
           for item in a_list]

# Sort:
to_sort.sort()

# Undecorate:
a_list = [item[-1] for item in to_sort]

The first line builds a list of tuples, each consisting of the sort criteria in the right order followed by the complete data record (item).
The second line performs a plain sort, which is fast and efficient.
The third line retrieves the last value of each tuple in the sorted list.
Remember, this last value is the complete data record. We throw away the sort criteria, which have done their job and are no longer needed.

This trades memory and code complexity for execution time. It is much simpler and faster, but we do have to duplicate the original list.
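Here is the DSU pattern applied to a few sample records, sorting by the second column and then the fourth. The records themselves are hypothetical and exist only to make the sketch runnable.

```python
# Hypothetical records: (id, surname, first name, age)
a_list = [
    (1, 'Smith', 'Ann', 41),
    (2, 'Jones', 'Bob', 35),
    (3, 'Smith', 'Cid', 29),
    (4, 'Jones', 'Dee', 52),
]

# Decorate: put the sort criteria (columns 1 and 3) in front of each record.
to_sort = [(item[1], item[3], item) for item in a_list]

# Sort: ordinary tuple comparison does all the work.
to_sort.sort()

# Undecorate: keep only the original record, the last element of each tuple.
a_list = [item[-1] for item in to_sort]

for record in a_list:
    print(record)
# (2, 'Jones', 'Bob', 35)
# (4, 'Jones', 'Dee', 52)
# (3, 'Smith', 'Cid', 29)
# (1, 'Smith', 'Ann', 41)
```

One thing to keep in mind: when two records have identical sort criteria, the tuple comparison falls through to the records themselves, so they must be comparable with each other.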

Key sorting

Python 2.4 added an optional “key” argument to the list sort method. It specifies a function of one argument that is used to compute a comparison key from each list element. For example:

def my_key(item):
    return (item[1], item[3])

to_sort.sort(key=my_key)

The my_key function will be called once for each item in the to_sort list.
You can write your own key function or use any existing one-argument function where appropriate:

str.lower to sort alphabetically regardless of case.
len to sort by the length of the items (strings or containers).
int or float to sort numerically, for values such as the numeric strings "2", "123", "35".
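The key functions listed above can be tried on small sample lists, as in the sketch below. The data is made up, and two things are not taken from the original text: the sorted() built-in (which accepts the same key argument and returns a new list) and operator.itemgetter from the standard library, used here as a ready-made replacement for my_key.

```python
from operator import itemgetter

words = ['banana', 'Cherry', 'apple']
print(sorted(words))                  # ['Cherry', 'apple', 'banana']
print(sorted(words, key=str.lower))   # ['apple', 'banana', 'Cherry']
print(sorted(words, key=len))         # ['apple', 'banana', 'Cherry']

numeric_strings = ['2', '123', '35']
print(sorted(numeric_strings))            # ['123', '2', '35']  (text order)
print(sorted(numeric_strings, key=int))   # ['2', '35', '123']  (numeric order)

# itemgetter(1, 3) returns a function equivalent to my_key above:
# it maps an item to the tuple (item[1], item[3]).
rows = [(1, 'Smith', 'Ann', 41), (2, 'Jones', 'Bob', 35), (3, 'Jones', 'Dee', 29)]
rows.sort(key=itemgetter(1, 3))
print([row[0] for row in rows])   # [3, 2, 1]
```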

Generators

We have already seen generator expressions. We can write arbitrarily complex generators of our own as functions:

def my_range_generator(stop):
    value = 0
    while value < stop:
        yield value
        value += 1

for i in my_range_generator(10):
    do_something(i)

The yield keyword turns a function into a generator. When you call a generator function, Python does not run its code; instead it returns a generator object, which, as we recall, is an iterator and so has a next method. A for loop simply calls the iterator's next method until the StopIteration exception is raised. You can raise StopIteration explicitly, or implicitly by falling off the end of the generator code, as above.
Generators can simplify sequence/iterator handling, because we do not need to build concrete lists; just one value is computed per iteration.
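Stepping through a generator by hand makes this behaviour visible. The sketch uses the built-in next() function, which works in Python 2.6+ and Python 3 (where the method itself is spelled __next__ rather than next).

```python
def my_range_generator(stop):
    value = 0
    while value < stop:
        yield value
        value += 1

gen = my_range_generator(2)   # no code in the function body has run yet
print(next(gen))              # 0
print(next(gen))              # 1

try:
    next(gen)                 # the function body falls off the end here
except StopIteration:
    print('generator exhausted')
```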

Let me explain how the for loop actually works. Python looks at the sequence given after the in keyword. If it is a simple container (such as a list, tuple, dictionary, set, or a user-defined one), Python converts it into an iterator. If the object is already an iterator, Python uses it directly.
Python then repeatedly calls the iterator's next method, binds the return value to the loop variable (i in this case), and executes the loop body. This repeats until the StopIteration exception is raised or a break statement is executed in the loop body.
A for loop may include an else clause, whose code is executed after the loop finishes normally, but not after a break statement. This makes some very elegant solutions possible. An else clause is not used on a for loop always or even often, but it can come in handy. Sometimes else expresses exactly the logic you need.
For example, if we need to check that a condition holds for some item, any item, in a sequence:

for item in sequence:
    if condition(item):
        break
else:
    raise Exception('Condition not satisfied.')
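The two ideas above can be sketched as functions. find_first wraps the for/else pattern, and manual_for spells out roughly what a for loop does with iter() and next(). Both function names are hypothetical, chosen only for this illustration, and manual_for ignores break and else for brevity.

```python
def find_first(sequence, condition):
    """Return the first item satisfying condition, or raise ValueError."""
    for item in sequence:
        if condition(item):
            break
    else:
        # Reached only if the loop finished without hitting break.
        raise ValueError('Condition not satisfied.')
    return item


def manual_for(iterable, body):
    """Roughly what `for item in iterable: body(item)` does."""
    iterator = iter(iterable)      # container -> iterator (iterators return themselves)
    while True:
        try:
            item = next(iterator)  # ask for the next value
        except StopIteration:      # the iterator is exhausted
            break
        body(item)


print(find_first([1, 3, 4, 5], lambda n: n % 2 == 0))   # 4

seen = []
manual_for('abc', seen.append)
print(seen)   # ['a', 'b', 'c']
```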

Generator example

Filter out blank rows from a CSV file (or items from a list):

import csv

def filter_rows(row_iterator):
    for row in row_iterator:
        if row:
            yield row

data_file = open(path, 'rb')
irows = filter_rows(csv.reader(data_file))
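To try the filter without a real file, an in-memory text stream can stand in for data_file. This is a sketch for Python 3, where the csv module expects text rather than a file opened in 'rb' mode; the sample data is invented.

```python
import csv
import io

def filter_rows(row_iterator):
    for row in row_iterator:
        if row:
            yield row

# Hypothetical CSV content with blank lines, standing in for a real file.
data_file = io.StringIO('a,b\n\n1,2\n\n3,4\n')

# csv.reader yields an empty list for each blank line; filter_rows drops them.
irows = filter_rows(csv.reader(data_file))
rows = list(irows)
print(rows)   # [['a', 'b'], ['1', '2'], ['3', '4']]
```

Because filter_rows accepts any iterator, it works equally well on a plain list, for example list(filter_rows(['x', '', 'y'])).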

Reading lines from a text file

datafile = open('datafile')
for line in datafile:
    do_something(line)

This works because files support the next method, as do other iterators, such as those of lists, tuples, dictionaries (for their keys), and generators.
Be careful here: because of the way file operations are buffered, you cannot mix the .next and .read* methods unless you are using Python 2.5+.

EAFP vs. LBYL

It's easier to ask forgiveness than permission (EAFP).
Look before you leap (LBYL).
EAFP is generally preferred, but not always.

Duck typing
If it walks like a duck, quacks like a duck, and looks like a duck, then it is a duck. (A goose? Close enough.)
Exceptions
Use coercion if an object must be of a particular type. If x must be a string for your code to work, why not call

str(x)

instead of trying something like

isinstance(x, str)

EAFP try / except Example

You can wrap exception-prone code in a try/except block to catch the errors, and you will probably end up with a more general solution than if you had tried to anticipate every possibility.

try:
    return str(x)
except TypeError:
    ...

Note: always specify the exceptions to catch. Never use a bare except clause. A bare except clause will catch every exception raised in your code, including ones you did not expect, making it extremely difficult to debug.
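As a side-by-side sketch of the two styles, here is a dictionary lookup with a fallback value written both ways. The function names and the settings dictionary are hypothetical, introduced only for this comparison.

```python
def get_lbyl(mapping, key, default):
    """LBYL: check that the key exists before using it."""
    if key in mapping:
        return mapping[key]
    return default


def get_eafp(mapping, key, default):
    """EAFP: just try the lookup and handle the one expected failure."""
    try:
        return mapping[key]
    except KeyError:   # a specific exception, never a bare except
        return default


settings = {'width': 78}
print(get_lbyl(settings, 'width', 80), get_lbyl(settings, 'height', 24))   # 78 24
print(get_eafp(settings, 'width', 80), get_eafp(settings, 'height', 24))   # 78 24
```

The EAFP version performs a single lookup and names exactly the failure it is prepared to handle; any other exception still propagates to the caller.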

Importing

from module import * 

You have probably seen this “wild card” form of the import statement. You may even like it. Don't use it.
An adaptation of a famous dialogue:

(Exterior Dagobah, jungle, swamp, and mist.)
LUKE: Is from module import * better than explicit imports?
YODA: No, not better. Quicker, easier, more seductive.
LUKE: But how will I know why explicit imports are better than the wild-card form?
YODA: Know you will when your code you try to read six months from now.

Wild Card Import - The Dark Side of Python.

Never!
from module import * wildly pollutes the namespace. You will find objects in your local namespace that you did not expect to get. You may see imported names overriding local names defined earlier in the module. You will not be able to figure out where certain names come from. Although it is a convenient shortcut, this form has no place in production code.
Moral: don't use wild-card imports!
It is much better to:

reference names through their module (fully qualified identifiers that show where they come from),
import a long module name via a shorter name (an alias),
or explicitly import just the names you need.

Namespace pollution alert!
Instead,
reference names through their module (fully qualified identifiers that show where they come from):

import module
module.name

or import a long module name via an alias:

import long_module_name as mod
mod.name

or explicitly import just the names you need:

from module import name
name

Note that this last form does not lend itself to use in the interactive interpreter, where you may want to edit and reload (“reload()”) a module.
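A well-known instance of this pollution is from os import *, which replaces the built-in open with os.open. The sketch below, written for Python 3, runs the wild-card import inside a scratch dictionary via exec so that the example does not pollute its own namespace; that exec trick is purely a device for this demonstration.

```python
import builtins
import os

# Run the wild-card import in a scratch namespace.
scratch = {}
exec('from os import *', scratch)

# The name "open" now refers to os.open, not the built-in open().
shadowed = scratch['open'] is os.open
still_builtin = scratch['open'] is builtins.open
print(shadowed, still_builtin)   # True False

# Explicit alternatives keep the origin of every name visible.
import os.path as osp          # alias for a longer module name
from os.path import join       # import exactly the name needed

print(osp.basename(join('pkg', 'module.py')))   # module.py
```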

Modules and Scripts

To make a file both an importable module and an executable script:

if __name__ == '__main__':
    # script code here

When the file is imported, the module's __name__ attribute is set to the file name without the ".py" extension, so the code under the if statement does not run. When the file is executed as a script, the __name__ attribute is set to "__main__", and the script code runs.
Except in a few special cases, you should not put any significant code at the top level. Put the code in functions, classes, and methods, and guard it with if __name__ == '__main__'.

Module structure

"""module docstring"""

# imports
# constants
# exception classes
# interface functions
# classes
# internal functions & classes

def main(...):
    ...

if __name__ == '__main__':
    status = main()
    sys.exit(status)

This is how a module should be structured.

Command line processing

Example: cmdline.py:

#!/usr/bin/env python

"""
Module docstring.
"""

import sys
import optparse

def process_command_line(argv):
    """
    Return a 2-tuple: (settings object, args list).
    `argv` is a list of arguments, or `None` for ``sys.argv[1:]``.
    """
    if argv is None:
        argv = sys.argv[1:]

    # initialize the parser object:
    parser = optparse.OptionParser(
        formatter=optparse.TitledHelpFormatter(width=78),
        add_help_option=None)

    # define options here:
    parser.add_option(      # customized description; put --help last
        '-h', '--help', action='help',
        help='Show this help message and exit.')

    settings, args = parser.parse_args(argv)

    # check number of arguments, verify values, etc.:
    if args:
        parser.error('program takes no command-line arguments; '
                     '"%s" ignored.' % (args,))

    # further process settings & args if necessary

    return settings, args

def main(argv=None):
    settings, args = process_command_line(argv)
    # application code here, like:
    # run(settings, args)
    return 0        # success

if __name__ == '__main__':
    status = main()
    sys.exit(status)
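For readers on newer Python versions, the same structure can be sketched with argparse, the standard-library module that arrived after this article was written. This is a substitute technique, not the author's code, and the --verbose option is a hypothetical example.

```python
import argparse

def process_command_line(argv=None):
    """Return the parsed settings; argv=None means sys.argv[1:]."""
    parser = argparse.ArgumentParser(description='Module docstring.')

    # Hypothetical option, for illustration only.
    parser.add_argument('-v', '--verbose', action='store_true',
                        help='Print extra progress information.')

    # argparse adds -h/--help itself and reports an error on
    # unexpected positional arguments.
    return parser.parse_args(argv)

def main(argv=None):
    settings = process_command_line(argv)
    if settings.verbose:
        print('verbose mode on')
    # application code here
    return 0        # success
```

As in the original example, passing argv explicitly keeps main easy to call from tests or from the interactive prompt, for instance main(['--verbose']). In a real script the usual if __name__ == '__main__' guard calling sys.exit(main()) would follow.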

Packages

package/
    __init__.py
    module1.py
    subpackage/
        __init__.py
        module2.py

Used to organize your project.
Reduces the number of entries in the load path.
Reduces import name conflicts.

Example:

import package.module1
from package.subpackage import module2
from package.subpackage.module2 import name

In Python 2.5 we now have absolute and relative imports via a future import:

from __future__ import absolute_import 

I haven't figured it out deeply enough yet, so we'll omit this part of our discussion.

Simple Is Better Than Complex

Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it.
- Brian W. Kernighan, co-author of The C Programming Language and the “K” in “AWK”

In other words, keep your programs simple!

Do not reinvent the wheel

Before writing any code:

Check Python's standard library.
Check the Python Package Index (the “Cheese Shop”): http://cheeseshop.python.org/pypi
(Apparently an allusion to the cheese shop sketch; I found a similar one in the wiki textbook - transl. note.)
Search the net. Google is your friend.