This is the second selection of tips about Python and programming from my @pythonetc channel.
Regular languages
A regular language is a formal language that can be recognized by a finite state machine. In other words, to process text character by character you only need to remember the current state, and the number of such states is finite.
A good example is a machine that checks whether the input is a plain number such as -3, 2.2, or 001. The state machine is shown at the beginning of the article. Double circles mark final states, the ones in which the machine is allowed to stop.
The machine starts at position ①. It may find a minus sign, then it expects a digit, and at position ③ it consumes as many further digits as there are. After that it may meet a decimal separator (③ → ④), which must be followed by one digit (④ → ⑤) or more (⑤ → ⑤).
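The walk through the states can be sketched as a small transition table. This is only an illustration: the table is reconstructed from the description above rather than copied from the diagram, and the names TRANSITIONS, FINAL_STATES, classify and is_number are made up for this example.

```python
# Transition table reconstructed from the prose description (an assumption,
# the original diagram may differ in details).
TRANSITIONS = {
    (1, "minus"): 2,  # optional leading minus
    (1, "digit"): 3,
    (2, "digit"): 3,  # a minus must be followed by a digit
    (3, "digit"): 3,  # any number of further digits
    (3, "dot"): 4,    # decimal separator
    (4, "digit"): 5,  # at least one digit after the separator
    (5, "digit"): 5,
}
FINAL_STATES = {3, 5}  # the "double circle" states


def classify(char):
    """Map a single character to the symbol class used in the table."""
    if char == "-":
        return "minus"
    if char == ".":
        return "dot"
    if char in "0123456789":
        return "digit"
    return "other"


def is_number(text):
    """Return True if the machine stops in a final state after reading text."""
    state = 1
    for char in text:
        state = TRANSITIONS.get((state, classify(char)))
        if state is None:  # no transition defined: reject
            return False
    return state in FINAL_STATES
```

Note that the function keeps nothing but the current state between characters, which is exactly what makes the language regular.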
A classic example of a non-regular language is the family of strings of the form:
a-b
aaa-bbb
aaaaa-bbbbb
Formally, we need a string containing N instances of a, then a hyphen (-), then N instances of b, where N is an integer greater than 0. You cannot implement this with a state machine, because you have to remember how many a characters you have counted, and that is only possible with an infinite number of states.
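For contrast, here is a minimal sketch of a checker for this language that is not a state machine: it relies on counting, which is the part a finite number of states cannot express. The function name is hypothetical.

```python
def is_a_n_dash_b_n(text):
    """Check for N letters 'a', a hyphen, then N letters 'b' (N > 0)."""
    left, separator, right = text.partition("-")
    return (
        separator == "-"
        and len(left) > 0
        and len(left) == len(right)  # the counting step
        and set(left) == {"a"}
        and set(right) == {"b"}
    )
```

The length comparison stores an unbounded number, so the check needs memory that grows with the input.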
Regular expressions can only specify regular languages. Before using them, make sure that your string can be processed using a state machine. For example, they are not suitable for processing JSON, XML, or even arithmetic expressions with nested brackets.
It's funny that many modern regular expression engines are not actually regular. For example, the third-party regex module for Python supports recursion (which helps with the aaa-bbb problem).
Dynamic dispatch
When Python executes a method call, say a.f(b, c, d), it must first select the correct function f. Because of polymorphism, the object a determines which function is ultimately selected. The process of choosing a method is usually called dynamic dispatch.
Python only supports single-dispatch polymorphism. This means that the choice of a method is affected only by the object itself (in our example, a). In other languages, the types of b, c and d can also be taken into account; this mechanism is called multiple dispatch. A notable example is C#.
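A tiny sketch of what single dispatch means in practice. The classes Dog and Cat and the method greet are invented for this illustration.

```python
class Dog:
    def greet(self, other):
        # Selected because the receiver is a Dog; the type of other is ignored.
        return "Woof"


class Cat:
    def greet(self, other):
        # Selected because the receiver is a Cat.
        return "Meow"


# The receiver alone decides which greet runs.
dog_to_cat = Dog().greet(Cat())
dog_to_int = Dog().greet(42)
cat_to_dog = Cat().greet(Dog())
```

Here dog_to_cat and dog_to_int are the same string, because the argument takes no part in selecting the method.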
However, multiple dispatch can be emulated with single dispatch. The “visitor” design pattern was created for this purpose: it uses single dispatch twice to simulate double dispatch.
Remember that method overloading (as in Java and C++) is not an analogue of multiple dispatch. Dynamic dispatch works at runtime, while overloading is resolved at compile time.
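A minimal sketch of the visitor idea in Python, assuming two hypothetical shape classes and two hypothetical visitors (none of these names come from a real library). The first dispatch picks accept by the type of the shape, the second picks visit_circle or visit_square by the type of the visitor.

```python
class Circle:
    def accept(self, visitor):
        # First dispatch: we are here because the shape is a Circle.
        # Second dispatch: the visitor's type picks the implementation.
        return visitor.visit_circle(self)


class Square:
    def accept(self, visitor):
        return visitor.visit_square(self)


class Printer:
    def visit_circle(self, circle):
        return "printing a circle"

    def visit_square(self, square):
        return "printing a square"


class Exporter:
    def visit_circle(self, circle):
        return "exporting a circle"

    def visit_square(self, square):
        return "exporting a square"


def process(shape, visitor):
    """The result depends on the runtime types of both arguments."""
    return shape.accept(visitor)
```

Calling process(Circle(), Exporter()) ends up in Exporter.visit_circle, so the behaviour is selected by both types even though each individual call is single dispatch.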
Built-in names
In Python, you can easily modify all the standard variables that are available in the global scope:
    >>> print = 42
    >>> print(42)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'int' object is not callable
This is useful if your module defines functions whose names match the names of built-in functions. It also happens when you practice metaprogramming and accept an arbitrary string value as an identifier.
But even if you duplicate the names of some built-in functions, you may need access to what they originally referred to. This is what the builtins module is for:
    >>> import builtins
    >>> print = 42
    >>> builtins.print(1)
    1
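The same idea in module code might look like the sketch below. The shadowing function is hypothetical: it reuses the name max but adds a default for empty input, and reaches the original through builtins.

```python
import builtins


def max(values, default=0):
    """Module-level helper that deliberately shadows the built-in max.

    Unlike the built-in, it returns a default for an empty iterable.
    """
    # The original function is still reachable through the builtins module.
    return builtins.max(values, default=default)
```

Inside this module the bare name max now refers to the helper, while builtins.max keeps pointing at the original function.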
The variable __builtins__ is also available in most modules, but there is a catch. First, it is a CPython implementation detail and usually should not be used at all. Second, __builtins__ can refer either to builtins or to builtins.__dict__, depending on how the current module was loaded.
strace
Sometimes an application starts to behave strangely in production. Instead of restarting it, you may want to investigate the cause of the problem while that is still possible.
The obvious solution is to analyze the program's actions and try to understand what part of the code is being executed. Proper logging facilitates this task, but your logs may not be sufficiently detailed due to the architecture or the level of logging selected in the settings.
In such cases, strace may be helpful. This is a Unix utility that traces system calls. You can start the program under it (strace python script.py), but it is usually more convenient to attach to an already running application: strace -p PID.
    $ cat test.py
    with open('/tmp/test', 'w') as f:
        f.write('test')
    $ strace python test.py 2>&1 | grep open | tail -n 1
    open("/tmp/test", O_WRONLY|O_CREAT|O_TRUNC|O_CLOEXEC, 0666) = 3
Each line of the trace contains the name of the system call, its arguments in parentheses, and the return value. Because some arguments are used to return the result of a system call rather than to pass data to it, the output of a line can pause until the system call completes.
In this example, the output pauses until the read from STDIN completes:
    $ strace python -c 'input()'
    read(0,
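To connect the traced open call shown earlier with Python code, here is a rough sketch that issues a similar call through the os module. The helper name is made up, and the mapping from open(path, 'w') to these flags is taken from the trace above rather than guaranteed for every platform or Python version.

```python
import os


def write_like_open_w(path, text):
    """Open path with the flags seen in the trace, then write text to it."""
    # O_CLOEXEC is not available on every platform, so fall back to 0.
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC | getattr(os, "O_CLOEXEC", 0)
    fd = os.open(path, flags, 0o666)  # 0666 is the mode shown in the trace
    try:
        os.write(fd, text.encode())
    finally:
        os.close(fd)
```

Running a script that calls this helper under strace should show an open (or openat) line with the same flag names, which makes it easier to read traces of real applications.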
Tuple literals
One of the most inconsistent parts of the Python syntax is tuple literals.
To create a tuple, it is enough to list the values separated by commas: 1, 2, 3. What about a one-element tuple? Just add a trailing comma: 1,. It looks ugly and often leads to errors, but it is quite logical.
How about an empty tuple? Is it a lone comma? No, it is (). So do parentheses create a tuple, just as commas do? No, (4) is not a tuple, it is just 4.
    In : a = [
    ...:     (1, 2, 3),
    ...:     (1, 2),
    ...:     (1),
    ...:     (),
    ...: ]
    In : [type(x) for x in a]
    Out: [tuple, tuple, int, tuple]
To make things even more confusing, tuple literals often require additional parentheses. If you need a tuple to be the only argument of a function, then obviously f(1, 2, 3) will not work, and you have to write f((1, 2, 3)).
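The rules above can be checked with a few lines. The helper count_arguments is hypothetical and only reports how many positional arguments it received.

```python
def count_arguments(*args):
    """Return the number of positional arguments passed in."""
    return len(args)


one_element = 1,       # the trailing comma makes the tuple
empty = ()             # the only tuple literal without a comma
not_a_tuple = (4)      # parentheses alone just group the expression

three_arguments = count_arguments(1, 2, 3)    # three separate integers
one_argument = count_arguments((1, 2, 3))     # a single tuple
```

The last two lines show why the extra parentheses are needed in a call: without them the commas separate arguments instead of building a tuple.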