
Teaching an old dog new tricks, or how I learned to love str.format and gave up %

I would like to offer Habr readers and Python fans a translation of a rather lengthy article about string formatting. The tale is true, and its moral is that conservatives should sometimes consider something new, even when habit stubbornly resists.

Anticipating the curiosity of readers inclined to ask off-topic questions, I will say that the picture is indirectly related to Python, though not in the most pleasant way. Finding out why is left as homework.

Please send notes about formatting errors and typos by private message; the traditional Habr goodies are on me.

What follows are the words of the author of the original article:

I have been writing in Python for many years. But at the very beginning of this journey I was interested in learning how to format strings Perl-style. Recall that Perl (like many Unix command-line shells) supports two kinds of string literals: single-quoted ones, where the string is output as is, and double-quoted ones, where variables are replaced by their values. In Perl, for example, you can write something like:

    $name = 'Reuven';
    print "Hello, $name\n";

And the program, accordingly, will write "Hello, Reuven".

In Python, string literals do not depend on the type of quotes, and variables inside them are never expanded to their values. To achieve that, the % operator on strings was traditionally used. In this context, the operator looks at the string to its left and counts how many placeholders need to be replaced with the corresponding values to its right. The result of the operation is a new string with the values inserted in place of the placeholders. For example:

    >>> name = 'Reuven'
    >>> "Hello, %s" % name
    'Hello, Reuven'

This Python code works fine and displays a personalized greeting. So throughout my many years of practice with Python, I was quite content with this syntax. Yes, it is not very pretty, and no, I never memorized the mountain of printf modifiers that affect formatting. That is, I always used the 's' modifier (output as a string), and it was enough for me that Python implicitly converted the arguments to strings.
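
The implicit conversion mentioned above can be sketched as follows. The variable names and values are illustrative assumptions, not taken from the original article.

```python
# Illustrative values; none of them come from the original article.
count = 3
price = 9.5
items = ['a', 'b']

# %s calls str() on each argument, so non-string values work as well.
line = "%s items at %s each: %s" % (count, price, items)
print(line)  # 3 items at 9.5 each: ['a', 'b']
```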

But the % syntax is now slated for retirement, or at least for being declared obsolete. A note on the python-dev mailing list says that in the 2.x branch it will live at least until 2022, but nothing is said about the 3.x branch, so support for this syntax may be removed before long, and using it is discouraged. It has been replaced by the str.format method.

In my Python lessons I always mentioned str.format, but in concrete examples I often relied on %. I even recommended that students use %, since it seemed much easier to me personally.

But the persistent feeling that I was doing something wrong, and perhaps even misleading my students, prompted me to study str.format more closely. Along the way I came to the following conclusions: 1) it is no more difficult than %, and in some cases even simpler; 2) I had never used the capabilities of str.format to the full, and they are very convenient, despite the time it takes to learn them.

Let's start with the simplest case. We will say “Good morning” to someone, addressing them by first and last name, assuming these are stored in the variables “first” and “last”. The old way, we would do this:

    >>> first = 'Reuven'
    >>> last = 'Lerner'
    >>> "Good morning, %s %s" % (first, last)
    'Good morning, Reuven Lerner'

Even in this example we run into one of the problems of the % syntax: we now have two variables, and to use both of them we have to pack them into a tuple. From Python's point of view this is logical, but I assure you that many students find it very surprising.
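
A small sketch of why the tuple requirement tends to surprise people. The `point` value is a hypothetical example, not something from the original article.

```python
first, last = 'Reuven', 'Lerner'

# Two placeholders need a tuple on the right-hand side.
greeting = "Good morning, %s %s" % (first, last)

# A tuple on the right is always treated as the argument list,
# so formatting a tuple *value* with a single %s fails.
point = (3, 4)  # hypothetical value for illustration
try:
    "Point: %s" % point
except TypeError as exc:
    error = str(exc)

# Wrapping it in a one-element tuple formats the tuple itself.
wrapped = "Point: %s" % (point,)
print(wrapped)  # Point: (3, 4)
```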

What will this example look like in the case of str.format? Pretty similar:

    >>> "Good morning, {} {}".format(first, last)
    'Good morning, Reuven Lerner'

Note that the principle has changed slightly. This is no longer a binary operator on strings but a method of the string object that takes a series of parameters. This is logical and more consistent. To those same students, the % operator in my examples looked like an add-on to print rather than an operation on strings. The notation with ".format" after the string makes it more obvious that this is a method of that particular string.

As you probably already know, the “{} {}” in the string tells str.format to take two parameters, whose values are inserted into the string in the order in which they are passed to the method. There are two arguments, so there must be two {} placeholders in the string. This is a little harder to grasp, since curly braces in Python make people think of dictionaries, and empty braces do not look very pretty. But that is fine; I can live with it and got used to it quite easily.

The first place where str.format shows an advantage over % is when you need to use the parameters in reverse order. With positional %s placeholders this cannot be done at all, nor can the value of one variable be used several times. With str.format, we can easily change the order of substitution:

    >>> "Good morning, {1} {0}".format(first, last)
    'Good morning, Lerner Reuven'
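
Reusing one value several times works the same way. A minimal sketch, with a made-up "formal" greeting format:

```python
first, last = 'Reuven', 'Lerner'

# The same index may appear more than once, and in any order.
formal = "{1}, {0} {1}".format(first, last)
print(formal)  # Lerner, Reuven Lerner
```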

Note that if I had used empty braces, “{} {}”, the substitution would occur in the same order in which the parameters are passed to the method. You can think of the parameters as a sequence indexed from zero; to change the order, I simply put the desired indices of that sequence inside the curly braces. Our very first str.format example can be written like this:

    >>> "Good morning, {0} {1}".format(first, last)
    'Good morning, Reuven Lerner'

Note that once we specify indices explicitly, we can no longer rely on automatic numbering in the same string.
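
What happens when the two styles are mixed can be sketched like this. The exact wording of the error message is what current CPython 3 prints and may differ in other versions.

```python
first, last = 'Reuven', 'Lerner'

try:
    # {0} is an explicit index, {} asks for automatic numbering.
    "Good morning, {0} {}".format(first, last)
except ValueError as exc:
    message = str(exc)
    print(message)
```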

Of course, you can also unpack the arguments from a sequence using the * operator:

    >>> names = ('Reuven', 'Lerner')
    >>> "Good morning, {} {}".format(*names)
    'Good morning, Reuven Lerner'

Named arguments can also be used:

    >>> "Good morning, {first} {last}".format(first='Reuven', last='Lerner')
    'Good morning, Reuven Lerner'

I especially like this option. Named parameters are more explicit (if they have good names), and {first} and {last} are quite readable, especially compared to the %(first)s that the % operator requires.
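
For comparison, here are the two named styles side by side. This is only a sketch; the `person` dictionary is an assumption needed for the % variant, which takes its names from a mapping.

```python
person = {'first': 'Reuven', 'last': 'Lerner'}

# Old style: names go inside %(...)s and the right-hand side is a mapping.
old_style = "Good morning, %(first)s %(last)s" % person

# New style: names go inside braces and are passed as keyword arguments.
new_style = "Good morning, {first} {last}".format(first='Reuven', last='Lerner')

print(old_style == new_style)  # True
```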

Named parameters can also be unpacked from a dictionary using the ** operator:

    >>> person = {'first':'Reuven', 'last':'Lerner'}
    >>> "Good morning, {first} {last}".format(**person)
    'Good morning, Reuven Lerner'

I described all this to my students and was rather surprised at how comfortable they were with this syntax. And it became more pleasant for me to work with, too.

It is worth mentioning that named and positional arguments can technically be used together. But it is better not to do this:

    >>> person = {'first':'Reuven', 'last':'Lerner'}
    >>> "Good {0}, {first} {last}".format('morning', **person)
    'Good morning, Reuven Lerner'


You have been warned.

What you may have been missing in str.format so far is... um... formatting. The bad news is that str.format has completely different rules for specifying how output is formatted. The good news is that these rules are easy to learn and understand.

Let's start again with something simple: to print a string in a field of a given width, add a colon (:) after the variable name, followed by the number of characters. So, to print my name padded with spaces to ten characters, I do this:

    >>> "Your name is {name:10}".format(name="Reuven")
    'Your name is Reuven    '

(Note that the string is padded with spaces after the name.)

If you need to align to the right side of the field, put the > sign between the colon and the number:

    >>> "Your name is {name:>10}".format(name="Reuven")
    'Your name is     Reuven'

And yes, you can state explicitly that you want left alignment by using <. To center the value in the field, use the ^ symbol instead of < or >. A character placed before the alignment sign (here *) is used as the fill instead of spaces:

    >>> "Your name is {name:*^10}".format(name="Reuven")
    'Your name is **Reuven**'
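
The alignment options can be put side by side in a short sketch. The square brackets are added here only to make the padding visible.

```python
name = "Reuven"

left = "[{:<10}]".format(name)     # pad on the right (the default for strings)
right = "[{:>10}]".format(name)    # pad on the left
center = "[{:^10}]".format(name)   # pad on both sides
filled = "[{:*^10}]".format(name)  # centered, with '*' as the fill character

for text in (left, right, center, filled):
    print(text)
# [Reuven    ]
# [    Reuven]
# [  Reuven  ]
# [**Reuven**]
```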

With text things are more or less clear, but what about numbers? Personally, I found it hard to imagine how this would work, but everything turned out to be quite straightforward. For simple output of numbers, use the same syntax as for strings:

    >>> "The price is ${number}.".format(number=123)
    'The price is $123.'

But numbers accept more modifiers than strings. For example, to output a number in binary, add the “b” modifier; for hexadecimal, the “x” modifier:

    >>> "The price is ${number:b}.".format(number=5)
    'The price is $101.'
    >>> "The price is ${number:x}.".format(number=123)
    'The price is $7b.'

Of course, a number can be padded with leading zeros:

    >>> "Your call is important to us. You are call #{number:05}.".format(number=123)
    'Your call is important to us. You are call #00123.'
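
A compact sketch of the numeric modifiers shown so far. The last two lines (fixed precision and a thousands separator) are not covered in the article; they are added here as further examples of the same mini-language.

```python
number = 123

binary = "{:b}".format(number)         # '1111011'
hexa = "{:x}".format(number)           # '7b'
padded = "{:05}".format(number)        # '00123'

# Not from the article: two more commonly used numeric specs.
two_places = "{:.2f}".format(3.14159)  # '3.14'
thousands = "{:,}".format(1234567)     # '1,234,567'

print(binary, hexa, padded, two_places, thousands)
```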

Note that you cannot put executable Python code inside {}; instead there is a simple mini-language, separate and distinct from Python as a whole. There are a few minor exceptions. First, you can get the values of attributes/properties using a dot, and second, you can get an element of an object by index using [].

For example:

    >>> class Foo(object):
    ...     def __init__(self):
    ...         self.x = 100
    ...
    >>> f = Foo()
    >>> 'Your number is {o.x}'.format(o=f)
    'Your number is 100'

We obtained the attribute “x” of the object “f”. Inside the string, this object is accessible under the name “o”. You can get an attribute, but you cannot call it:

    >>> "Your name is {name.upper()}".format(name="Reuven")
    AttributeError: 'str' object has no attribute 'upper()'

I tried to execute “name.upper()”, assuming that the corresponding method would be called, but Python does not allow code to be executed here and treats “upper()”, parentheses included, as an attribute name. Without the parentheses, you simply get the string representation of the function/method:

    >>> "Your name is {name.upper}".format(name="Reuven")
    'Your name is <built-in method upper of str object at 0x1028bf2a0>'
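
The usual way around this limitation, sketched below, is to call the method in ordinary Python code and pass the result to str.format:

```python
name = "Reuven"

# Call the method first, then hand the result to format().
shout = "Your name is {}".format(name.upper())

# The same idea with a named argument.
shout_named = "Your name is {name}".format(name=name.upper())

print(shout)  # Your name is REUVEN
```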

Using square brackets, you can take an element of a sequence (list, string) by index. But slice operations are not supported:

    >>> numbers = [0, 1, 2, 3, 4, 5]  # assumed; the original does not show this definition
    >>> "Your favorite number is {n[3]}.".format(n=numbers)
    'Your favorite number is 3.'

But:

    >>> "Your favorite numbers are {n[2:4]}.".format(n=numbers)
    ValueError: Missing ']' in format string
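
If a slice is what you need, one simple option is to take it before formatting. A sketch, where the contents of `numbers` are an assumption because the article does not show them:

```python
numbers = [0, 1, 2, 3, 4, 5]  # assumed contents, not from the article

# Slice in ordinary Python code, then format the result.
text = "Your favorite numbers are {n}.".format(n=numbers[2:4])
print(text)  # Your favorite numbers are [2, 3].
```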

You can also use [] to retrieve dictionary entries by key, but the key is written without quotes:

    >>> person = {'first':'Reuven', 'last':'Lerner'}
    >>> "Your name is {p[first]}.".format(p=person)
    'Your name is Reuven.'

If you try to use quotes, you get an exception:

    >>> "Your name is {p['first']}.".format(p=person)
    KeyError: "'first'"

Not all the ways of using str.format are shown here; in fact, each type has its own specification of formatting rules. For example, the “f” modifier, which prints floating-point numbers with a fixed precision, is not available for strings.
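
A brief sketch of type-specific format specifications. The date example is an addition that does not appear in the article; it shows that dates accept strftime-style codes after the colon.

```python
from datetime import date

# 'f' with a precision works for floats...
pi_text = "{:.2f}".format(3.14159)  # '3.14'

# ...but the same spec is rejected for strings.
try:
    "{:.2f}".format("Reuven")
except ValueError as exc:
    message = str(exc)

# Dates have their own rules: strftime-style codes.
day = "{:%Y-%m-%d}".format(date(2014, 9, 15))  # '2014-09-15'

print(pi_text, day)
print(message)
```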

You can even add your own formatting rules for objects of your own classes (by defining the __format__ method), so that they get a special output format and modifiers to configure it.
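
A minimal sketch of custom formatting rules via the `__format__` method. The `Temperature` class and its 'c' and 'f' modifiers are hypothetical, invented purely for illustration.

```python
class Temperature:
    """Hypothetical class, used only to illustrate __format__."""

    def __init__(self, celsius):
        self.celsius = celsius

    def __format__(self, spec):
        # Made-up modifiers: '' or 'c' for Celsius, 'f' for Fahrenheit.
        if spec == 'f':
            return "{:.1f} F".format(self.celsius * 9 / 5 + 32)
        if spec in ('', 'c'):
            return "{:.1f} C".format(self.celsius)
        raise ValueError("unknown format code {!r} for Temperature".format(spec))


t = Temperature(21.5)
plain = "It is {}".format(t)         # 'It is 21.5 C'
fahrenheit = "It is {:f}".format(t)  # 'It is 70.7 F'
print(plain)
print(fahrenheit)
```

Whatever follows the colon inside the braces is passed to `__format__` as a plain string, so the class is free to define its own mini-language.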

If you want to explore this topic in more detail, start with PEP 3101, which describes str.format. I can also recommend Eric Smith's presentation, which gives a fairly good summary of the topic. The Python documentation also has some good examples of how to switch from % to str.format.

Hope you enjoyed it!

P.S.: The author of the original article measured the performance of str.format and % and came to the conclusion that % is faster.
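
If you want to repeat such a measurement yourself, a rough sketch with the standard timeit module might look like this. It is not the original author's benchmark, and the numbers depend on the Python version and the machine, so no expected output is given.

```python
import timeit

# Both statements build the same greeting from the same two variables.
setup = "first, last = 'Reuven', 'Lerner'"

percent_time = timeit.timeit(
    '"Good morning, %s %s" % (first, last)', setup=setup, number=100000)
format_time = timeit.timeit(
    '"Good morning, {} {}".format(first, last)', setup=setup, number=100000)

print("%% operator: %.4f s" % percent_time)
print("str.format: {:.4f} s".format(format_time))
```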

P.P.S.: According to Andrei Svetlov (svetlov), whose words can be trusted given his involvement in the Python development team, the % syntax will not be removed from Python 3.x for at least the next 20 years.

Source: https://habr.com/ru/post/236633/

