
How to make Python functions even better

As it happens, the title of this excellent article by Jeff Knupp, author of the book "Writing Idiomatic Python", fully reflects its content. Read carefully and feel free to comment.

Since we did not want to leave an important term untranslated, the Russian edition of this article rendered the word “docstring” with a Russian equivalent found in several Russian-language sources. This English text uses “docstring” throughout.

In Python, as in most modern programming languages, the function is the primary means of abstraction and encapsulation. As a developer, you have probably written hundreds of functions. But not all functions are created equal. Moreover, writing “bad” functions directly hurts the readability and maintainability of your code. So what is a “bad” function and, more importantly, how do you turn it into a “good” one?

A quick refresher


Mathematics is full of functions, although it can be hard to remember them. So let's go back to everyone's favorite discipline: calculus. You have probably seen formulas like f(x) = 2x + 3. This is a function called f that takes an argument x and "returns" two times x plus 3. Although it does not look much like the functions we are used to in Python, it is directly analogous to the following code:

    def f(x):
        return 2*x + 3

Functions have existed in mathematics for a long time, but in computer science they are far more powerful. That power comes at a price, however: there are various pitfalls to avoid. Let's discuss what a “good” function should look like, and which warning signs are typical of functions that may need refactoring.

Secrets of a good function


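What distinguishes a “good” Python function from a mediocre one? You would be surprised how many ways the word "good" can be interpreted. In this article, I will consider a Python function "good" if it satisfies most of the items in the following list (for a particular function it is sometimes impossible to satisfy all of them):

- It is sensibly named
- It has a single responsibility
- It includes a docstring
- It returns a value
- It is no longer than 50 lines
- It is idempotent and, if possible, pure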

To many of you, these requirements may seem overly strict. However, I promise: if your functions follow these rules, the result will be so beautiful that it would bring a unicorn to tears. Below I devote a section to each item in the list above, and then wrap up by explaining how they fit together and help you write good functions.

Naming

Here is my favorite quote on this topic, often mistakenly attributed to Donald Knuth but actually due to Phil Karlton:
There are only two hard things in computer science: cache invalidation and naming things.
As silly as it sounds, naming really is hard. Here is an example of a “bad” function name:

 def get_knn_from_df(df): 

Bad names turn up almost everywhere, but this example comes from data science (more precisely, machine learning), where practitioners usually write code in a Jupyter notebook and then try to assemble a presentable program from those cells.

The first problem with the name of this function is that it uses abbreviations. It is better to use full English words than abbreviations and little-known acronyms. The only reason to shorten words is to save typing, but every modern editor has auto-completion, so you only have to type the full name of the function once. Abbreviations are a problem because they are often specific to a subject area. In the code above, knn means “K-nearest neighbors” and df means “DataFrame”, a data structure commonly used in the pandas library. A programmer who reads the code without knowing these abbreviations will understand almost nothing from the function name.

There are two smaller faults in the name of this function. First, the word "get" is redundant. With most well-named functions it is immediately clear that the function returns something, and the name says what. The from_df part is not needed either. The type of the parameter will be described either in the function's docstring or (if you are living on the edge) in a type annotation, unless it is already obvious from the name of the parameter.

So how do we rename this function? Simply:

 def k_nearest_neighbors(dataframe): 

Now, even to a non-expert, it is clear what is calculated in this function, and the name of the parameter (dataframe) leaves no doubt what argument should be passed to it.
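To illustrate how a full-word name, a type annotation and a docstring work together, here is a minimal sketch. It is not the author's function: to stay self-contained it takes plain tuples instead of a pandas DataFrame, and the parameters points, target and k are assumptions made for this illustration.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical type alias used only in this sketch.
Point = Tuple[float, float]


def k_nearest_neighbors(points: Sequence[Point], target: Point, k: int) -> List[Point]:
    """Return the k points closest to target by Euclidean distance."""
    def distance_to_target(point: Point) -> float:
        return math.hypot(point[0] - target[0], point[1] - target[1])

    # sorted() returns a new list, so the caller's sequence is not modified.
    return sorted(points, key=distance_to_target)[:k]
```

The name says what is computed, and the annotations say what goes in and what comes out, so nothing about types has to be packed into the name itself.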

Single responsibility


Building on an idea from Bob Martin, I will say that the single responsibility principle applies to functions no less than to classes and modules (which is what Mr. Martin originally wrote about). According to this principle (in our case), a function should have a single responsibility. That is, it should do one thing and only one thing. One of the most compelling arguments in favor of this: if a function does only one thing, it has to be rewritten in only one case, namely when that thing has to be done in a new way. It also becomes clear when the function can be removed: if, while making changes somewhere else, we realize that the function's single duty is no longer needed, we simply get rid of it.

Here it is better to give an example. Here is a function that does more than one “thing”:

    import statistics

    def calculate_and_print_stats(list_of_numbers):
        # Named "total" so that the built-in sum() is not shadowed.
        total = sum(list_of_numbers)
        mean = statistics.mean(list_of_numbers)
        median = statistics.median(list_of_numbers)
        mode = statistics.mode(list_of_numbers)

        print('-----------------Stats-----------------')
        print('SUM: {}'.format(total))
        print('MEAN: {}'.format(mean))
        print('MEDIAN: {}'.format(median))
        print('MODE: {}'.format(mode))

Two things, to be exact: it calculates a set of statistics about a list of numbers and it prints them to STDOUT. The function violates the rule that there should be only one specific reason to change it. Here there are two obvious reasons: either you need to calculate new or different statistics, or you need to change the output format. It is therefore better to rewrite this function as two separate functions: one performs the calculations and returns the results, and the other accepts those results and prints them to the console. The word "and" in the function's name is a dead giveaway that it has two duties.
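One possible way to split the function is sketched below. The names calculate_stats and print_stats, and the choice of a dict as the return type, are assumptions made for this illustration rather than the author's own code.

```python
import statistics


def calculate_stats(list_of_numbers):
    """Return the sum, mean, median and mode of the numbers as a dict."""
    return {
        'sum': sum(list_of_numbers),
        'mean': statistics.mean(list_of_numbers),
        'median': statistics.median(list_of_numbers),
        'mode': statistics.mode(list_of_numbers),
    }


def print_stats(stats):
    """Print the statistics to STDOUT and return the printed report."""
    lines = ['-----------------Stats-----------------']
    for name, value in stats.items():
        lines.append('{}: {}'.format(name.upper(), value))
    report = '\n'.join(lines)
    print(report)
    return report


print_stats(calculate_stats([1, 2, 2, 3]))
```

With this split, a new statistic only touches calculate_stats, and a new output format only touches print_stats.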

Such a split not only makes the function much easier to test, it also lets you place the two functions in completely different modules, if appropriate, rather than merely side by side in the same module. This leads to even cleaner testing and simpler maintenance.

In practice, functions that do exactly two things are rare. Far more often you come across functions that do many, many more. Again, for the sake of readability and testability, such “jack-of-all-trades” functions should be split into single-purpose functions, each of which handles one aspect of the work.

Docstrings


Everyone seems to be aware of PEP-8, which gives recommendations on the style of Python code, but far fewer of us know PEP-257, which does the same for docstrings. Rather than retell the contents of PEP-257, I refer you to the document itself: read it in your free time. Its main ideas, however, are:


All of these are easy to follow when writing functions. Writing docstrings should simply become a habit, and you should try to write them before you start on the code of the function itself. If you cannot write a clear docstring describing what the function does, that is a good reason to ask yourself why you are writing the function at all.
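As a small illustration, here is a docstring written along the lines PEP-257 recommends: a one-line summary that ends with a period and is phrased as a command ("Return ..."), followed by a blank line and further detail. The function mean_of is a hypothetical example, not taken from the article.

```python
def mean_of(numbers):
    """Return the arithmetic mean of a non-empty sequence of numbers.

    Raise ValueError if numbers is empty.
    """
    if not numbers:
        raise ValueError('numbers must not be empty')
    return sum(numbers) / len(numbers)
```

Writing the summary line first forces you to state, in one sentence, what the function is for.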

Return values


Functions can (and should) be thought of as small self-contained programs. They take some input in the form of parameters and return a result. Parameters are, of course, optional. Return values, however, are mandatory as far as Python internals are concerned. Even if you try to write a function that does not return a value, you cannot: if a function does not explicitly return a value, the Python interpreter will “force” it to return None. Don't believe it? Try it yourself:

    ❯ python3
    Python 3.7.0 (default, Jul 23 2018, 20:22:55)
    [Clang 9.1.0 (clang-902.0.39.2)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> def add(a, b):
    ...     print(a + b)
    ...
    >>> b = add(1, 2)
    3
    >>> b
    >>> b is None
    True

As you can see, the value of b is in fact None. So even if you write a function without a return statement, it will still return something. And it should. After all, it is a small program, right? How useful is a program that produces no output, so that you cannot even judge whether it ran correctly? And most importantly, how are you going to test such a program?

I am not even afraid to say the following: every function should return a useful value, if only for the sake of testability. The code I write must be tested (this is not up for discussion). Just imagine how clumsy testing the add function above would be (hint: you would have to redirect I/O, and things go downhill quickly from there). In addition, returning a value allows method chaining, so we can write code like this:

    with open('foo.txt', 'r') as input_file:
        for line in input_file:
            if line.strip().lower().endswith('cat'):
                # ... do something useful with this line
                pass

The line if line.strip().lower().endswith('cat'): works because each of the string methods returns a value: strip() and lower() each return a new string, on which the next method can be called, and endswith() returns a boolean that the if statement can test.
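Coming back to the testing hint above, the difference can be sketched as follows. The function print_sum mirrors the add function from the interpreter session (renamed here so both versions can coexist), and the test function names are assumptions made for this illustration.

```python
import contextlib
import io


def print_sum(a, b):
    """Print a + b; implicitly returns None, like add() in the session above."""
    print(a + b)


def add(a, b):
    """Return a + b."""
    return a + b


def test_print_sum():
    # The only observable result is text on STDOUT, so it has to be captured.
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        print_sum(1, 2)
    assert buffer.getvalue() == '3\n'


def test_add():
    # A returned value can simply be compared.
    assert add(1, 2) == 3


test_print_sum()
test_add()
```

The first test needs two imports and a context manager just to see the result; the second is a single comparison.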

Here are some common arguments that a programmer can give you, explaining why a function written by him does not return a value:
“It just does [some I/O operation, for example, saving a value to a database]. There is nothing useful I can return here.”
I disagree. The function can return True if the operation completed successfully.
“Here we modify one of the parameters passed in, using it as a reference parameter.”
Two comments here. First, try as hard as you can not to do that. Second, passing an argument to a function only to discover afterwards that it has been changed is surprising at best and downright dangerous at worst. Instead, as with the string methods, try to return a new instance of the parameter that already reflects the changes applied to it. Even when that is not possible because copying the parameter is too expensive, you can still fall back on the “return True if the operation completed successfully” option suggested above.
“I need to return multiple values. There is no single value that would make sense to return here.”
This argument is a bit contrived, but I have heard it. The answer, of course, is to do exactly what the author wanted to do but did not know how: use a tuple to return several values.
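The three answers above might look like this in code. All names here (save_record, with_item_appended, min_and_max) are hypothetical, and a plain dict stands in for the database.

```python
def save_record(storage, key, value):
    """Store value under key and return True if it was stored.

    storage is a dict standing in for a real database (an assumption
    made to keep this sketch self-contained).
    """
    storage[key] = value
    return storage.get(key) == value


def with_item_appended(items, item):
    """Return a new list with item added; the original list is not changed."""
    return items + [item]


def min_and_max(numbers):
    """Return the smallest and largest number as a (minimum, maximum) tuple."""
    return min(numbers), max(numbers)


database = {}
saved = save_record(database, 'answer', 42)

original = [1, 2]
extended = with_item_appended(original, 3)

# Tuple unpacking gives each returned value its own name.
lowest, highest = min_and_max([3, 1, 2])
```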

Finally, the strongest argument for always returning a useful value is that the caller is always free to ignore it. In short, returning a value from a function is almost certainly a good idea, and it is highly unlikely to break anything, even in established code bases.

Function length


I have admitted more than once that I am not very smart. I can hold about three things in my head at a time. If you make me read a 200-line function and ask what it does, I will probably just stare at it for at least 10 seconds. The length of a function directly affects its readability and, therefore, its maintainability. So try to keep your functions short. 50 lines is a number pulled completely out of thin air, but it seems reasonable to me. I hope that most of the functions you write will be much shorter.

If a function follows the single responsibility principle, it is likely to be fairly short. If it is pure or idempotent (discussed below), it will probably be short as well. All these ideas fit together nicely and help you write good, clean code.

So what should you do if your function is too long? Refactor! You probably refactor all the time, even if you do not know the term. Refactoring is simply changing the structure of a program without changing its behavior. Extracting several lines of code from a long function and turning them into a function of their own is therefore a kind of refactoring. It also happens to be the most common and fastest way to productively shorten long functions. Since you give these new functions proper names, the resulting code is much easier to read. Whole books have been written about refactoring, so I will not go into details here. Just know that if a function is too long, it should be refactored.
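A minimal sketch of the "extract function" refactoring described above. The order-report example and all of its names are invented for this illustration; the point is only that report_long and report behave identically while the second reads as a sequence of named steps.

```python
def report_long(orders):
    """Return a summary line; validation, totalling and formatting are inlined."""
    valid = []
    for order in orders:
        if order['quantity'] > 0 and order['price'] >= 0:
            valid.append(order)
    total = 0
    for order in valid:
        total += order['quantity'] * order['price']
    return 'Orders: {}, total: {:.2f}'.format(len(valid), total)


def valid_orders(orders):
    """Return only the orders with a positive quantity and a non-negative price."""
    return [o for o in orders if o['quantity'] > 0 and o['price'] >= 0]


def order_total(orders):
    """Return the combined value of the given orders."""
    return sum(o['quantity'] * o['price'] for o in orders)


def report(orders):
    """Return a summary line built from the extracted helper functions."""
    valid = valid_orders(orders)
    return 'Orders: {}, total: {:.2f}'.format(len(valid), order_total(valid))


sample = [{'quantity': 2, 'price': 1.5}, {'quantity': 0, 'price': 9.0}]
# Refactoring must not change behavior, so both versions agree.
assert report_long(sample) == report(sample)
```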

Idempotency and functional purity


The title of this section may seem slightly intimidating, but the concept is simple. An idempotent function always returns the same value for the same set of arguments, no matter how many times it is called. The result does not depend on non-local variables, on the mutability of the arguments, or on any data coming from I/O streams. The following function, add_three(number), is idempotent:

    def add_three(number):
        """Return *number* + 3."""
        return number + 3

No matter how many times we call add_three(7) , the answer will always be 10. And here's another case — a function that is not idempotent:

    def add_three():
        """Return 3 + the number entered by the user."""
        number = int(input('Enter a number: '))
        return number + 3

This admittedly contrived function is not idempotent, because its return value depends on I/O, namely on the number entered by the user. Different calls to add_three() will, of course, return different values. If we call the function twice, the user might enter 3 the first time and 7 the second, in which case the two add_three() calls return 6 and 10 respectively.

There are examples of idempotency outside programming too. The “up” button of an elevator, for instance, works on this principle. Pressing it for the first time “notifies” the elevator that we want to go up. Because the button is idempotent, nothing bad happens no matter how many more times we press it afterwards. The result is always the same.

Why idempotency is so important


Testability and maintainability. Idempotent functions are easy to test because they are guaranteed to return the same result whenever they are called with the same arguments. Testing boils down to checking that the function returns the expected value for a variety of calls. Moreover, these tests will be fast: test speed is an important issue that is often overlooked in unit testing. And refactoring around idempotent functions is a walk in the park. No matter how you change the code outside the function, the result of calling it with the same arguments will always be the same.

What is a “pure” function?
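To make the contrast concrete, here is a small sketch of testing both variants from the previous section. The input-dependent version is renamed add_three_from_input (an assumption, so that both can live in one file), and the standard library's unittest.mock.patch is used to replace input().

```python
from unittest import mock


def add_three(number):
    """Return number + 3."""
    return number + 3


def add_three_from_input():
    """Return 3 + the number entered by the user."""
    number = int(input('Enter a number: '))
    return number + 3


# Idempotent version: call it, compare, done. Repeated calls agree.
assert add_three(7) == 10
assert add_three(7) == add_three(7)

# Input-dependent version: input() has to be replaced before it can be tested.
with mock.patch('builtins.input', return_value='7'):
    assert add_three_from_input() == 10
```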


In functional programming, a function is considered pure if it is, first, idempotent and, second, causes no observable side effects. Remember: a function is idempotent if it always returns the same result for a given set of arguments. However, this does not mean that the function cannot affect other things, such as non-local variables or I/O streams. For example, if the idempotent version of add_three(number) above printed the result to the console before returning it, it would still be considered idempotent, because its access to the I/O stream does not affect the value returned from the function. The print() call is simply a side effect: an interaction with the rest of the program or the system itself, in addition to returning a value.

Let's extend our add_three(number) example a little. You can write the following code to determine how many times add_three(number) has been called:

    add_three_calls = 0

    def add_three(number):
        """Return *number* + 3."""
        global add_three_calls
        print(f'Returning {number + 3}')
        add_three_calls += 1
        return number + 3

    def num_calls():
        """Return the number of times *add_three* was called."""
        return add_three_calls

Now we print to the console (a side effect) and modify a non-local variable (another side effect), but since neither of these affects the value returned by the function, it is still idempotent.

A pure function has no side effects. Not only does it use no "external data" when computing its value, it also does not interact with the rest of the program or system at all: it only computes and returns the value in question. Consequently, although our new definition of add_three(number) is still idempotent, the function is no longer pure.

Pure functions contain no logging statements or print() calls. They do not access a database or use network connections. They do not read or modify non-local variables. And they do not call other impure functions.
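One common way to apply this, sketched here under the assumption that the printing really is needed somewhere, is to keep the computation pure and move the side effects into a thin caller. The function main is a hypothetical name for that caller.

```python
def add_three(number):
    """Return number + 3. Pure: no I/O and no non-local state."""
    return number + 3


def main():
    """Impure shell: all side effects (here, printing) live in this function."""
    result = add_three(7)
    print('Returning {}'.format(result))
    return result


main()
```

The pure part can be tested with a plain comparison, and only the thin shell needs any special handling.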

In short, they are incapable of “spooky action at a distance”, to borrow Einstein's words (in the context of computer science rather than physics). They do not alter the rest of the program or system in any way. In imperative programming (which is what you are doing when you write Python code), such functions are the safest of all. They are known for their testability and maintainability. Moreover, since they are idempotent, testing them is guaranteed to be about as fast as executing them. The tests themselves are simple too: there is no need to connect to a database or mock external resources, no initial setup to prepare, and nothing to clean up afterwards.

To be honest, idempotency and purity are highly desirable, but not mandatory.

Conclusion


That's all.

Source: https://habr.com/ru/post/426381/

