The myth of the ideal number of lines in a method

There is a myth that if a function has more than n or fewer than m lines of code, then there is something wrong with its design. For example, the author of the publication “Reflections on Design Principles” says that “the number of lines of a method ... is a necessary but not sufficient condition for a good design”. In this article I will explain why I consider the requirement that a function be of a certain size a myth and give examples that show this, but first let's consider why this myth is so popular.

The reasons for the popularity of the myth

Almost any real-life algorithm can easily be broken down into a small number of operations, usually from 3 to 10, each serving its own micro-goal. For example, to sit on a chair standing at a table, you need to 1) see the chair; 2) go to the chair; 3) move the chair; 4) sit on the chair. Such a description is easy to follow, and looking at each action you can understand what lies behind it and what steps you need to perform to implement it. This is an example of good design. If, instead of the “see the chair” step, there were several operations for straining the eye muscles, and instead of “go to the chair” there were a loop of gradual advancement with constant route correction, the algorithm would be hard to understand; even remembering all of its details would already be a problem. This is an example of poor design, with no proper control over complexity. In this case, it is better to combine the operations that serve the same purpose, reducing the distance between the chair and the person, into one function.
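To make the analogy concrete, here is a minimal Python sketch in which the top-level function reads like the four-step description. All function names and the strings they return are hypothetical placeholders, not code from the original article; the point is only that the low-level details (the "eye muscles", the advancement loop) sit behind clearly named functions.

```python
from typing import List


def see_chair() -> str:
    # Hides the low-level "eye muscle" details behind one clear name.
    return "see the chair"


def go_to_chair(distance: int) -> str:
    # Hides the loop of gradual advancement (a stand-in for route correction).
    steps = 0
    while distance > 0:
        distance -= 1
        steps += 1
    return f"go to the chair ({steps} steps)"


def move_chair() -> str:
    return "move the chair"


def sit_down() -> str:
    return "sit on the chair"


def sit_on_chair(distance: int) -> List[str]:
    # The top level reads like the plain-language description:
    # four calls, each serving its own micro-goal.
    return [see_chair(), go_to_chair(distance), move_chair(), sit_down()]
```

Calling `sit_on_chair(3)` yields the four steps in order; a reader never has to look inside `go_to_chair` to understand the overall algorithm.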

Take other algorithms, for example, frying eggs or going to the movies: again, you can identify up to 10 smaller operations serving DIFFERENT micro-goals. You have to agree that it is rather hard to come up with an example with far more than 10 operations, isn't it? If you still end up with a lot of operations, you can surely find a common goal for some of them, or perhaps you are too fixated on error handling, which is not actually the CORE part of the algorithm.

Myth refutation

It is rather difficult, but possible, to come up with a real-life algorithm that has a large number of operations at the top level of abstraction which cannot be combined into larger ones (so the algorithm's main function will swell to many lines of code). For example, take the ritual dance of some aboriginal tribe, which consists of the following actions: 1) crouch; 2) cluck for a while; 3) growl; 4) wave your hands; 5) stand up; 6) jump; ... and another 100 or 500 chaotic, unsystematic actions.

What is the top level of abstraction

The first level of abstraction is the global purpose of the code, which, with good design, coincides with the name of the algorithm's main function; in this case, let it be “Summon rain”. The first level of abstraction is degenerate: it always contains exactly one operation, so we exclude it from consideration, and by the top level we mean the second. At the top (second) level of abstraction there are operations such as “Crouch” and “Cluck”. The goal of “Crouch” is to change the position of the body, while the goal of “Cluck” is to make sounds that imitate a chicken. These goals are different, and if you look for a goal they share, then in this context there is only one, “Summon rain”, so you cannot extract them into a separate function that would be called inside “Summon rain”. “Crouch” and “Cluck” must be called directly in “Summon rain”. That is why they are at the top level of abstraction.
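A minimal Python sketch of this situation, assuming each ritual action is a function returning a label (all names are hypothetical illustrations). Because the actions share no goal other than summoning rain, the top-level function simply calls them one after another and grows by one line per action.

```python
from typing import List


def crouch() -> str:
    # Goal: change the position of the body.
    return "crouch"


def cluck() -> str:
    # Goal: make sounds that imitate a chicken.
    return "cluck"


def growl() -> str:
    return "growl"


def wave_hands() -> str:
    return "wave hands"


def stand_up() -> str:
    return "stand up"


def jump() -> str:
    return "jump"


def summon_rain() -> List[str]:
    # These actions share no goal except "summon rain" itself, so they are
    # called directly here rather than grouped into "part one", "part two".
    performed: List[str] = []
    performed.append(crouch())
    performed.append(cluck())
    performed.append(growl())
    performed.append(wave_hands())
    performed.append(stand_up())
    performed.append(jump())
    # ...another 100 or 500 unrelated actions would follow, one line each.
    return performed
```

Splitting `summon_rain` just to satisfy a line limit would produce helpers like `perform_first_part`, whose names say nothing about what they hide.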

Still want your function to be shorter than n lines of code? Then you will have to split the method into 1) perform the first part of the ritual dance; 2) perform the second part; and so on. This is an example of poor design, because it is not clear what each of these functions hides. Here is a similar situation from programming. Suppose there is a module that downloads pages from a certain website and parses them to obtain information about a procurement, which has protocols, lots, and much more. The module's main method looks like this:

1) Find out information about the supplier;
2) Find out information about the procurement lots;
3) Find out information about the procurement auction;
4) Find out information about the procurement protocols;
5) Find out the latest date of the procurement status change.
... and similar methods. Moreover, each method downloads different pages and applies different parsing algorithms. Each operation deserves its own method, and there is no way to combine some of them into a separate function with a clear name (an option like “Find out information about the supplier and lots and auction” does not count). The domain model and the number of pages on the source site can grow indefinitely (as can the number of front-end developers adding more and more page-specific details to the site). Another example that refutes the myth is the family of cryptographic algorithms: no matter how large a maximum method size n you name, one can always come up with a longer cryptographic algorithm.
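A minimal Python sketch of such a module's main method, under loudly stated assumptions: the pages are already downloaded and passed in as a dictionary of strings (`SAMPLE_PAGES`), and every parsing rule, field name, and function name is hypothetical. A real module would fetch each page from a different URL and parse real HTML.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical pre-downloaded pages; a real module would fetch each one
# from a different page on the source site.
SAMPLE_PAGES: Dict[str, str] = {
    "supplier": "  ACME Ltd  ",
    "lots": "Lot 1|Lot 2",
    "auction": "2014-10-01",
    "protocols": "Protocol A\nProtocol B",
    "status_log": "2014-09-01,2014-09-15,2014-09-03",
}


@dataclass
class Procurement:
    supplier: str
    lots: List[str]
    auction_date: str
    protocols: List[str]
    last_status_change: str


def find_supplier_info(pages: Dict[str, str]) -> str:
    return pages["supplier"].strip()


def find_lots_info(pages: Dict[str, str]) -> List[str]:
    return pages["lots"].split("|")


def find_auction_info(pages: Dict[str, str]) -> str:
    return pages["auction"]


def find_protocols_info(pages: Dict[str, str]) -> List[str]:
    return pages["protocols"].splitlines()


def find_last_status_change_date(pages: Dict[str, str]) -> str:
    # ISO-8601 dates compare correctly as plain strings.
    return max(pages["status_log"].split(","))


def collect_procurement(pages: Dict[str, str]) -> Procurement:
    # The main method: five operations with different goals and different
    # parsing rules, none of which share a meaningful common name.
    return Procurement(
        supplier=find_supplier_info(pages),
        lots=find_lots_info(pages),
        auction_date=find_auction_info(pages),
        protocols=find_protocols_info(pages),
        last_status_change=find_last_status_change_date(pages),
    )
```

Every new kind of page adds one more call to `collect_procurement`, and no line limit can stop that growth without inventing meaningless groupings.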

The reverse side of the myth

There is another interpretation of this myth: if a method has fewer than m lines of code (for example, 50), then something is wrong with it. How could such a view arise? Imagine code with no coherent architecture, in which the names of classes and methods either do not reflect the purpose of the entities or are outright misleading. Perhaps the code was good at first, but then someone changed the “Get information about the latest date of the procurement status change” function so that it now also saves the information to the database and emails notifications to users, yet the function's name stayed the same. Or someone changed the date search algorithm itself, but made the changes somewhere else rather than in this function; then the function should have been renamed “Find out information on the last date of change of PART of the procurement status” or “Get information about the event log” (it now performs only PART of the search for the change date, and the method should be named accordingly), but, alas, it was not renamed. As a result, the method names in this code cannot be trusted, and to find out WHAT REALLY HAPPENS HERE you have to step into each of them. And if the code is split into a large number of methods and the nesting is deep, you have to go deeper and deeper... It is easy to get lost in such code, as in a maze. But if all the class code were in one giant function, then at least everything would be in plain sight, and misleading function names could not confuse anyone.
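A small hypothetical Python illustration of the name drift described above: `database` and `outbox` are stand-in lists for a real database and mail server, and all function names are invented for this sketch. The first function's name promises a lookup but hides side effects; the alternative gives each responsibility an honest name.

```python
from typing import List

database: List[str] = []  # hypothetical stand-in for the database
outbox: List[str] = []    # hypothetical stand-in for the mail server


def get_last_status_change_date(status_log: List[str]) -> str:
    # The name promises a read-only lookup...
    date = max(status_log)  # ISO-8601 date strings sort chronologically
    # ...but later edits quietly added side effects the name hides.
    database.append(date)
    outbox.append(f"Procurement status changed on {date}")
    return date


# An honest alternative: each name states exactly what the function does.
def find_last_status_change_date(status_log: List[str]) -> str:
    return max(status_log)


def save_and_notify_last_status_change(status_log: List[str]) -> str:
    date = find_last_status_change_date(status_log)
    database.append(date)
    outbox.append(f"Procurement status changed on {date}")
    return date
```

With the first version, a reader who trusts the name will be surprised by the database writes and emails; with the second, the name alone tells you which call has side effects.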

Now imagine a fictional programmer named Marcus. Marcus has never been very diligent about studying design, and every day he works with the awkward code described above. Gradually, Marcus begins to notice that “big code is easier to understand”, and he starts to associate “finely chopped” code with a headache. Then someone briefly tells him about the principle “do not multiply entities beyond necessity”. Marcus cannot explain which entities are superfluous and which are not, but he adopts the principle anyway. Then Marcus learns about the KISS principle from somewhere and decides that since “the fewer the entities, the easier the code is to understand”, it follows that “the fewer the entities, the more the code conforms to KISS”.

Here is an example of an article whose character is also named Marcus; he wrote a class that can bake any kind of bakery product according to any recipe in any oven with any fuel source, and this class has only one method. As far as I understand, he has only two classes in the whole program: Bread (which may actually be a pie) and Manager (which can say “I can do everything!” at a meeting without lying). Our Marcus (the one from this article) agrees, believes that this is BEST PRACTICE that follows the KISS principle, and thinks that if you do not produce God objects with 1000 lines of code each, you have something to learn from him.
Personally, I think there is no rule that a method must be longer than m lines, and that in 99.9% of cases it is quite possible to write neat little functions whose appearance tells you what happens inside them, what their contract is, and what purpose they serve. Then finding the functionality you need will not take long, and you will not have to read through all the code.

So what is needed?

We already know what not to do: blindly trust the number of lines in a method. A natural question arises: what should we do instead? How do I know whether I need to add something to a function or remove something from it? We can learn the principle of “Low coupling & High cohesion” and two code smells: “Shotgun surgery” and “Divergent change”. If, when changing some kind of functionality, you have to modify a piece of one entity and a piece of another, then Shotgun surgery and low cohesion have crept into the code, and it would be good to merge these two entities into one. If, when changing some kind of functionality, we always change one part of an entity while another part of it never changes, then the code smells of Divergent change, and the entity can be split in two. To illustrate, let's use a slightly modified example from the beginning of the article: if, whenever the way a robot approaching a chair moves is changed, you keep having to change the part of the algorithm responsible for route selection and movement (depending on whether the robot moves on the ground or underground), then you should extract a separate function, “approach the chair”. As a result, where there was one entity, there are now two. It is also necessary to give entities clear names, so that the name alone tells you what the entity does (and what it does not do).
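A minimal Python sketch of the refactoring described above, assuming two movement modes ("ground" and "underground") and hypothetical helper names. Everything that changes together when the movement mode changes is extracted into `approach_chair`, so the top-level `sit_on_chair` no longer has to be edited for that kind of change.

```python
from typing import Callable, Dict, List


def move_on_ground(distance: int) -> List[str]:
    # Hypothetical route selection and movement on the surface.
    return ["roll"] * distance


def move_underground(distance: int) -> List[str]:
    # A different, equally hypothetical route selection and movement.
    return ["dig"] * distance


MOVERS: Dict[str, Callable[[int], List[str]]] = {
    "ground": move_on_ground,
    "underground": move_underground,
}


def approach_chair(mode: str, distance: int) -> List[str]:
    # The part that changes whenever the movement mode changes lives here.
    return MOVERS[mode](distance)


def sit_on_chair(mode: str, distance: int) -> List[str]:
    # The part that never changes with the movement mode.
    return ["see the chair", *approach_chair(mode, distance), "move the chair", "sit on the chair"]
```

Adding a new movement mode now means adding one entry to `MOVERS` rather than editing the top-level algorithm, which is exactly the split that Divergent change suggests.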

PS All of the above is just my personal opinion. All characters are fictional.

This is where the article ends; what follows is another example of a long algorithm that is hard to decompose, added at readers' request. To encrypt a piece of text with one well-known cryptographic algorithm, you need to:
1) Split the source text into 64-bit blocks
2) For each such block
{
2.1) Rearrange the bits of the block according to a certain algorithm
2.2) Split the block into two 32-bit blocks (hereinafter, left and right)
2.3) Repeat 16 times:
{
2.3.1) Compute Ki from the encryption key K and the iteration number according to a certain algorithm
2.3.2) Expand the 32-bit right block into a 48-bit block E according to a certain algorithm
2.3.3) F = bitwise sum (XOR) of Ki and E, reduced back to 32 bits according to a certain algorithm
2.3.4) New 32-bit left block = 32-bit right block from the previous iteration
2.3.5) New 32-bit right block = 32-bit left block from the previous iteration, XORed with F
}
2.4) Join the right and left blocks obtained after the last iteration (in that order) and rearrange the bits according to a certain algorithm (the inverse of step 2.1)
2.5) Append the resulting 64-bit block to the end of the encryption result
}
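Steps 2.2 to 2.4 form what is known as a Feistel network. The Python sketch below shows only that structure. The key schedule (`toy_subkeys`) and the round function (`toy_round_function`) are hypothetical simplified stand-ins, NOT the real DES tables; the permutations of steps 2.1 and 2.4 are omitted, and the sketch is neither secure nor DES-compatible. It does show why the rounds resist decomposition: the two halves feed each other on every iteration, and decryption is the same loop with the subkeys reversed.

```python
from typing import List

MASK32 = 0xFFFFFFFF


def toy_subkeys(key: int, rounds: int = 16) -> List[int]:
    # Hypothetical stand-in for the key schedule (step 2.3.1); NOT DES.
    return [((key >> (i % 32)) ^ (key * (i + 1))) & MASK32 for i in range(rounds)]


def toy_round_function(right: int, k: int) -> int:
    # Hypothetical stand-in for steps 2.3.2-2.3.3; NOT the DES Feistel
    # function. It stays at 32 bits instead of expanding to 48.
    x = right ^ k
    rotated = ((x << 3) | (x >> 29)) & MASK32
    return rotated ^ 0x9E3779B9


def feistel(block: int, subkeys: List[int]) -> int:
    left, right = (block >> 32) & MASK32, block & MASK32  # step 2.2
    for k in subkeys:                                      # step 2.3
        # Steps 2.3.4-2.3.5, performed simultaneously.
        left, right = right, left ^ toy_round_function(right, k)
    # Join the halves in swapped order after the last round (step 2.4).
    return (right << 32) | left


def encrypt_block(block: int, key: int) -> int:
    return feistel(block, toy_subkeys(key))


def decrypt_block(block: int, key: int) -> int:
    # Decryption is the same network with the subkeys in reverse order.
    return feistel(block, toy_subkeys(key)[::-1])
```

The round function does not need to be invertible: the swap-and-XOR structure alone guarantees that `decrypt_block(encrypt_block(b, k), k) == b`.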
I will add that the description of the Ki computation on Wikipedia did not fit on my monitor, so I doubt it will be a single line of code in the program. If you create a “compute Ki” function, that will be bad design, since it is not clear what the function does or what Ki is. However, if Ki had some well-known name and were a well-known abstraction, a “compute Ki” function would have a right to exist. To make decomposition possible after all, the designers of the algorithm created such abstractions themselves, and not very good ones: their names are along the lines of “The first piece of the algorithm” and “A piece named after so-and-so”. You have to admit, that is terrible design. However, this design is generally accepted and is part of the domain, so if you use it as is, everything is fine.

Now let's imagine that we need to slightly modify the algorithm to create our own unique algorithm with properties similar to the original (this can be useful, because it is harder for an attacker to break an algorithm “unknown to science”). In the new algorithm, the “piece named after so-and-so” is modified and is no longer the piece named after so-and-so. In this case, it is better not to break the algorithm above into small functions but to leave it as it is; otherwise, one day (say, a week after writing the code) you will get lost in it as in a maze. The algorithm here is DES. “The first piece” is the initial permutation, “a piece named after so-and-so” is the Feistel function, and E is the expansion function. In the modified algorithm, all of these are different. It is possible to partially decompose the modified DES, for example, by extracting methods such as “XOR blocks” and “Permute” (taking a block and a permutation table), but the “Encrypt 64-bit block” method will still be indecently large.
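The two helpers mentioned above are easy to extract because each has a clear, domain-independent purpose. A minimal Python sketch, assuming DES-style permutation tables that list 1-based bit positions counted from the most significant bit; the function names are hypothetical.

```python
from typing import Sequence


def xor_blocks(a: int, b: int) -> int:
    # "Bitwise sum" of two blocks of equal width.
    return a ^ b


def permute(block: int, table: Sequence[int], in_bits: int) -> int:
    # DES-style permutation: table[j] is the 1-based position (counted from
    # the most significant bit of an in_bits-wide input) of the bit that
    # becomes output bit j. A table longer than in_bits expands the block,
    # which is how an expansion like E can be expressed.
    out = 0
    for pos in table:
        bit = (block >> (in_bits - pos)) & 1
        out = (out << 1) | bit
    return out
```

For example, `permute(0b1000, [4, 3, 2, 1], 4)` reverses the four bits and returns `0b0001`, and `permute(0b10, [1, 2, 1, 2], 2)` duplicates them into `0b1010`. These helpers stay small and well named, while the round loop that calls them remains long.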
Extracting an “Encrypt 32-bit block” method is a bad idea, because such a block cannot be decrypted on its own: encryption and decryption apply to 64-bit blocks. I am sure that DES is not the longest algorithm; one can find (or invent) longer ones with a large number of operations at the top level of abstraction whose purposes differ from one another.

Source: https://habr.com/ru/post/239799/
