Translation of an article by Michael O. Church: "What is spaghetti code?"

The easiest way for an epithet to lose its meaning is for it to become too broad, to come to mean little more than "I don't like it". That is what has happened to the term "spaghetti code", which people often use as a synonym for "bad code". The problem is that not all bad code is spaghetti code. Spaghetti code is an especially dangerous and specific kind of bad code, and its particular evil lies in the very way we develop software. Why? Because individual people rarely write spaghetti code on their own; rather, certain styles of development make it more and more common over time. To understand this, we have to consider the original context in which the notion of "spaghetti code" was defined: the terrible (and mostly archaic) goto statement.
The goto statement is a simple and powerful mechanism for controlling the flow of program execution: jump to another point in the code. This is what a compiled program actually does at the machine level to transfer control, even when the source is written with more modern constructs such as loops and functions. With goto, anyone can implement any control flow they need. It is also hard to disagree that, today, goto is inappropriate in the source code of modern programs. Exceptions to this rule exist, but they are rare and small. Many modern languages do not even have the operator.
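To make the contrast concrete, here is a minimal sketch of my own (not from the original article), in C++, which still has goto: the same ten-step loop written with goto and then with a structured for loop. Even at this tiny scale, the goto version forces the reader to trace labels to reconstruct the flow.

```cpp
#include <iostream>

// Hypothetical illustration: summing 1..10, first with goto, then structured.
int sum_with_goto() {
    int i = 1, total = 0;
loop:                         // a label: control can arrive here from anywhere
    total += i;
    ++i;
    if (i <= 10) goto loop;   // the jump is explicit; the loop's shape is not
    return total;
}

int sum_structured() {
    int total = 0;
    for (int i = 1; i <= 10; ++i)  // the loop's extent is visible at a glance
        total += i;
    return total;
}

int main() {
    std::cout << sum_with_goto() << " " << sum_structured() << "\n";  // 55 55
}
```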
The goto statement makes code difficult to reason about, because if control can jump from one place to another, it is impossible to say with certainty what state the program is in when it executes a particular piece of code. Goto-based programs cannot easily be broken down into components, because any point in the code can be a wormhole to any other point. As a result, such code becomes “everything is everywhere”: to understand even one part of it, you must already understand the whole tangle, and for large programs that eventually becomes impossible. The comparison with a bowl of spaghetti is apt: extracting even one noodle means navigating the whole clump. You cannot just look at the plate and see which noodle is entangled with which; instead you have to carefully unravel the entire tangle.
Spaghetti code is code where “everything is everywhere”, and where questions such as (a) where a given piece of functionality is implemented, (b) where and how an object is created, and (c) what the critical section for a fix is, all demand an understanding of the entire program. Even a couple of simple questions force you into the code at large; constant re-diagnosis of the source is needed just to answer the most basic things. This is code that remains a mystery to anyone without the discipline to follow each strand of pasta from beginning to end. That is spaghetti code.
What makes spaghetti code dangerous is that, unlike other kinds of bad code, it has become a common by-product of software entropy. If code has a sound modular structure but some modules are of poor quality, people will fix the bad parts when it matters to them. Bad, buggy, or slow code can be fixed without changing interfaces; frankly, it is much easier to pin down a bug in small, independent functions than in a giant clump of code that is meant to solve too many problems at once. Spaghetti code is evil because (a) it is a very common subtype of bad code, (b) it is almost impossible to fix without changing behavior, which amounts to breakage when people depend on the old behavior, and (c) for reasons I will get to shortly, its emergence cannot be prevented by typical code review processes.
The reason I think it is important to separate the concept of “spaghetti code” from the broader concept of “bad code” is that much of what makes code “bad” is too subjective. A lot of the conflict and rudeness around collaborative software development (or its absence) results from a predominantly male tendency to ridicule unskilled work (or the perception of it, and in code that perception is very often biased): to assert alpha status by beating on a candidate until he stops pestering us with his incompetent ideas. The problem with this pattern of behavior is that it is useless and rarely makes anyone better at what they are trying to do. There are also plenty of disagreeable people who sort programmers into good and bad on visual grounds, so that their definition of “good code” reduces to “code that looks like what I would have written”. I feel that the spaghetti code problem is better defined, and narrower in scope, than the large but overly subjective problem of “bad code”. We will never reach consensus on tabs versus spaces, but we can all agree that spaghetti code is incomprehensible and useless. Moreover, since spaghetti code is the most common and destructive kind of bad code, most of its causes, and the cautions against it, extend to the other categories of bad code as well.
People usually use “bad code” to mean “ugly code”, but if it is possible to determine why a piece of code is bad and ugly, and to identify possible ways of fixing it, then it is already much better off than most spaghetti code. Spaghetti code is incomprehensible and often effectively unfixable. If you know why you hate a particular piece of code, that code is already higher in quality than spaghetti code, which is just faceless nonsense.
What causes spaghetti code? For a long time the main culprit was the goto statement, but it fell out of favor and remains in disgrace, so it has ceased to matter. Today the cause is something else entirely: the modern bastardization of object-oriented programming. Inheritance plays a special role here, along with ill-conceived abstractions: parameterization peculiar to one class and one intended use case, or the addition of unnecessary parameters. I admit that the claim that OOP, as currently practiced, produces spaghetti code is not an uncontroversial position. But then, in its day, the harm of the goto statement was not considered indisputable either.
One of the biggest problems in comparing software (whether approaches, techniques, languages, or platforms) is that most comparisons focus on small examples. Twenty lines of code reveal nothing sinister, unless those lines were written with deliberate malice. A twenty-line program written with goto is generally quite acceptable, and may even be simpler than one written without it. At twenty lines, a set of step-by-step instructions with some explicit transfer of control is a very natural way to express a program. For a static program (one of platonic form, which will never change and will never need maintenance) that can be read in one sitting, this structure may be perfectly fine. But at twenty thousand lines, a goto-based program becomes worse than incomprehensible. Twenty thousand lines of goto code will have endured so many hacks, extensions, and optimizations that the original vision of how things were built is simply lost. And the fact that control can land in any part of the code “from anywhere” means that to change the code safely you must be confidently aware of all of those “anywheres”. Everything is everywhere. This not only makes the code hard to understand, but also means that every modification is likely to make it worse, through unforeseen consequences. Over time, such software becomes “biological”: by this I mean that it develops a mode of behavior in which all the components are interdependent, and some of the dependencies are hidden.
The goto statement failed as a programming language construct because it created endless problems of constant re-diagnosis in programs written with it. Less powerful but more narrowly specialized constructs, such as procedures, functions, and well-defined data structures, won favor instead. For the one case where people genuinely needed non-local flow control (error handling), exceptions were developed. This was a transition from the extreme generality and abstraction of goto-based programs to the concreteness and specificity of parts (such as procedures) that solve specific problems. In unstructured programming, you can write a Grand Program that does many things: it adds new capabilities for every taste and redirects the course of events as needed. It need not solve any particular “problem” (that would be so boring...); it can be a meta-framework with a built-in interpreter. Structured programming encourages people to break their programs into specific parts that each solve one problem and, where possible, to make those parts reusable. This principle became the basis of the Unix philosophy (do one thing and do it well) and of functional programming (achieve simplicity by giving code exact mathematical semantics and avoiding global state).
Another thing worth saying about the goto statement is that it is rarely needed even as a language-level primitive. You can achieve the same effect with a while loop: a state variable declared outside the loop and dispatched on by a switch-case construct is either advanced (a step), continuing the loop, or reset (a goto). One could, if one wished, expand this into a single giant program running as one such loop, but such code is never written. The fact that this is almost never done shows how rarely goto is actually required. Structured programming thus exposes the insanity some people descend into when they try to control highly non-local flows.
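Here is a minimal sketch of that simulation (my own example, not the author's): a state variable plus a switch inside a while loop reproduces arbitrary jumps without a literal goto, and makes plain how unpleasant a whole program built this way would be.

```cpp
#include <iostream>

// Hypothetical sketch: simulating goto with a state variable and a switch.
// Each case plays the role of a label; assigning to `state` is the "goto".
int main() {
    int state = 0;          // which "label" we are at
    int i = 1, total = 0;
    bool running = true;
    while (running) {
        switch (state) {
            case 0:                          // "start" label
                total += i;
                ++i;
                state = (i <= 10) ? 0 : 1;   // "goto start" or "goto done"
                break;
            case 1:                          // "done" label
                std::cout << total << "\n";  // prints 55
                running = false;
                break;
        }
    }
}
```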
Nevertheless, there was a time when the rejection of goto was an extremely controversial issue, and all these structured programming ideas looked like nonsense. The objection went like this: why use functions and procedures when the goto statement is so much more powerful?
Similarly: why use referentially transparent functions and immutable records when objects are so much more powerful? An object can always be given a run or call or apply method, so it can serve as a function. It can have static or constant fields, so it can serve as a record. But it can also do far more: an object can have initializers and finalizers and open recursion and fifty methods, if someone so decides. So what is all the fuss about this pointless structured programming, which asks people to build their programs out of far less powerful constructs, such as records whose fields never change and whose classes contain no initialization magic?
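To make the contrast concrete, here is a sketch of my own (not from the article): the first half is an immutable record and a referentially transparent function; the second is an object that can do “more”, at the price of every reader having to track its state and lifecycle.

```cpp
#include <string>

// Less powerful, easy to reason about: an immutable record plus a pure function.
struct Price {                   // a record: fields never change once set
    const double amount;
    const std::string currency;
};

Price discounted(const Price& p, double rate) {  // referentially transparent
    return Price{p.amount * (1.0 - rate), p.currency};
}

// More powerful, hard to reason about: an object with mutable state,
// lifecycle hooks, and many entry points. Callers must know its history.
class PriceManager {
public:
    PriceManager() { /* initialization magic */ }
    ~PriceManager() { /* finalizer logic */ }
    void setAmount(double a) { amount_ = a; }
    void applyDiscount(double rate) { amount_ *= (1.0 - rate); }
    void reset() { amount_ = 0.0; }
    double amount() const { return amount_; }
    // ...and forty-six more methods, if someone so decides.
private:
    double amount_ = 0.0;
};

int main() {
    Price p{100.0, "USD"};
    Price d = discounted(p, 0.2);  // d.amount == 80.0; p is untouched
    PriceManager m;
    m.setAmount(100.0);
    m.applyDiscount(0.2);          // m now carries hidden history
    return static_cast<int>(d.amount - m.amount());  // both are 80.0, so 0
}
```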
The answer is that the availability of power is not always a good thing.
Power in programming is an advantage for the person who writes the code, not for the person who later has to read it, and maintenance (that is, the need to read and understand code) kicks in at around 2,000 lines or six weeks, and certainly on any project with more than one developer. On real projects, nobody only writes code; we constantly read our own code and other people's. Unreadable code is simply unacceptable, and it is tolerated only because there is so much of it, and because the OOP “best practices” adopted in many software companies generate it. A more “powerful” abstraction is more general and therefore less specific, which makes it harder for a reader of the code to determine exactly what it is being used for. People who write code single-handedly, though, usually stay fairly direct: a powerful abstraction may have 18 possible uses, but only one of them is actually exercised. In that case there is a kind of individual vision (though usually an undocumented one) that keeps confusion at bay. The danger arises when someone not initiated into that vision starts to modify the code. Often these modifications are hacks that implicitly bring in another of the remaining 17 uses. This, as a rule, produces inconsistencies, and inconsistencies produce bugs. Unfortunately, the people responsible for fixing those bugs have an even dimmer idea of the original vision behind the code, and their changes add still more hacks. Individual patches may succeed, but the overall quality of the code declines. This is the process of code “spaghettification”. Nobody just sits down and writes spaghetti code from scratch. It happens through the gradual “stretching” of the code, and almost always several developers share the responsibility. In software, slippery slopes are very real, and the fall can be very sudden.
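As an illustration (a hypothetical example of mine, not from the article), here is an over-general abstraction of the kind described above: many possible configurations, of which the original author exercised exactly one, with the untested flags lying in wait for the first hack that flips them.

```cpp
#include <string>
#include <vector>

// Hypothetical over-general abstraction: four flags give 16 configurations,
// but the original author only ever used one of them.
class RecordProcessor {
public:
    RecordProcessor(bool validate, bool normalize, bool deduplicate, bool async)
        : validate_(validate), normalize_(normalize),
          deduplicate_(deduplicate), async_(async) {}

    void process(std::vector<std::string>& records) {
        if (validate_)    { /* the one path the author actually ran */ }
        if (normalize_)   { /* untested */ }
        if (deduplicate_) { /* untested */ }
        if (async_)       { /* untested, and unsafe combined with deduplicate_ */ }
    }

private:
    bool validate_, normalize_, deduplicate_, async_;
};

int main() {
    std::vector<std::string> records{"a", "b"};
    // The original, undocumented vision: validation only.
    RecordProcessor p(true, false, false, false);
    p.process(records);
    // A later hack that flips another flag quietly steps outside that vision.
}
```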
Object-oriented programming, originally designed to prevent spaghetti code, has become (through the use of “design patterns” without full understanding of them) one of its worst sources. An “object” can happily combine code and data while exposing any number of interfaces, and a class can freely spawn subclasses all over a program. Object-oriented programming holds great power, and with disciplined use it can be very effective. But most programmers cannot handle it, and over time their code degenerates into spaghetti.
One of the problems with spaghetti code is that it forms gradually, which makes it hard to detect in code review, because each change that contributes to “spaghettification” can look like a net positive when viewed outside the big picture. The upside is that the change the manager or client needed “yesterday” lands in the code; the downside appears to be only a moderate amount of added complexity. Even in the Dark Ages, nobody sat down at a terminal and said: “I am going to write a completely incomprehensible program with 40 goto statements all pointing at the same line of code.” The tangle accumulated gradually, as development of the program passed from one person to another. The same is true of object-oriented spaghetti. There is no single point of transition from a clean initial design to incomprehensible spaghetti code. It happens over time, as people abuse the power of OOP to push through incomprehensible hacks, hacks that would not be needed if everyone understood how the programs they were modifying actually worked, and if clearer (albeit less powerful) abstractions were used. All of this means that the blame for “spaghettification” falls on everyone and on no one at once: any individual developer can claim with confidence that it was not his changes that sent the code straight to hell. That is why large software shops (in contrast to the minimalist Unix philosophy) tend to operate under the following policy: nobody ever really knows who is to blame for anything.
Code review is great at catching obviously bad practices, such as mixed spaces and tabs or overlong lines. That is why the more cosmetic aspects of “bad code” are less interesting (using “interesting” as a synonym for “alarming”) than spaghetti code: we already know how to deal with them in review, and we can even configure our continuous integration servers to reject such code automatically. Spaghetti code, which has no crisp definition, is not so easy to reject, if it is possible at all. A holistic review of the entire codebase could detect it, but I have seen very few companies willing to invest the time and resources such reviews require.
And in the long term (ten years or more), I think such discipline is nearly impossible to sustain, except in teams building life-critical or mission-critical software, which enforce a high level of discipline indefinitely. The answer, I think, is that Big Code simply does not work. Dynamic typing falls down in large programs, and static typing fails in its own way. The same holds for object-oriented programming, for imperative programming, and, to a lesser but still noticeable degree, for functional programming (where it shows up as a proliferation of parameters threaded through the code). There was never a problem with goto as such; the problem was that its nature let code turn into Big Code very quickly. The cruel reality is that Big Code is no “silver bullet”. Large programs simply become incomprehensible. Complexity and large size are not “sometimes undesirable”; on the contrary, they are always dangerous. People like Steve Yegge understood this long ago.

That is why I think the Unix philosophy is fundamentally right: programs should not be vague, swampy things that grow in scale and are never finished. A program should solve one problem and solve it well. If it becomes large and unwieldy, it should be broken into separate parts: libraries and scripts, executables and data. Ambitious software projects should not be structured in an all-or-nothing way as single programs, because every programming paradigm and toolset eventually breaks down at that scale. Instead, such projects should be structured as systems, and that structuring deserves close attention: attention to fault tolerance, to the interchangeability of parts, and to communication protocols. This takes more discipline than growing a sprawling big program by accretion, but it is worth it. Beyond the obvious benefit of cleaner, more maintainable code, it means that people actually read the code instead of thoughtlessly bolting hacks onto it without understanding what it does. That, in turn, means they improve as developers over time, and the quality of their code gets better in the long run.

Ironically, OOP was originally intended to promote something like minimalist software. The original vision of OOP did not have people sitting down to write huge, complex objects; it had them reaching for OOP precisely when complexity is unavoidable. The successful example here is databases. People need relational databases for transactional integrity, durability, availability, concurrency, and performance, so the complexity is mandatory. Databases are incredibly complex, and I can say with confidence that it took the computing world decades to produce worthy implementations, despite the enormous financial incentives to get there faster. But while a database may be as complicated as it needs to be, the interface for using it is far simpler: SQL. You do not specify which search strategy the database should use; you simply write SELECT (describing what the user wants to receive, not how to get it) and let the query optimizer take care of the rest.

I note that databases are something of an exception to my dislike of Big Code. Their complexity is a well-understood necessity, and there are people willing to devote their careers exclusively to studying them. But people should not have to spend their careers understanding a typical business application. And they will not: they will refuse, and hand the code off to other hands, thereby accelerating its spaghettification.

Why Big Code?
Why does it exist, despite its pitfalls? And why do programmers reach so quickly for OOP machinery without asking whether its power and complexity are really needed? I think there are several reasons. One of them is laziness: people would rather learn one big general-purpose abstraction than take the time to master narrowly specialized abstractions and the situations in which each applies. Why study linked lists and arrays, or all those obscure structures such as trees, when we already have ArrayList? Why learn how a program benefits from referentially transparent functions when objects can do the same job (and much more besides)? Why learn to use the command line when modern IDEs can protect you from ever having to see the damned thing once in your life? Why learn new programming languages when Java is already Turing-complete? Big Code arises from the prevalence of this position: why break a large program into modules when modern compilers can easily handle hundreds of thousands of lines of code? If computers don't mind Big Code, why should we?

Closer to the point, though, I believe all of this is nothing but arrogance with a pinch of greed. Big Code comes from the belief that a software project will be so popular and successful that people will put up with its complexity; fantasies along the lines that its homegrown domain-specific language (DSL) will loom as large as C or SQL. It also comes from an unwillingness to recognize a problem as solved, and a program as finished, even when the essential work is done. And it comes from misconceptions about what programming really is. Instead of solving existing, well-defined problems and getting out of the way, as programs built with a minimalist methodology do, Big Code programs do far more than necessary. Such projects are often all-encompassing, with an impractical “vision” that amounts to building software for software's sake. This breeds confusion, because in a corporate environment “vision” tends to turn into politics very quickly. Big Code programs are always a reflection of the environment that spawned them (Conway's law), and they always resemble a collection of parodies and in-jokes more than the universal language of mathematics and computer science.

There is one more force at play here. Managers simply adore Big Code, because when the programmer-to-program ratio becomes many-to-one instead of one-to-many, effort can be tracked and headcount can be justified. Minimalist software methodologies are excellent, but they require trusting programmers to allocate their own time across more than one task, and most tyrannosaur managers are uncomfortable with that. Big Code does not actually work, but it gives managers a sense of control over the allocation of technical effort. It also plays to the conflation of size with success that managers so often make (witness the standard management interview question, “How many subordinates did you have?”). The long-term spaghettification that Big Code produces rarely becomes a problem for such managers: they do not see it happening, and they usually leave the project before it becomes one.

To sum up, it is safe to say that spaghetti code is bad code, but not all bad code is spaghetti code.
Spaghetti code is a product of industrial programming: often (though not always) the result of code passing through many hands, an inevitable consequence of large-scale software development methodology, and the outcome of object-oriented programming practiced under defective management processes. The antidote to spaghetti code is aggressive, proactive refactoring, and a sustained effort to keep the program compact and efficient, with clean source code and, above all, internal consistency.

From the translator: the article is quite long, so I would be grateful if you report any inaccuracies or errors to me in a private message.