
Write code that is easy to delete, not easy to extend

image "Every line of code is born without a reason, it continues in weakness and is deleted accidentally," Jean-Paul Sartre programs in ANSI C.



Every new line of code carries a cost: it has to be maintained. To avoid paying for a large amount of code, we reuse it. The downside of reuse is that it stands in our way later, when we want to change something.



The more users your API has, the more code you have to rewrite when you introduce a change. Conversely, the more you rely on a third-party API, the more you suffer when it changes. Organizing how the parts of a program interact and depend on each other is a serious problem in large systems, and it only grows as the project goes on.





"If we wish to count lines of code, we should not regard them as 'lines produced' but as 'lines spent'." - E. W. Dijkstra, EWD 1036.


If we treat lines of code as "lines spent", then deleting them reduces the cost of maintenance. Instead of building reusable software, we should strive to build disposable software. I hardly need to tell you that deleting code is more fun than writing it.



To write code that is easy to delete: avoid dependencies wherever possible and skip layering you do not need; split your code into layers, building easy-to-use APIs on top of simpler but less convenient parts that are kept separate; isolate the most complex and most frequently changing parts from the rest of the program and from each other; do not hard-code every choice, since in some situations it is better to leave the decision open until the program is running. Do not try to do all of this at once, and think about whether you should be writing so much code in the first place.



Step 0: Do not write code



The number of lines of code says little by itself, but 50, 500, 5,000, 10,000, or 25,000 lines feel very different. A monolith of a million lines will fray your nerves far more than one of ten thousand, and in the time, money, and effort needed to replace it the difference is felt even more strongly.

The more code you have, the harder it is to get rid of it, yet saving a single line of code saves nothing by itself. Either way, the easiest code to delete is the code you managed not to write in the first place.



Step 1: Use copy-paste



Writing "reusable" code is much easier in hindsight, with a few examples of use in front of you, than by trying to guess which uses might turn up later. On the plus side, you are already reusing plenty of code just by working with the file system, so do not worry too much: a little redundancy will only do the program good.



Sometimes it is better to copy-paste a piece of code than to pull it into a library function, if only to see how that function would behave in your case. Think carefully about whether you really need a function instead of a copy-paste right now: as soon as you turn your work into a public API, changing it later becomes much harder.



Remember that any function you write will be called both for its intended purpose and for things you never thought of when you created it. The programmers who use it will rely on what they observe, not on what you wrote in the documentation. And, of course, it is easier to delete the insides of a function than to delete the function itself.
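
For illustration, a minimal sketch (the module and function names are invented, not from the article): two call sites keep a deliberately copied three-line snippet rather than sharing a premature helper, so either copy can be changed or deleted without touching the other.

```python
# reports.py: a hypothetical module, for illustration only
def weekly_report(rows):
    # copied snippet: keep only non-empty rows and sort them by date
    rows = [r for r in rows if r.get("total")]
    rows.sort(key=lambda r: r["date"])
    return rows[:7]


# billing.py: a second call site with its own copy of the snippet
def invoices_to_send(rows):
    # same three lines, copied on purpose; if billing's needs change,
    # we edit (or delete) this copy without breaking weekly_report
    rows = [r for r in rows if r.get("total")]
    rows.sort(key=lambda r: r["date"])
    return [r for r in rows if not r.get("paid")]
```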



Step 2: Do not use copy-paste



Once you notice that some piece of the program has been copied enough times, it may be time to turn it into a function. Here I mean simplifying the simplest of things: opening a configuration file, printing a hash table, deleting a directory. This category also covers functions with no state, or with only a little global information such as environment variables; in short, everything that tends to end up in a file with "util" in its name.

A small aside: create a dedicated util directory and keep each utility in its own file. A single util file will inevitably grow to an enormous size and become very hard to split apart later. Maintaining a single util file is bad practice.
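
As a sketch of that layout (the file and function names are hypothetical): one small, stateless helper per file under a utils/ directory, so any one of them can be deleted without untangling the rest.

```python
# utils/load_config.py: one utility per file, no state beyond its arguments
import json
import os


def load_config(path=None):
    """Read a JSON config file, falling back to an env var for the path."""
    path = path or os.environ.get("APP_CONFIG", "config.json")
    with open(path) as fh:
        return json.load(fh)
```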


The less specific a piece of code is to your application or project, the easier it is to reuse and the smaller the chance it will ever be changed or deleted. Typical examples are library code: data records, clients for third-party APIs, file handles, processes. Other examples of code you will not need to delete are lists, hash tables, and other collections. Not because their interfaces happen to be simple, but because their scope does not grow over time.



Do not go out of your way to make code easy to delete. Instead, at this stage, try to keep the hard-to-remove parts of the program as far as possible from the easy-to-remove ones.



Step 3: Write more boilerplate code



Libraries are written to avoid endless copy-paste. Yet in writing them we often end up adding even more copy and paste, we just call it something else: boilerplate. Writing boilerplate is a lot like copy-pasting, except that each time you change a different part of the code rather than the same one. As with copy-paste, you duplicate code to avoid introducing dependencies and to gain flexibility, paying for it with even more redundancy.



Libraries that demand boilerplate are usually things like network protocols, wire formats, and parsers: in general, any code where it is hard to combine policy (what the program should do) with protocol (what it can do) without baking in restrictions. Such code is hard to delete, because it is usually needed to talk to other computers or to handle various files, and polluting it with business logic is the last thing we want. This is not yet another exercise in code reuse; we are simply keeping the frequently changing code away from the relatively static code. Minimizing the dependencies of library code, even if we have to write boilerplate to use it, means writing more lines of code overall, but those lines all land in the parts that are easy to delete.
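
A minimal sketch of keeping protocol and policy apart (the framing format and function names are mine, not the article's): the transport code knows only how to frame bytes, and each call site writes a little boilerplate to encode its own message, so business logic never leaks into the protocol layer.

```python
import json
import socket
import struct


# protocol: generic and rarely changing; knows nothing about our business objects
def send_frame(sock: socket.socket, payload: bytes) -> None:
    """Length-prefixed framing: a 4-byte big-endian size, then the payload."""
    sock.sendall(struct.pack(">I", len(payload)) + payload)


# policy: boilerplate at each call site, different every time, easy to delete
def send_order(sock, order_id, amount):
    body = json.dumps({"type": "order", "id": order_id, "amount": amount})
    send_frame(sock, body.encode("utf-8"))


def send_cancellation(sock, order_id):
    body = json.dumps({"type": "cancel", "id": order_id})
    send_frame(sock, body.encode("utf-8"))
```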



Step 4: Do not write boilerplate code



Boilerplate works best when libraries have to suit developers with very different tastes, but sometimes it leads to too much redundancy. Then it is time to wrap your flexible library in another one that has its own opinions about policy, schema, and state. Building an easy-to-use API means turning your boilerplate into a library, and this happens more often than you might think. One of the most popular and best-loved HTTP clients for Python, requests, is a success story of providing a simple interface on top of the more verbose urllib3. requests covers many common HTTP use cases while hiding most of the details from the user; urllib3 does the pipelining and connection management and hides nothing at all.



The point is not so much hiding details by putting one library inside another as dividing responsibility: requests is like a travel agency offering package tours of the most popular corners of HTTP, while urllib3 makes sure you have everything you need for the journey to go as it should.
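
The same GET request in both libraries shows the split; the URL is a placeholder, and both snippets follow the libraries' documented usage, so treat the details as a sketch rather than a recipe.

```python
import json

# requests: the pleasant, opinionated layer
import requests

resp = requests.get("https://api.example.com/items", timeout=5)
resp.raise_for_status()  # raises on 4xx/5xx responses
items = resp.json()

# urllib3: the more verbose layer underneath; connection pools, retries, raw bytes
import urllib3

http = urllib3.PoolManager()
r = http.request(
    "GET",
    "https://api.example.com/items",
    timeout=urllib3.Timeout(connect=2.0, read=5.0),
    retries=urllib3.Retry(total=3, backoff_factor=0.2),
)
items = json.loads(r.data.decode("utf-8"))
```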



I am not urging you to run off and create /protocol/ and /policy/ directories right away. But you probably do want to keep your util directory free of business logic while continuing to build easier-to-use libraries on top of easier-to-implement ones. You can happily work on them in parallel, without waiting for the underlying library to be finished.



It is often useful to write wrappers around third-party libraries too, even when they are not protocol-like. You can build a library that fits your code instead of locking in a single choice across the whole project. An API can rarely be both pleasant to use and highly extensible; the two goals pull in opposite directions.
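
A hedged sketch of such a wrapper (the class, endpoint, and field names are invented): the rest of the project talks only to this thin client, so replacing the underlying HTTP library later means changing a single file.

```python
import requests


class PaymentsClient:
    """The only HTTP surface the rest of the project is allowed to see."""

    def __init__(self, base_url: str, token: str, timeout: float = 5.0):
        self._session = requests.Session()
        self._session.headers["Authorization"] = f"Bearer {token}"
        self._base_url = base_url.rstrip("/")
        self._timeout = timeout

    def charge(self, customer_id: str, amount_cents: int) -> dict:
        resp = self._session.post(
            f"{self._base_url}/charges",
            json={"customer": customer_id, "amount": amount_cents},
            timeout=self._timeout,
        )
        resp.raise_for_status()
        return resp.json()
```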



Dividing responsibility lets us please some users without cutting off the oxygen of others. Layering is easiest when you start with a good API, but writing a good API on top of a bad one is no fun at all. Good APIs are designed from the point of view of the people who will use them, that is, programmers, and layering in this sense means admitting that you cannot please everyone at once.



The point of layering is not so much to write code you can delete later as to make the hard-to-delete code pleasant to use (without polluting it with business logic).



Step 5: Write a big lump of code



However much you copy-paste, refactor, layer, or compose, at some point the code has to do some actual work. Sometimes, when nothing goes as planned, the best move is to give up and write a good-sized lump of low-quality code just to hold the rest together.



Business logic is an endless series of edge cases and quick and dirty hacks. And that's fine. I'm okay with this. Other styles, like "game code" or "founder code", are the same thing: cutting corners to save a considerable amount of time.



Why am I suggesting you just sit down and write a lot of code? Because it is much easier to get rid of one big mistake than to remove eighteen small ones that are tightly intertwined. Programming is, to a large extent, exploration: making a few mistakes and getting a result is faster than trying to think everything through the first time. This is especially true of fun or creative work. If you are writing your first game, do not start with the engine; likewise, do not write a web framework before the application. I say this because I know you will end up with a mess no sane person could untangle anyway, so sit down and write that mess first.



Monorepos are the same trade-off: you do not know in advance how to split the code, and deploying "one big mistake" is easier than deploying twenty tightly coupled ones. When you know which code you will throw away soon, and which parts can be removed or replaced easily, you can cut far more corners. This is what happens when you build one-off websites and pages for single events, or any similar job where you have a ready-made template and all you do is stamp out copies or fill in the blanks the framework authors left for you.



No, I am not suggesting you write the same nonsense ten times, trying to fix all its mistakes. I am talking about something else. As Alan Perlis once said: "Everything should be built top-down, except the first time." Do not be afraid to make new mistakes, take new risks, and move forward, slowly but surely, through iteration.



To become a professional software developer is to accumulate a whole catalogue of regrets and mistakes. Success teaches you nothing. You cannot know in advance what good code looks like, but the scars left by bad code are always fresh in your memory. One way or another, projects either fail or become legacy code, and failure happens more often than success. It is quicker to make ten different balls of mud and see what happens than to try to polish one pile of shit to a shine. And deleting code wholesale is easier than deleting it piece by piece.



Step 6: Split the code



Big balls of mud are easy to sculpt but the hardest to maintain. What looks like a simple change ends up touching almost every part of the code base.



So, we have layered our code to separate responsibility for platform concerns from domain concerns; now we need a way to split up the logic that sits on top of it all.

"Start with a list of the difficult design decisions, or the design decisions which are likely to change. Each module is then designed to hide such a decision from the others." - David Parnas.


Instead of breaking code into parts with similar functionality, we break it apart by what is likely to differ: we isolate the parts that are hardest to write, maintain, or delete. We build modules not around what we can reuse, but around what will be convenient to change later.



Unfortunately, some problems are more intertwined than others and harder to pull apart. Although the single responsibility principle says that "each module should solve only one hard problem", it is far more important that "each hard problem is solved by only one module". When a module does two things at once, it is usually because changing one part requires changing the other. A single awful component with a simple interface is often easier to live with than two components that require careful coordination between them.

"I shall not today attempt further to define the kinds of material I understand to be embraced within that shorthand description ["loose coupling"], and perhaps I could never succeed in intelligibly doing so. But I know it when I see it, and the code base involved in this case is not that." - US Supreme Court Justice Stewart.


A system where you can delete one part without having to rewrite others is usually called loosely coupled, but it is much easier to describe what loose coupling looks like than to know in advance how to build it. Even hard-coding a single variable, or putting a command-line flag on top of one, can count as loose coupling: the point is to be able to change a basic decision without having to rewrite all the code around it.
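
A small sketch of that minimal form of loose coupling (the module and backend names are hypothetical): the decision is made in exactly one place, first as a hard-coded constant and then, optionally, overridden by a command-line flag, so changing it never means rewriting the callers.

```python
import argparse

# the decision, made once: callers import STORAGE_BACKEND instead of hard-coding "sqlite"
STORAGE_BACKEND = "sqlite"


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--storage",
        default=STORAGE_BACKEND,
        choices=["sqlite", "postgres", "memory"],
        help="override the storage backend at startup",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args()
    print(f"using storage backend: {args.storage}")
```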



Microsoft Windows, for example, uses external and internal APIs for exactly this purpose. The external APIs are tied to the life cycle of desktop programs; the internal ones are tied to the kernel underneath. Hiding one behind the other gives Microsoft the flexibility to change the system without breaking a mountain of existing programs in the process.



HTTP has its own examples of loose coupling: putting a cache in front of an HTTP server, or moving images to a CDN and changing only the links to them. Neither breaks your browser. HTTP error codes are another example: common server problems share well-known identifiers across the whole web. When you get a 400, you know that repeating the same request will not change anything; with a 500, reloading the page might. HTTP clients can handle many of these errors on the programmer's behalf.
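
A sketch of what a client can do with that contract (the retry policy and URL are illustrative, not something the article prescribes): give up immediately on a 4xx, back off and retry on a 5xx.

```python
import time

import requests


def get_with_retries(url, attempts=3):
    for attempt in range(attempts):
        resp = requests.get(url, timeout=5)
        if resp.status_code < 400:
            return resp                   # success
        if resp.status_code < 500:
            resp.raise_for_status()       # 4xx: repeating the request won't help
        time.sleep(2 ** attempt)          # 5xx: wait and try again
    resp.raise_for_status()               # out of attempts; surface the last error
```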



When you break your software into smaller pieces, think about how it will handle failures. That, too, is easier said than done.

"I decided, albeit with great reluctance, to use LaTeX." - Joe Armstrong, Making reliable distributed systems in the presence of software errors, 2003.


Erlang/OTP is fairly unique in its approach to error handling, known as supervision trees. Roughly speaking, every process in an Erlang system is started and watched by a supervisor. When a process runs into a problem, it exits and is immediately restarted by its supervisor. The supervisors themselves are started by a boot process, which restarts them in turn if they fail.



The key idea is that crashing and restarting quickly is faster than trying to handle every error. Achieving reliability by refusing to handle failures may sound counterintuitive, but in practice turning it off and on again is remarkably effective at suppressing one-off and transient faults.
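
A rough Python sketch of the idea only (this is not Erlang/OTP, and the worker is hypothetical): a supervising loop restarts the worker process whenever it dies, instead of trying to handle every possible error inside it.

```python
import multiprocessing
import time


def worker():
    """A hypothetical worker: on any unexpected error it simply exits."""
    while True:
        time.sleep(1)  # ... do real work here; crash rather than limp along


def supervise(target, restart_delay=0.5):
    while True:
        proc = multiprocessing.Process(target=target)
        proc.start()
        proc.join()                # returns when the worker exits or crashes
        time.sleep(restart_delay)  # brief pause, then restart it


if __name__ == "__main__":
    supervise(worker)
```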



Error handling and recovery are best done at the outer layers of your code base. This is also known as the end-to-end principle: it is easier to handle failure at the far ends of a connection than anywhere in the middle. Even if a failure is handled somewhere in the middle, you will have to check for it at the edges anyway. If the top layer has to handle errors regardless, why also do it somewhere deep inside the program?



Error handling is one of the many ways a system can end up tightly coupled on the inside. There are plenty of other examples of tight coupling, and it would be unfair to single any one of them out as the bad one. Except IMAP.



In IMAP, almost every operation is a snowflake, with its own unique parameters and handling. Error handling is painful: errors can show up in the middle of another operation's results.



Instead of UUIDs, IMAP invents unique tokens to identify each message, and these can change halfway through another operation. Many operations are non-atomic. It took 25 years to come up with a way to reliably move email from one folder to another. And, of course, there is its very particular use of UTF-7 and its own base64 encoding. No, I am not making any of this up.



By contrast, both the file system and the database are much better examples of remote storage. The file system offers a fixed set of operations, but the set of objects you can apply them to is vast and varied. SQL may look like a richer interface than a file system, but it follows the same pattern: a number of operations over sets, applied to a huge number of rows. And although you cannot always swap one database for another, it is far easier to find something that works with SQL than with any home-grown query language.



Other examples of loose coupling are systems built around middleware, or around filters and pipelines. Twitter's Finagle, for instance, uses a common interface for services, which makes it possible to add timeouts, retries, and authentication without any extra effort. And of course I must mention the UNIX pipeline here; leaving it out would cause an outcry.
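
A tiny sketch of the uniform-interface idea (all names are invented, and the time-budget check is only illustrative): each middleware is a function from handler to handler, so logging or budget checks can be stacked on any service without changing it.

```python
import time


def logging_middleware(handler):
    def wrapped(request):
        start = time.monotonic()
        response = handler(request)
        print(f"{request!r} took {time.monotonic() - start:.3f}s")
        return response
    return wrapped


def budget_middleware(max_seconds):
    # illustrative only: it measures after the fact rather than interrupting the call
    def middleware(handler):
        def wrapped(request):
            start = time.monotonic()
            response = handler(request)
            if time.monotonic() - start > max_seconds:
                raise TimeoutError("handler exceeded its time budget")
            return response
        return wrapped
    return middleware


def hello_service(request):
    return f"hello, {request}"


# stacking works because every layer shares the same handler interface
service = logging_middleware(budget_middleware(1.0)(hello_service))
print(service("world"))
```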



So: first we layered our code, and now some of those layers share a single interface, a common set of behaviours and operations with a wide variety of uses. Uniform interfaces like these are often good examples of loose coupling.



A healthy code base does not have to be perfectly modular. Modularity is just what makes working on the code more fun, the way Lego bricks are fun because everything fits together. A healthy code base has a little surplus functionality and enough slack between its moving parts that you do not get your hands stuck in them.



Loosely coupled code is not necessarily easy to delete, but it is always much easier to replace and much easier to change.



Step 7: Keep writing code



Being able to write new code without having to think about the code written before it makes experimenting with new ideas far easier. I am not saying you should now write microservices instead of monoliths, but your system should let you run an experiment or two on top of the day-to-day work.



Feature flags are one way to change a decision after the fact. Although many people think of them as a way to experiment with new features, feature flags also let you roll out changes without deploying a new version.



Google Chrome is a striking example of the benefits they bring. The Chrome developers found that the hardest part of keeping a regular release cycle was the time lost merging long-lived feature branches.



Being able to turn new code on or off at any time without recompiling lets you break big changes into small merges without damaging the existing code. And because new features appear in the same code base earlier, the team can anticipate when a long-running feature will collide with other parts of the code.



A feature flag is not just a command-line switch; it is a way to decouple releasing a feature from merging branches or deploying code. When shipping new software takes hours, days, or weeks, being able to change your mind while the program is running becomes more and more important. Ask any reliability engineer: any system that can wake you up in the middle of the night had better be one you can change while it is running.
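
A minimal sketch of a runtime flag (the flag name and code paths are hypothetical, and a real system would usually read the flag from a config service rather than the environment): the flag is checked on every call, so flipping it changes behaviour without a redeploy, and the old path stays easy to delete once the new one wins.

```python
import os


def flag_enabled(name: str) -> bool:
    """Read the flag at call time so it can be flipped while the program runs."""
    return os.environ.get(f"FEATURE_{name.upper()}", "off") == "on"


def new_pricing_total(cart):
    return sum(item["price"] for item in cart) * 0.95


def legacy_pricing_total(cart):
    return sum(item["price"] for item in cart)


def checkout(cart):
    if flag_enabled("new_pricing"):
        return new_pricing_total(cart)   # merged early, shipped dark behind the flag
    return legacy_pricing_total(cart)    # delete this branch once the flag sticks
```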



This step is not so much about iteration as about having a feedback loop, and not so much about writing reusable modules as about keeping components separate enough to change them. Remember that changing the core code means removing old features as well as adding new ones. Writing extensible code is a bet that everything will still be fine with your project in three months; writing code that is easy to delete is a bet on the opposite.



The strategies I mentioned above, layering, isolation, common interfaces, composition, are not there to help you write good software, but to help you build software that can change over time.

"The management question, therefore, is not whether to build a pilot system and throw it away. You will do that. ... Hence plan to throw one away; you will, anyhow." - Fred Brooks.


Of course, this does not mean you have to throw absolutely everything away, but you will have to delete some of it. Writing good code does not mean getting everything right the first time. Good code is simply legacy code that does not get in your way. And good code is easy to delete.




Source: https://habr.com/ru/post/277629/


