
The topic of ideal code often causes controversy among experienced programmers, which made it all the more interesting to get the opinion of Igor Marnat, director of development at Parallels RAS. Below is his personal take on the subject. Enjoy!

By way of introduction, I would like to explain why I decided to write this short article. Before writing it, I put the question from the title to several developers. I have worked with most of them for more than five years, and with some for a little less, but I trust their professionalism and experience unconditionally. All of them have more than ten years of industrial development experience and work for Russian and international software companies.
Some colleagues found it hard to answer (some are still thinking it over), while others gave one or two examples right away. I asked those who gave examples a follow-up question: “What, exactly, made you admire it?” Their answers matched the results of the next stage of my small study. I searched the web for answers to this question, phrased in various ways close to the title of this article. The articles I found answered in much the same way as my colleagues.
The developers' answers, like the articles I found, focused on the readability and structure of the code, the elegance of its logic, the use of all the features of modern programming languages, and adherence to a particular coding style.
When I asked myself about “divine code”, the answer came immediately, from the subconscious. I thought of two code examples that I worked with for a long time (more than ten years ago) and that still fill me with admiration and some awe. Having considered why I admire each of them, I formulated several criteria, which are discussed below. I will touch on the first example only briefly, but I would like to look at the second one more closely. By the way, all of these criteria are discussed to some degree in every developer's reference book, “Code Complete” by Steve McConnell, but this article is noticeably shorter.
An example from the 90s
The first example I will mention is the implementation of the V.42bis modem protocol, which was developed in the late 1980s and early 1990s. The interesting idea its developers embodied is stream compression of data during transmission over an unreliable (telephone) line. The difference between stream compression and file compression is fundamental. When compressing a file, the archiver can analyze the entire data set, choose the best approach to compressing and encoding it, and write the result to the file as a whole, without worrying about losing data or metadata along the way. When decompressing, the whole data set is again available, and its integrity is verified with a checksum. With online compression, the compressor sees only a small window of data. There is no guarantee that data will not be lost, and connections frequently have to be reset and the compression process reinitialized.
The authors of the algorithm found an elegant solution whose description takes literally a few pages. Many years have passed, but I am still impressed by the beauty and elegance of their approach.
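To make the difference concrete, here is a minimal sketch of an LZW-style dictionary coder that works on a stream. V.42bis compression belongs to the Lempel-Ziv dictionary family, but this is not the V.42bis algorithm. The dictionary size and the names lzw_encoder, lzw_push, and lzw_finish are assumptions for illustration, and the real protocol's dictionary negotiation, string length limits, and control codewords are omitted. The point is that the encoder consumes one byte at a time and keeps only its dictionary and the current match, never the whole input.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical dictionary size; V.42bis negotiates its own per connection. */
#define DICT_MAX 4096

typedef struct {
    uint16_t prefix[DICT_MAX]; /* code of the string minus its last byte */
    uint8_t last[DICT_MAX];    /* last byte of the string */
    int size;                  /* entries in use; codes 0..255 are single bytes */
    int current;               /* code of the longest match so far, -1 if none */
} lzw_encoder;

static void lzw_init(lzw_encoder *e)
{
    e->size = 256;
    e->current = -1;
}

/* Linear search keeps the sketch short; a real coder would use a trie. */
static int lzw_find(const lzw_encoder *e, int prefix, uint8_t byte)
{
    for (int i = 256; i < e->size; i++)
        if (e->prefix[i] == prefix && e->last[i] == byte)
            return i;
    return -1;
}

/*
 * Feed one byte as it arrives. Returns a code to transmit, or -1 if the
 * byte only extended the current match. No look-ahead over the input.
 */
static int lzw_push(lzw_encoder *e, uint8_t byte)
{
    assert(e->current < e->size); /* the pending match must be in the dictionary */

    if (e->current < 0) {
        e->current = byte;
        return -1;
    }
    int code = lzw_find(e, e->current, byte);
    if (code >= 0) {
        e->current = code;
        return -1;
    }
    int out = e->current;
    if (e->size < DICT_MAX) {
        /* The decoder rebuilds this same entry from the codes it receives. */
        e->prefix[e->size] = (uint16_t)e->current;
        e->last[e->size] = byte;
        e->size++;
    }
    e->current = byte;
    return out;
}

/* Emit the pending match at the end of the stream (-1 if nothing is pending). */
static int lzw_finish(lzw_encoder *e)
{
    int out = e->current;
    e->current = -1;
    return out;
}
```

Feeding “ABABABA” byte by byte and then calling lzw_finish produces the codes 65, 66, 256, 258 (A, B, AB, ABA). A decoder can rebuild the same dictionary from the codes alone, so nothing but the codes has to cross the line.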
This example concerns the algorithm rather than the code itself, so I will not dwell on it further.
Linux above all!
I would like to analyze the second example of perfect code in more detail: the Linux kernel. At the time of writing, this code runs all 500 supercomputers in the TOP500 list and every second phone in the world, and it powers most of the servers on the Internet.
Consider, for example, the file memory.c from the Linux kernel, which belongs to the memory management subsystem.
1. The sources are easy to read. They follow a very simple style that is easy to adhere to and hard to get confused by. Capital letters are used only for preprocessor directives and macros. Everything else is written in lowercase, with words in names separated by underscores. This is probably the simplest coding style possible, short of having no style at all. At the same time, the code is perfectly readable. The indentation and the approach to comments can be seen in any piece of any kernel file, for example:
static void tlb_remove_table_one(void *table)
{
	smp_call_function(tlb_remove_table_smp_sync, NULL, 1);
	__tlb_remove_table(table);
}
2. There are not many comments in the code, but those that exist are usually useful. As a rule, they do not describe the action, which is already obvious from the code (the classic useless comment is “cnt++; // increment counter”). Instead, they describe its context: why something is done here, why it is done this way, what assumptions it relies on, and which other places in the code it is connected to. For example:
void tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm, unsigned long start, unsigned long end)
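As an illustration of a comment that explains context rather than action, here is a hypothetical ring buffer in C. The struct ring type and the ring_put function are invented for this sketch and are not taken from the kernel. The comment says why one slot stays empty, which the code alone cannot tell the reader.

```c
#include <stddef.h>

/* Hypothetical ring buffer, used only to illustrate a "why" comment. */
struct ring {
    unsigned char buf[16];
    size_t head; /* next slot to write */
    size_t tail; /* next slot to read */
};

/* Returns 0 on success, -1 if the buffer is full. */
static int ring_put(struct ring *r, unsigned char c)
{
    size_t next = (r->head + 1) % sizeof(r->buf);

    /*
     * One slot is deliberately left unused: head == tail means "empty",
     * so a completely full buffer would be indistinguishable from an
     * empty one. Giving up one byte avoids a separate count field that
     * both the reader and the writer would have to keep updated.
     */
    if (next == r->tail)
        return -1;

    r->buf[r->head] = c;
    r->head = next;
    return 0;
}
```

With a 16-byte array, 15 insertions succeed and the 16th reports a full buffer. A reader who sees that without the comment might "fix" it as an off-by-one bug.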
Another use of comments in the kernel is to record the change history, usually at the beginning of a file. The kernel's history spans almost thirty years, and some of these passages are simply interesting to read. You feel like part of history.
3. The kernel code uses special macros to validate data. They are also used to check the context in which the code runs. These macros work like the standard assert, except that the developer can override the action taken when the condition is true. The general approach to data in the kernel is that everything coming from user space is checked, and if the data is invalid, an appropriate error code is returned. WARN_ON can also be used in this case to write a record to the kernel log. BUG_ON is usually most useful when debugging new code and bringing up the kernel on new architectures.
The BUG_ON macro usually prints the contents of the registers and the stack, and then either halts the entire system or kills the process in whose context the call occurred. The WARN_ON macro only writes a warning (with a stack trace) to the kernel log when the condition is true. It also evaluates to the condition itself, so it can be used directly in an if statement. There are also WARN_ON_ONCE and a number of other macros whose behavior is clear from their names.
void unmap_page_range(struct mmu_gather *tlb, ….
	unsigned long next;

	BUG_ON(addr >= end);
	tlb_start_vma(tlb, vma);

int apply_to_page_range(struct mm_struct *mm, unsigned long addr, …
	unsigned long end = addr + size;
	int err;

	if (WARN_ON(addr >= end))
		return -EINVAL;
Data from unreliable sources is checked before use, and the system's response to “impossible” situations is anticipated and defined. This makes the system much easier to debug and operate. You can think of this approach as an implementation of the “fail early and loudly” principle.
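The pattern from the kernel snippets can be imitated in user space. The sketch below defines hypothetical USER_WARN_ON and USER_BUG_ON macros, which are not the kernel's own definitions. The first logs a message and evaluates to its condition, as WARN_ON does, so an entry point can reject bad input with -EINVAL. The second aborts the process when an internal invariant is violated. The functions apply_to_range and range_pages, and the 4096-byte page size, are assumptions for illustration.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Logs when cond is true and returns cond, so it can sit inside an if (). */
static int user_warn(int cond, const char *expr, const char *file, int line)
{
    if (cond)
        fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
    return cond;
}
#define USER_WARN_ON(cond) user_warn(!!(cond), #cond, __FILE__, __LINE__)

/* In user space the closest analogue of BUG_ON is to abort the process. */
#define USER_BUG_ON(cond)                                              \
    do {                                                               \
        if (cond) {                                                    \
            fprintf(stderr, "BUG: %s at %s:%d\n", #cond, __FILE__,     \
                    __LINE__);                                         \
            abort();                                                   \
        }                                                              \
    } while (0)

/* Hypothetical entry point: validates caller input and reports errors. */
static int apply_to_range(unsigned long addr, unsigned long size)
{
    unsigned long end = addr + size;

    if (USER_WARN_ON(addr >= end)) /* empty or wrapped-around range */
        return -EINVAL;
    return 0;
}

/* Hypothetical internal helper: callers guarantee addr < end. */
static unsigned long range_pages(unsigned long addr, unsigned long end)
{
    USER_BUG_ON(addr >= end); /* an "impossible" state: stop loudly */
    return (end - addr + 4095) / 4096;
}
```

Calling apply_to_range(0x1000, 0) prints a warning and returns -EINVAL, while range_pages(0, 4097) returns 2. Calling range_pages with addr >= end would abort the program, which is the user-space counterpart of the loud failure that BUG_ON provides.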
4. All the main kernel components expose information about their state to users through a simple interface, the /proc virtual file system. For example, information about the state of memory is available in /proc/meminfo:
user@parallels-vm:/home/user$ cat /proc/meminfo
MemTotal:        2041480 kB
MemFree:           65508 kB
MemAvailable:     187600 kB
Buffers:           14040 kB
Cached:           246260 kB
SwapCached:        19688 kB
Active:          1348656 kB
Inactive:         477244 kB
Active(anon):    1201124 kB
Inactive(anon):   387600 kB
Active(file):     147532 kB
Inactive(file):    89644 kB
….
The information above is collected and processed in several source files of the memory management subsystem. For instance, the first field, MemTotal, is the value of the totalram field of the sysinfo structure, which is filled in by the si_meminfo function in page_alloc.c.
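Because the interface is plain text, consuming it needs nothing but standard C. This is a minimal sketch that assumes the “Key: value kB” line format of /proc/meminfo. The helpers meminfo_field and read_meminfo_field are hypothetical names.

```c
#include <stdio.h>
#include <string.h>

/*
 * Look up one "Key:   value kB" line in text formatted like /proc/meminfo.
 * Returns 0 and stores the value (in kB) on success, -1 if the key is absent.
 */
static int meminfo_field(const char *text, const char *key, unsigned long *kb)
{
    size_t len = strlen(key);
    const char *line = text;

    while (line && *line) {
        /* Require the colon so that "Active" does not match "Active(anon)". */
        if (strncmp(line, key, len) == 0 && line[len] == ':')
            return sscanf(line + len + 1, "%lu", kb) == 1 ? 0 : -1;
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return -1;
}

/* Read the live file and look up one field; returns -1 on any failure. */
static int read_meminfo_field(const char *key, unsigned long *kb)
{
    char buf[8192];
    FILE *f = fopen("/proc/meminfo", "r");

    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return meminfo_field(buf, key, kb);
}
```

Given the output above, meminfo_field(text, "MemTotal", &kb) would store 2041480. On Linux, the sysinfo(2) system call returns the totalram figure directly (in units of mem_unit bytes) for programs that prefer not to parse text.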
Obviously, collecting, storing, and exposing such information takes effort from the developer and adds some overhead to the system. At the same time, convenient and simple access to this data is invaluable, both during development and in operation.
Development of almost any system should start with a mechanism for collecting and exposing information about the internal state of its code and data. This helps greatly during development and testing and, later, in operation.
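Following this advice does not have to be expensive. Below is a minimal sketch of a hypothetical subsystem: a cache that counts its lookups and renders them as “Key: value” lines in the spirit of /proc/meminfo. The names struct cache_stats, cache_lookup, and cache_show are invented for illustration. In the kernel, such a show function would typically be wired to a /proc or debugfs file. Here it simply writes to a caller's buffer.

```c
#include <stdio.h>

/* Hypothetical subsystem state: a cache that counts what happens to it. */
struct cache_stats {
    unsigned long lookups;
    unsigned long hits;
};

static void cache_lookup(struct cache_stats *s, int found)
{
    s->lookups++;
    if (found)
        s->hits++;
}

/*
 * Render the state as aligned "Key: value" lines so that people and scripts
 * can inspect it without a debugger. Returns snprintf()'s result: the length
 * of the full report, even if buf was too small to hold all of it.
 */
static int cache_show(const struct cache_stats *s, char *buf, size_t size)
{
    return snprintf(buf, size,
                    "Lookups:   %lu\n"
                    "Hits:      %lu\n"
                    "Misses:    %lu\n",
                    s->lookups, s->hits, s->lookups - s->hits);
}
```

After two hits and one miss, the report contains “Hits:      2” and “Misses:    1”. The counters cost one increment per lookup. Misses are derived rather than stored, so that line cannot drift out of sync with the others.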
As Linus put it, “Bad programmers worry about the code. Good programmers worry about data structures and their relationships.”
5. All code is read and discussed by several developers before it is committed. The history of source code changes is recorded and available. The origin of any line can be traced back: what changed, who changed it, when, why, and what issues the developers discussed. For example, in the change https://github.com/torvalds/linux/commit/1b2de5d039c883c9d44ae5b2b6eca4ff9bd82dac#diff-983ac52fa16631c1e1dfa28fc593d2ef to memory.c, a small optimization was made: the call that write-protects memory is skipped if the memory is already write-protected.
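A simplified user-space model of that optimization follows. The names struct fake_pte, fake_set_wrprotect, and fake_copy_pte are hypothetical. The sketch assumes that the check sits on the copy-on-write path taken when a process forks, and it omits real page-table entries, locking, and TLB handling.

```c
#include <stdbool.h>

/* Hypothetical, heavily simplified model of a page-table entry. */
struct fake_pte {
    bool writable;
};

static int wrprotect_calls; /* counts the (assumed costly) protection updates */

/* Stand-in for the operation that clears the write bit in place. */
static void fake_set_wrprotect(struct fake_pte *pte)
{
    pte->writable = false;
    wrprotect_calls++;
}

/*
 * Copy a parent's entry into a child. For a copy-on-write mapping the
 * parent's entry must become read-only. The optimization: skip the update
 * when it is already read-only.
 */
static void fake_copy_pte(struct fake_pte *src, struct fake_pte *dst, bool cow)
{
    if (cow && src->writable)
        fake_set_wrprotect(src);
    *dst = *src;
}
```

Copying a writable entry performs one write-protect call. Copying an entry that is already read-only performs none, which is the saving the change describes.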
A developer working with code always needs to understand its context: the assumptions it was written under, and what changed and when. Only then can they tell which scenarios the changes they are about to make could affect.
6. All important elements of the kernel code's life cycle are documented and available, from the coding style to the content and release schedule of stable kernel versions. Every developer and user who wants to work with the kernel code in any capacity has all the information they need.
These points seemed important to me, and they are essentially what shaped my enthusiasm for the kernel code. Obviously, the list is very short and could be expanded. But in my opinion, these points cover key aspects of the life cycle of any source code from the point of view of the developer working with it.
In conclusion: kernel developers are smart and experienced, and they have succeeded, as billions of devices running Linux prove.
Be like kernel developers: use best practices and read Code Complete!
P.S. By the way, what are your personal criteria for ideal code? Share your thoughts in the comments.