Large-Scale C++ Software Design rules

Hello, dear readers.

We assume some of you already know about the upcoming reissue of John Lakos's fundamental work "Large-Scale C++".

The previous one-volume edition came out nearly twenty years ago, yet the Internet (and even Amazon) is full of positive reviews that convincingly show the old edition stayed relevant for more than a decade. Below is a translation of one such article, which sets out general guidelines for the physical design of large C++ projects. We hope you find it interesting, and that we will be able to bring you the translation of the new edition next year.

One of the most interesting books about C++ programming I've read is John Lakos's Large-Scale C++ Software Design. It was published in 1996 and, unfortunately, it is still the only book on physical design in C++ and on scaling such projects up to large systems.
Have the principles it sets out become obsolete in more than 10 years? Below are my brief comments on the author's most important tips; I have tested all of them on real projects:

Terminology

Physical design deals with the physical entities of a software system (files, directories, libraries).
A declaration introduces a name into a program; a definition provides a unique description of an entity (for example, a type, an instance, or a function) in the program.
A name has internal linkage if it is local to its translation unit, that is, if at link time it cannot collide with an identical name defined in another translation unit.
A name has external linkage if, in a multi-file program, it can interact with other translation units at link time.
A component is the smallest unit of physical design. Typically, a component is one header file plus one source file.
A package is a group of components.
A subsystem is a group of packages.

Recommendations

Keep class data members private.

This is one of the key recommendations for both object-oriented design and physical design. It is a good idea because it partly hides the component's complexity.
Simply declaring instance variables private does nothing for the physical design, but if you go one step further and use a compilation firewall (PIMPL / Cheshire cat), you can reduce the number of compile-time dependencies and thereby speed up compilation.

Verdict: rules

Avoid data with external linkage at file scope.

It is very easy to do (just add "static") and helps avoid link errors and linker bugs. For example, I once had two functions with external linkage that had the same name and parameter types convertible into each other. The wrong function was called at run time, and there was no warning at compile time.
The only catch is that most C++ compilers do not give names in an anonymous namespace internal linkage, even though the standard recommends that technique and officially deprecates the static approach.

Verdict: rules

Avoid free functions (except operator functions) at file scope in .h files; avoid free functions with external linkage (including operator functions) in .cpp files.

In essence, this is about avoiding name conflicts and protecting yourself against strange interactions between translation units. I always follow it.

Verdict: rules

Avoid enumerations, typedefs, and constants at file scope in .h files.

Same idea as above. Enumerations are especially insidious because the name of an enumeration does not act as a namespace: every enumerator is injected into the enclosing (here, global) namespace.

Verdict: rules

Avoid preprocessor macros in header files, except as include guards.

Macros in header files can cause bugs that are very hard to track down. A classic example is a macro with a poorly chosen name that accidentally changes the code that includes it. Or the effect can be even less obvious: a former colleague of mine once had a memory corruption problem where the program crashed in the debugger when a certain object was deleted. The most interesting part was that this did not happen on every deletion, only on some. It seemed that the program always crashed when the object was deleted from package A and kept working when it was deleted from package B!

To spare you the suspense: some of that object's instance variables were wrapped in #ifdef. Package A crashed on deletion because it received objects of size X from package B and tried to delete them as objects of size X - sizeof(the #ifdef-ed instance variables).

Verdict: rules

Only classes, structures, unions, and free operator functions should be declared at file scope in a .h file; only classes, structures, unions, and inline functions (member functions or free operator functions) should be defined at file scope in a .h file.

This rule follows from the previous ones. Classes, structures, and unions act as a kind of namespace for what they declare, which keeps name conflicts to a minimum. Operator functions do not have to be declared and defined at file scope, but some operator functions cannot be made member functions, so there is no choice.

Verdict: rules

Place a unique and predictable (internal) include guard around the contents of each header file.

In practice this is necessary in most projects (even small ones), because otherwise including a header file more than once causes a compilation error. Note that include guards should follow a single, predictable naming scheme; I have worked on projects that used several naming conventions, and it caused a lot of confusion.

I like to use names derived from the file name, for example INC_FILENAME_H. To do this, I wrote a small macro in my IDE that generates an include guard for the selected text.

Verdict: rules

Logical entities declared within a component should not be defined outside that component.
I have never come across a violation of this rule, but it probably happens in practice, otherwise why would the author mention it? Perhaps C++ is one of the few languages that lets you get away with this. Still, I can't imagine why I would ever do it...

Verdict: self-evident

The .c file of every component must include its own .h file as its first substantive line of code.

The point of this rule is to catch header files that are not self-sufficient (that lack the #include directives they need). If such a header is included first, it will fail to compile; if it is included after other headers, those may already pull in whatever it is missing, and compilation will succeed.

This rule conflicts with precompiled headers, which usually must be included first (for example, stdafx.h in MSVC). Here is what I do: first the precompiled header, then the component's own header, then the project headers, then external library headers (for example, boost, wxWidgets), and finally the STL/CRT headers. I also explicitly include the files that make up the precompiled header: the compilers are smart enough to skip them, and if necessary I can build without precompiled headers.

Verdict: rules

Avoid definitions with external linkage in a .c file that are not explicitly declared in the corresponding .h file.
Avoid accessing a definition with external linkage in another component through a local declaration; instead, include that component's .h file.

These two rules are related. If you import names properly, there is only one place to change, and if you get a change wrong, you get a compilation error. That is not the case with implicit imports, which can silently fall out of sync and cause hard-to-trace bugs.

Verdict: rules

Every global identifier should carry its package prefix.
Every source file name should carry its package prefix.

Excessive caution, don't you think? But remember that the book is about VERY LARGE programs. If your project has thousands of files, you probably cannot keep all their names in your head, and any landmarks will be very helpful.

This practice was especially useful for me in the following cases:

When reading code on printouts (or in a very bad IDE/editor)
When filtering by package while looking for a file's path
When filtering by package while looking for where an identifier is defined

Verdict: rules

Avoid circular dependencies between packages.

Cyclic dependencies are bad, everyone agrees, right? In large projects it is very important to manage dependencies, because if you let one slip, you end up with a monolith.

Here are some of the problems that cyclic dependencies between packages can cause:

You cannot test the packages in isolation, which in turn gets in the way of automated unit testing.
Compile time increases
Changes ripple through the entire source code of the program.

Verdict: absolutely rules

Provide a mechanism for freeing all dynamic memory allocated for static constructs within a component.

You need to free this memory, not least because it makes memory leaks much easier to track. Most memory-checking tools work like this: they take snapshots of the program's memory state at different points in time. If memory allocated for static constructs has not been freed, it will be reported as a leak.

Moreover, if the memory holds an object, its destructor will not run unless the programmer explicitly frees the memory, even though the OS reclaims the memory when the program exits.
The easiest way to solve this is with a smart pointer, for example auto_ptr (std::unique_ptr in modern C++, since auto_ptr was removed in C++17) or shared_ptr.

Verdict: rules

Conclusions

As you can see, most of the recommendations are still relevant. Lakos's book largely shaped my understanding of large-scale C++ programming, and these recommendations have served me well on several real-world projects.

Source: https://habr.com/ru/post/301978/
