
About modularity, good architecture, dependency injection in C/C++, and multicolored circles

Seek not unity in the aggregate, but rather in the uniformity of separation.
— Kozma Prutkov

A little preamble first


It is impossible not to notice that aspect-oriented programming conquers new frontiers of popularity every year. There have already been several articles on Habr devoted to the subject, from Java to PHP. It is time to look at C/C++. Right away, in the first paragraph, I confess that this will not be about "real aspects", but about something closely related to them. Also, the discussion will be conducted in the context of embedded projects; the described methods can be applied anywhere, but embedded is the area where the effect is most noticeable. I will also use the words "header" and "define" to denote, respectively, a "header file" and a "macro". Dry academic language is fine, but in this case it seems to me that everything will be easier to understand with these well-established anglicisms.

One of the reasons for the existence of the concept of "software architecture" is the need to manage complexity. In this field, nothing better than the component model has been invented so far. Even the IEEE standards for describing software architecture are based on the principle of breaking a project into components. Components mean loose coupling, reusability and, ultimately, good architecture.
In turn, programming (in a broad sense) is largely an attempt to structure and describe a virtual world in the form of text. And everything would be fine, but the world (even a virtual one) is somewhat more complex than a set of interacting objects/components. There is so-called "cross-cutting" functionality, which is not contained in any one place but is "smeared" across other modules. Examples include both the textbook ones: logging, tracing, error handling, and "non-trivial" ones, such as security, virtual memory support, or multiprocessing. In all these cases, besides the main module (in the case of logging, this may be the function that actually outputs the log), there is some "something" that needs to be added to other modules. And these modules know nothing about it. This "something" is called an aspect. The question of integrating aspects into modules that are not aware of them is, in fact, aspect-oriented programming (AOP).
A concept close to AOP is dependency injection (DI). Although they are often opposed to each other, there is actually more in common between them than there are differences. The main difference is that AOP assumes the target module knows nothing at all about what can be injected into it and how. This method has flaws that I will not list here, so as not to delve into the philosophy of programming as a whole. I note only that AOP cannot solve the problem of "non-trivial aspects", where code needs to be embedded at special points depending on the algorithm (aspects are bound to syntactic constructs like function calls, returns from functions, etc.). This is where dependency injection comes into play. What is good about DI? First of all, the interface between the injected aspect and the module becomes well-defined and not bound to the syntax and function names of the target module. This opens the way to "source-level modules", which I will discuss in this article (modularity itself is a topic for another big conversation). What do DI and AOP have in common? At the very least, DI, like AOP, allows you to inject third-party code (aspects) whose full set of possible variants the target code knows nothing about. The principal difference is that in classical AOP the injection points are described in the code of the aspect itself, whereas with DI the code is injected at calls of interface methods already present in the target code.
Java/C# adepts will reasonably object that DI is just a design pattern that can be implemented in any language: instead of creating an object and calling its methods, you call a factory method that analyzes some external configuration and creates what is needed. But again, since we are in an embedded context, there should be no unjustified indirection or dynamic configuration, since these have a direct impact on performance. Everything should, if possible, happen statically.
Thus, this is a post about how (and why) you can do static dependency injection in C/C++ projects, what problems it solves, and how it can make your life easier.

The specifics of large embedded projects


Embedded projects have a specific trait: willy-nilly, they have to work in a large number of different environments and configurations (I once worked with a code base that was compiled by three compilers for six different operating systems, and that is counting the 3–4 different GCC variants as a single compiler). Different boards had different logic, different drivers, and so on. A natural way to bring some order into this is to introduce a configuration header containing a set of defines (#define) that enable or disable some functionality. One such file would be created for each possible configuration, and after compiling all the project files, the required result would be obtained.
There is no need to explain that as the project and the number of its modules grow, the size of this configuration file grows, as does the number of implicit links between components. By that I mean that the author has to keep the structure of the project in his head, and the source code, which by its nature contains some information about dependencies, does not help him here. In addition, since only a subset of files is compiled for each configuration, it is rather wasteful to compile all files while excluding the contents of the unnecessary ones with those same defines. Alternatively, you can set up the build system so that only the necessary files are built, but in that case you have to manually keep the build system, the source files and the configuration files consistent.
Another case from personal experience: at some point, part of the project had to be handed over to a customer in the form of a library. But the library had to ship with headers, together with their entire internal folder structure. Not only did identifying all the necessary headers take a decent amount of time, but the customer had to copy our header folder structure and somehow integrate it into his project, which also took time.

Why C/C++ projects are difficult to configure


Talk about modularity in C/C++ has been going on for as long as these languages have existed. Quite often something like a "software configuration management engineer" appears in job listings, and although at first glance it may look like a made-up position, the problem is more than pressing. The problem exists not only in C/C++, but it is especially acute there because of the active use of these languages in performance-critical projects, for which dynamic configuration methods are often unacceptable. Initially it was assumed that modularity in C/C++ should exist only at the library level. To verify this, just look at the source structure of the vast majority of C/C++ (especially C) projects. Alongside the "module" subfolders there is an include folder, containing the headers of the entire project and accessible to all modules. This is the well-trodden path. The entire project is thus a monolith configured by the macro processor and the build system. Adding a new module entails adding its headers to the common folder, writing new #include directives, changing the build system configuration and many other trivialities, after which the module ceases to be a module and becomes an integral part of the project. A module cannot be compiled separately from the project, and the project cannot be built without the module. Orientation toward the global header folder also makes it difficult to reuse modules in other projects: by simply copying files you get compilation errors at best (at worst, the files will simply sit in the project tree and never be compiled). An additional problem is the need to manually keep the files' dependencies on each other consistent, as well as to determine the correct list of include directories. At the same time, it is obvious that if some header file containing function prototypes is included in the project, the functions (implementations) themselves should also be included in the build.
There are not so many configuration possibilities; basically, three "time-tested" methods are used. The first is #ifdef/#elif/#endif. Each point in the code whose behavior may vary depending on the configuration is framed with #ifdef and begins to depend on definitions from the common "configuration" header. Problems begin when you need to add another #elif option. If such #ifdefs are evenly spread over hundreds of files, the very idea of adding something to all of them seems crazy. I call this a "closed" configuration system: there are exactly as many #ifdef options as the system developer provided for initially, and you cannot add anything else without changing the source code.
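A minimal sketch of this "closed" style (all names here are illustrative, not taken from any real project):

 /* config.h: the global configuration header */
 #define LOG_BACKEND_UART 1

 /* ...somewhere in one of hundreds of files... */
 #if defined(LOG_BACKEND_UART)
     uart_log_write(msg);
 #elif defined(LOG_BACKEND_FILE)
     file_log_write(msg);
 #endif
 /* adding a LOG_BACKEND_NET option means editing every such site */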
The second method relies on the build system: some functions have several possible implementations, and the desired one is selected by the build system configuration. For example, if the project log is output via a print_log function, there may be several implementations of that function. The main disadvantage of this method is the runtime overhead: if we need to disable calls to some functions, we can write a stub module, but some function still has to be called.
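A sketch of the print_log example (uart_write is an assumed platform function): two source files provide the same symbol, and the build system compiles exactly one of them.

 /* print_log_uart.c: built when logging is enabled */
 void print_log(const char *msg)
 {
     uart_write(msg); /* real output */
 }

 /* print_log_stub.c: built when logging is disabled */
 void print_log(const char *msg)
 {
     (void)msg; /* does nothing, but the call itself remains */
 }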
Finally, the third method is virtual header files. Using file system means, abstract header names like cpu.h are mapped to specific ones like arm.h or x86.h. Such header virtualization helps to disable unnecessary functionality by substituting a header that replaces functions with "empty" macros, and it is "open": you can substitute any header without changing the source code. But, unfortunately, there is no link between the headers and the C files containing the implementations of the required functions. The consistency between the settings, the abstract header names, the specific names and the build system again has to be maintained manually. In addition, it is not always obvious which names exist and what can be customized. As a result, a poorly documented project again becomes unmaintainable.
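A sketch of header virtualization via include paths (the folder layout is illustrative; -I is the include path switch of GCC-like compilers):

 /* source code includes only the abstract name */
 #include "cpu.h"

 /*
  * The build system picks the mapping:
  *   ARM build: cc -Iports/arm ...  (ports/arm/cpu.h wraps arm.h)
  *   x86 build: cc -Iports/x86 ...  (ports/x86/cpu.h wraps x86.h)
  */
 void reset(void)
 {
     cpu_reset(); /* resolved by whichever cpu.h was mapped in */
 }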
Of course, developers' imagination is not limited to these three methods, and in various projects you can come across all sorts of configuration schemes. Somewhere there are shell scripts generating glue sources (mainly to distribute defines across sources and build files), somewhere CMake/SCons, etc. The abundance of these methods once again suggests that at the moment there is no consensus and no general method for solving the configuration problem.

Silver bullet


The central idea for solving these problems is that the dependency information should be contained in the source code itself. The link between header files and implementations should be explicit, and not exist only in the developer's head. Something similar existed decades ago in Pascal (I can already hear the audience accusing the speaker of heresy). Old-timers will remember that Pascal modules contained two sections, interface and implementation, and imports were written as uses <module name>. In that scheme, the source code itself contained all the dependency information, and from the sources alone it was possible to build a complete dependency graph and understand what needed to be compiled to build a module.
I would like to add to the source code some information that would help an external source-analysis tool track the dependencies between files.
When searching for solutions, it was clear that the time when syntactic changes of this magnitude could be made to the language was long gone. Even the giants of the software industry need enormous resources to bring a new language to life, let alone make changes to the holy of holies, C/C++, in which a vast amount of code has already been written. So we had to content ourselves with what the standard offers. There are many ways to write some kind of tag into a file; the main question was how to do "controlled import" (include) and how to track dependencies. For inclusions the choice is small: according to the standard, only file names (in quotes or angle brackets) and macros may appear there. It was the latter that provided the key to solving the problem. But first things first.
Without further ado, it was decided to add Pascal-like tags to headers and sources, something like INTERFACE and IMPLEMENTATION labels. There was a lot of discussion about exactly what form they should take, but we ended up with #pragma. There are still doubts whether it was worth doing the labels in this form (and getting batches of "unknown pragma" warnings). In theory, #pragma also allows import/export information to be added to binaries, so that both source code and libraries could participate in the configuration process. Versions using macros were (and are being) developed as well (their values are substituted at the source-analysis stage, while the working file contains an empty macro stub). This question remains open.
As mentioned above, each header must carry a label inside itself:
#pragma fx interface <interface_name>:<impl_version> 

where <interface_name> is the name of the interface, and <impl_version> is the identifier of the specific implementation (version, variant, etc.). That is, each component must have a unique label. Components with the same interface should share the same interface_name part; the version part exists to distinguish implementations. An implementation file must carry a similar label, only with fx implementation instead of fx interface. Different interface/implementation pragmas are needed to distinguish labels located in headers from those in source files (the reason this cannot be deduced from the file extension is discussed below).
Thus, a component consisting, for example, of a header and a C file must contain the same interface_name and version in all of its files; only the header carries #pragma fx interface, while the source files carry #pragma fx implementation. Another implementation of the same component (with the same interface) should also contain the same label in all of its files, but with a version different from the first component's, and so on. Including something should use the interface name rather than the file name, as is usual with #include. The current implementation uses a macro for this:
 FX_INTERFACE(<interface>,<impl_version>) 

Where do these macros come from? An external tool, given the paths to the sources, analyzes them, finds these labels, builds the dependency graph, and then generates a common header file containing the mappings from interface names to file names and defining the FX_INTERFACE macro. This file is force-included into all compiled files via a compiler switch INSTEAD of passing include directory paths, since it already contains the paths to all the necessary files (although no one forbids mixing it, in any proportion, with the standard approach of plain #include with brackets).
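To make the mechanism concrete, here is a minimal sketch of what such a generated header could look like (the actual FX-DJ output format may differ; the macro and file names below are illustrative). The two-step expansion is legal C: the tokens after #include are macro-expanded, and the result must form a valid header name.

 /* fx_generated.h: produced by the scanner and force-included
  * into every translation unit (e.g. gcc -include fx_generated.h). */
 #define FX_CAT(i, v) FX_IFACE_##i##_##v
 #define FX_INTERFACE(i, v) FX_CAT(i, v)

 /* interface -> file mappings discovered from the pragma labels */
 #define FX_IFACE_A_VER1 "src/module_a/module_a.h"
 #define FX_IFACE_B_VER1 "src/module_b/module_b.h"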
Consider a module A, located in the file module_a.c, and a module B, in the file module_b.c, together with their corresponding headers. module_b.h imports the interface defined in module_a.h. To a first approximation, it looks like this:

 /* module_a.h: interface of A, version VER1 */
 …
 #pragma fx interface A:VER1


The implementation is defined in the module_a.c file:

 /* module_a.c: implementation of A, version VER1 */
 #pragma fx implementation A:VER1


Now let's create a module B, which uses (refers to) module A:

 /* module_b.h: imports interface A */
 #include FX_INTERFACE(A, VER1)
 …
 /* interface of B, version VER1 */
 #pragma fx interface B:VER1


The implementation of module B in the “module_b.c” file:

 /* import own interface */
 #include FX_INTERFACE(B, VER1)
 …
 /* implementation of B, version VER1 */
 #pragma fx implementation B:VER1


From this, it is already clear how the dependencies of sources on a header file can be derived: if someone includes a header containing a certain interface, then all implementation files (with the .c or .cpp extension) carrying the matching implementation label must also be compiled.
How do we now track dependencies and understand that if we need to compile module_b.c, this also entails compiling module_a.c? If each file includes all the modules it depends on in the form of #include directives, and each header contains #pragma fx interface, then, by running the file through the preprocessor and finding all the #pragma lines in the output, we can determine the module dependencies (all implementations found depend on all interfaces found). In particular, passing module_b.c through the preprocessor produces something like:

 /* came from module_a.h */
 #pragma fx interface A:VER1
 …
 /* came from module_b.h */
 #pragma fx interface B:VER1
 …
 /* came from module_b.c itself */
 #pragma fx implementation B:VER1


From this example it is clear why different pragmas are needed for headers and sources: after the preprocessor runs, file extension information is lost, so, to track the dependencies of sources on headers, the labels must differ.
Now consider the main feature, which was in fact the root cause of all this research. If you look at the classic AOP or DI logging example, logging is usually defined as a set of macros which, if disabled in one module, are disabled throughout the project. The same trick is done with tracing and some other things. It is logical to ask: why not do the same with any module (interface) at all? Since including a header is usually the inclusion of some abstract interface, why not treat it as exactly that? By writing this:
 #include FX_INTERFACE(A, DEFAULT) 

we get a module that imports some abstract interface, and at the source level there is no information about which implementation will be used. The project becomes truly modular: the source code no longer contains any implicit information about the connections between components.
Clearly, when abstract interfaces are used, external binding information is needed, saying something like A:DEFAULT = A:VER1. It may be contained in some external mapping file. Now, by changing only this mapping file, we get full-fledged DI that works at compile time, with automatic dependency tracking and automatic discovery of which files should go into the build.
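A sketch of such a mapping (alias) file; the actual FX-DJ syntax may differ, this only illustrates the idea of one binding line per abstract interface:

 # debug configuration: bind the abstract interface to a real one
 A:DEFAULT = A:VER1

 # an alternative configuration would bind the same name differently:
 # A:DEFAULT = A:VER2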
How does the mapping file differ from the global config.h mentioned above? There are few differences, but they are important. config.h assumes that it is included in all files and that all files are compiled (with the contents of some of them possibly excluded entirely by the preprocessor). The proposed approach, by contrast, affects the construction of the module graph and of the file lists for the build: what should not be compiled is not compiled at all, the dependencies are extracted from the sources themselves, and, for example, commenting out an include automatically excludes the corresponding implementation files from the build, without any of this having to be done by hand.
Components obtained in this way are much more amenable to reuse than ordinary sources; in particular, such modules can simply be copied into the project tree to be used. As a bonus, there is complete freedom in the structure of the sources: you can shuffle them between folders and rename files twenty times a day. You can forget about setting up include directories, problems with renaming files, registering new files in makefiles, and so on. In addition, if the interfaces are designed at the proper level of abstraction, you can also do away with #ifdefs. The system turns out to be "open": you can substitute any implementation as a virtual header, not only those originally provided for, as would be the case with #ifdef. As for cross-platform support and compatibility with all compilers, I will not even elaborate: everything is clear as it is.
Before configuration is performed, a system built on this kind of dependency injection has a very simple architecture: a flat set of components not connected with each other. The architecture, both as a hierarchy between them and as an interaction graph, is entirely determined by the dependency information. Components, of course, cannot be connected arbitrarily; they impose restrictions on possible configurations, but all this information can be extracted from the sources. Configuration can be done at the level of substituting subgraphs.



[Figure omitted: components as multicolored circles, with the configured dependency graph drawn between them.] This is where the multicolored circles from the title come in.
Of course, besides the advantages there are plenty of problems; I will list only some of them. Interfaces are global, that is, there can be only one instance of each module in the assembled system. In addition, the correspondence between an interface name and the contents of a header is on the author's conscience: the fact that an interface is syntactically suitable for a given configuration does not mean the implementation will behave as expected (and will work in all possible cases). So even if a module fits the interface, it does not mean it fits in every case, and the consequences of using the "wrong" module can be sad even when everything compiles.



Another issue is circular dependencies. Actually, this problem also exists in "ordinary" C, when several headers include each other. The same rules as usual apply here: since nothing beyond the preprocessor is used, and every #include FX_INTERFACE is ultimately mapped to a file name, these files should be protected from repeated inclusion by the usual means, #ifndef/#define or #pragma once, as in the sketch below. A situation where two modules depend on each other is acceptable: the files are processed by the preprocessor separately, so no recursion occurs, and duplicate files are removed from the compilation list (this is not a hack, it is a feature: the same module may be included several times, but that does not mean its sources must be listed several times in the makefile).
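The usual include guards apply unchanged, since FX_INTERFACE ultimately expands to a plain file name; a sketch (the hypothetical cycle here is A including B including A):

 /* module_a.h */
 #ifndef MODULE_A_H
 #define MODULE_A_H

 /* B may in turn include A: the guard breaks the cycle */
 #include FX_INTERFACE(B, DEFAULT)

 #pragma fx interface A:VER1

 #endif /* MODULE_A_H */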
In general, there are still problems, but, as they say, "we are working on it."

Example


Tradition would suggest building a logging module or something similar, but polymorphism at the level of a single function is not very interesting. In C, run-time type checking is used quite often: an object's methods may want to verify that they were given an argument of the type they expect, despite possible type casts by the user. For this purpose, a special field is allocated inside the structure, containing some non-trivial value that is set by the constructor at object initialization and is then checked by every function working with the object. If the value differs from the expected one, then we were given an object of the wrong type, or something else in memory was overwritten. It is also clear that after the program is debugged, all this functionality should be removed. Dynamic configuration methods do not solve the problem: once the sizeof of every structure is fixed, it is no longer possible to remove anything from them and reduce their size.
Various implementations of this mechanism are possible. For example, during initialization, instead of simply storing a magic value, you can use the address of the field located inside the object to add the object to some "pool of valid objects existing in the system". If an address arrives that is not in the pool, the object is invalid.
"Classic" methods for solving this problem have disadvantages. In particular, on / off can be implemented using #ifdef, but using different implementations already requires interaction with the build system. If the corresponding defines also affect the build system, it is still not so easy to add some new version, since the configuration system needs to be “taught” to this new version (set file names for compilation in configs + dependencies). Consider how you can make an "open" system for configuration.
Let's start with the interface of the module that implements the functions we need. The module's interface, that is, what its header should contain, includes a data type that protected objects must embed, plus a set of functions for initializing that data and for checking it:

 /* Data type embedded into protected objects. */
 typedef struct _runtime_protection
 {
     int magic_cookie;
 } runtime_protection, *pruntime_protection;

 /* Methods. */
 void runtime_protection_init(pruntime_protection object, int cookie);
 int runtime_protection_check(pruntime_protection object, int cookie);

 /* Component label. */
 #pragma fx interface RUNTIME_PROTECTION:VER1


Its implementation is not important; what matters is that if someone uses such a header, the implementation gets into the build automatically. The concept of "interface" is used here in a broad sense, unlike the OOP-style definition: an "interface" means an agreement, not just a set of function signatures. In particular, the runtime_protection data type is part of the RUNTIME_PROTECTION interface, that is, every header carrying such a pragma is required to define this data type somehow (not necessarily as a structure).
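As a sketch of the pool-based variant mentioned earlier (a hypothetical RUNTIME_PROTECTION:VER2; fixed capacity, no locking, purely illustrative):

 /* runtime_protection_pool.c */
 #include FX_INTERFACE(RUNTIME_PROTECTION, VER2)
 #pragma fx implementation RUNTIME_PROTECTION:VER2

 #define POOL_CAPACITY 64

 static pruntime_protection pool[POOL_CAPACITY];

 void runtime_protection_init(pruntime_protection object, int cookie)
 {
     (void)cookie;
     for (int i = 0; i < POOL_CAPACITY; ++i)
     {
         if (pool[i] == 0)
         {
             pool[i] = object; /* register the address as valid */
             return;
         }
     }
     /* pool overflow: the object will simply fail validation */
 }

 int runtime_protection_check(pruntime_protection object, int cookie)
 {
     (void)cookie;
     for (int i = 0; i < POOL_CAPACITY; ++i)
     {
         if (pool[i] == object)
         {
             return 1; /* known address: the object is valid */
         }
     }
     return 0; /* unknown address: invalid or corrupted object */
 }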
Modules using such protection must import the interface before using the mentioned data types and functions:

 /* Import the protection interface. */
 #include FX_INTERFACE(RUNTIME_PROTECTION, DEFAULT)

 /* "Magic" value identifying this object type. */
 enum { MAGIC_VALUE_SOME_OBJECT = 0x11223344 };

 /* Protected object embeds the protection data type. */
 typedef struct some_object
 {
     int dummy;
     runtime_protection rtp;
 } some_object, *psome_object;

 /* Accessor for the embedded protection member. */
 #define some_object_as_rtp(so) (&((so)->rtp))

 /* Component label. */
 #pragma fx interface SOME_OBJECT:VER1


The implementation of the object simply uses the functions, without thinking about what they are and where they come from:

 /* Import own interface. */
 #include FX_INTERFACE(SOME_OBJECT, VER1)

 /* Component label. */
 #pragma fx implementation SOME_OBJECT:VER1

 void some_object_init(psome_object object)
 {
     /* Store the "magic" value in the embedded member. */
     runtime_protection_init(
         some_object_as_rtp(object),
         MAGIC_VALUE_SOME_OBJECT);
     …
 }

 void some_object_method(psome_object object)
 {
     /* Validate the argument before use. */
     if (!runtime_protection_check(
             some_object_as_rtp(object),
             MAGIC_VALUE_SOME_OBJECT))
     {
         /* Error! Invalid object. */
     }
 }


If we want to disable type checking, we write a stub interface like this:

 /* Empty protection type: no per-object overhead. */
 typedef struct _runtime_protection {} runtime_protection, *pruntime_protection;

 /* Do nothing: checks are disabled. */
 #define runtime_protection_init(obj, magic)
 #define runtime_protection_check(obj, magic) (1)

 /* Stub implementation of RUNTIME_PROTECTION. */
 #pragma fx interface RUNTIME_PROTECTION:STUB


Although it is not standard, some C compilers make the sizeof of an empty structure zero (strictly speaking, sizeof cannot be zero, and by the same standard an empty structure in C is an error anyway). If strict adherence to the letter of the standard is required, you can either wrap the member declaration in a macro, or accept some overhead and put an int dummy there.
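A sketch of the macro-wrapping workaround (the macro name is illustrative); note that with the stub the struct below still contains int dummy, so it never becomes empty:

 /* In the real header (VER1): */
 #define RTP_DECLARE_MEMBER runtime_protection rtp;

 /* In the stub header (STUB): nothing is declared. */
 #define RTP_DECLARE_MEMBER

 /* Protected objects declare the member through the macro: */
 typedef struct some_object
 {
     int dummy;
     RTP_DECLARE_MEMBER
 } some_object;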
If a stub is used as the default interface, it becomes syntactically equivalent to the absence of any checks: the error handling ends up wrapped in constant conditions like if (0), which the compiler optimizes away. Now different versions of such a module can be written, and the source code using the interface will not require any changes at all. By changing only the mapping file, both the headers and the lists of files for compilation and linking change automatically.

FX-DJ


The finale of our research (the part that can currently be shown to the public) is the FX-DJ tool: FX because it was originally intended solely for configuring FX-RTOS, our other project, and all its tools carry the FX prefix (why the OS itself is called that is a topic for another conversation). DJ, as you might have guessed, stands for Dependency inJector rather than a disc jockey, although there are certain parallels with DJs in terms of the tasks performed.
It constitutes an additional build phase that runs before compilation: the configuration phase, in which a hierarchical system is formed, based on dependencies, out of an abstract pool of source components. This determines the lists of files to be built, after which the usual compile and link phases follow. The list of files for compilation is produced both as a flat list for use by external tools and in make format.
The whole process can be represented as follows:

[Diagram omitted: the pool of source components, an alias file and a target interface feed the configuration phase (FX-DJ), which produces the file lists for the subsequent compile and link phases.]
Alias files are the files mentioned above that map default interfaces to specific ones; each of them corresponds to a particular configuration. Target is the target interface, the one at the root of the dependency tree.
So far all of this is implemented as a compact Python 2.7 script (the package also includes an executable built with py2exe that does not require an installed Python). The tool is distributed under a modified BSD license, so you can do almost anything with it, free of charge. Although everything has already been published, it is all still in an alpha state, and the "company blog" is still in the process of being purchased, so for now let's talk only about the concepts; download links will come later.
If the topic turns out to be interesting, in the next article I will talk about FX-RTOS: why we need "yet another" operating system, the ideas behind it, why it could not be built on the basis of Linux, and so on.

Conclusion


Recently, the topic of modules in C/C++ has come up regularly, from discussions in the C++ standardization committee to Apple's proposals for module support in LLVM. This post describes another view of the problem and a way to solve it. The resulting approach has, of course, certain shortcomings, but it requires no syntactic changes to the language, can be used in any project right now, and solves the tasks for which it was created: creating source-level program modules, independence from a specific project's folder structure, enabling/disabling/replacing modules without changes to the source code, automatic build management without manually describing dependencies, cross-platform operation, and 100% compatibility with the standard.
That's all for now. Thank you all for your attention.

Source: https://habr.com/ru/post/171479/

