
Developing modular C/C++ applications using annotations

In my first article I talked about using the preprocessor to organize source-level modularity in C/C++. In short, the method boils down to writing specific metadata inside the source code; an external tool analyzes that metadata and generates glue sources that make modularity possible. The implementation details are described in that article, so I will not repeat them here. In this article I will go a little further and try to show that metadata, or annotations, can be used to implement not only modularity but some other useful features as well. The result should be something like Google Guice or Spring for C (the part that deals with modularity and aspects). Let me stress that this article is an addition to and improvement of the first one, so I will talk not so much about the technical details of the implementation as about how it all looks to the user. If the topic proves interesting, I will write a sequel explaining how the configurator application itself is organized.

Universal annotations


The previous approach was based on #pragma directives as a way of recording metadata inside the sources. The format of that metadata was chosen arbitrarily and tailored to one specific task: describing the connection between a source file and the abstract module it belongs to. The concept of annotations is actually broader. In particular, one can draw parallels with languages like Java or C#, where annotations can express arbitrary statements about the code that contains them, so it would be convenient to use some universal format.

As examples of existing implementations of this approach, consider the Keil configuration wizard, which lets you describe various variables inside a file as XML (for example, constants, array sizes, and so on). Then, right in the editor, you can switch to the configuration tab and set these options graphically instead of hunting for them inside the file. Sources often carry other information that could be called annotations as well (for example, what goes inside GCC's __attribute__, all kinds of __declspec, etc.).

For a number of reasons I decided against writing annotations in comments, since comments, in my opinion, should always serve one function only. On the other hand, using #pragma is also fraught with problems: an unknown #pragma triggers a warning, which does not look nice in a project with a large number of files (several warnings per file). To avoid this, I decided to wrap the annotations in a macro that expands to nothing during compilation. And if some compiler suddenly starts supporting this format for its internal needs and requires a #pragma, the C99 standard finally introduced the ability to emit a #pragma from a macro (the _Pragma operator), so a macro is the more universal solution in this sense.
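A minimal sketch of this idea: the annotation macro simply expands to nothing, so the compiler never sees the metadata. FX_METADATA is the name used later in this article; the annotated function is invented for illustration.

```c
/* The annotation macro expands to nothing at compile time. */
#define FX_METADATA(x)

/* An annotated function: the metadata is visible to the external
 * tool but completely invisible to the compiler. */
FX_METADATA(( { interface: [DEMO, V1] } ))
int demo_answer(void) { return 42; }
```

Note the doubled parentheses in the call: the preprocessor only respects parentheses when grouping macro arguments, so the extra pair keeps the commas inside the annotation from being read as argument separators.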
The second question concerned the format of the annotations themselves, recorded inside the macro. XML has lately become increasingly common for such tasks: besides the Keil example above, it is also used in the FreeRTOS project (there, inside comments). I must say that XML is still not great for aesthetic reasons: it is quite verbose, so it is not very convenient for a person to work with information in this format. At the same time, there is a machine-readable markup language that is adapted for people to read and write, and is also compatible with C syntax: JSON. However, the need to quote keys made the annotations look rather ugly in code, so an extended variant was used: YAML. It should be emphasized that, of all YAML's capabilities, ONLY the ability to write keys without quotes is used. Before being handed to the parser, annotations are always collapsed to a single line, so YAML features tied to line breaks cannot be used. Since JSON is a subset of YAML, you can write everything in pure JSON; the ability to omit quotes around keys should be regarded as a small nice bonus.

Thus, the bottom line is the following: C source code may contain the macro FX_METADATA(x) (with one argument), inside which annotations are written in JSON format. The annotations are wrapped in an extra pair of parentheses so that commas and closing parentheses inside the annotations themselves do not confuse the preprocessor. By convention, annotations are always a set of key-value pairs, that is, in JSON terms, an object. Example:

FX_METADATA(( { annotation: "hello world!" } )) 

As you may have guessed, the pragmas from the first article were translated into the new format. What was previously written as:

 #pragma fx interface MY_INTERFACE:MY_IMPLEMENTATION 

Now it is written as:

 FX_METADATA(( { interface: [MY_INTERFACE, MY_IMPLEMENTATION] } )) 

As far as modularity is concerned, the changes are purely cosmetic: everything that used to be written in an ad hoc format is now written in a more or less universal one.

As for #include, almost nothing has changed: the FX_INTERFACE macro is still used as its argument, except that it now always takes a single argument; an interface is always included by its name alone, without the ability to request any particular implementation.

To summarize: the source code contains annotations that describe which module the source belongs to. A module is a set of sources comprising one header file (describing the interface) and an arbitrary number of source files (*.c, *.cpp, etc.). Information about which files belong to which modules is extracted before compilation. An external application then generates a file that maps header file names to module names, which lets #include work in terms of modules rather than plain file names, as is usually the case. Since we have a way to find out which modules should be included in the system, and each source carries information about its module membership, we can automatically determine all the files that need to be compiled. That is, having written #include FX_INTERFACE(MY_MODULE) in some source file, we get that all sources tagged as belonging to the MY_MODULE interface will be included in the system.
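To make the scheme concrete, here is a sketch of how the pieces might fit together in a flat build. Everything below is illustrative (the file names my_module.h and consumer.c are invented); the two macro definitions are normally produced by the configurator, as described later.

```c
/* my_module.h -- the interface header, tagged with its module name:
 *
 *     FX_METADATA(( { interface: [MY_MODULE, V1] } ))
 *     void my_module_init(void);
 *
 * consumer.c -- imports the abstract interface, not a file name:
 *
 *     #include FX_INTERFACE(MY_MODULE)
 *
 * With the flat layout described below, FX_INTERFACE(MY_MODULE)
 * expands to <MY_MODULE.h>, so the ordinary #include machinery
 * does the rest. */
#define FX_METADATA(x)
#define FX_INTERFACE(i) <i.h>
```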

Configurator


Naturally, what the runtime does in managed languages must be done by something here too, so I will say a little about the application that implements it. The annotation framework is implemented in C# as a set of classes (a DLL). Whether it was worth writing it in C# is a question I will leave behind the scenes for now. The framework was designed so that the use of metadata is not limited to building alone. Among other things, it provides interfaces for accessing the metadata repository from outside. This opens up possibilities for third-party modules that can take certain actions based on the metadata. In the future, all plug-ins are planned to be connected via MEF and decoupled from the project structure.

In addition to the DLLs, there are two applications that implement the configurator's functions: a console one and a graphical one. The latter is in a highly prototypical state and I will not discuss it now, but the console one deserves a brief look.

The configurator has 7 command-line switches, only 3 of which are required: a list of paths to source folders, a target module (the one that starts dependency resolution), and an output file/folder. For example, if our annotated sources are in the c:\src1 and d:\src2 folders, and we need to build the MY_MODULE module, the command will look something like this:

 fx_mgr -p c:\src1,d:\src2 -t MY_MODULE -o output_folder 

Running the command will extract metadata from the sources, resolve dependencies starting from the MY_MODULE module, and determine all the files to be compiled, after which the output list of files for compilation will be generated.

There are two ways to produce this output.

First way

It is suitable for small projects, and also convenient when shipping something as source code. If an existing folder is specified as the argument to the -o switch, all the necessary files are copied into that folder, and the header files are renamed after the interfaces they implement. Since the system can contain only one implementation of each interface, there are no name conflicts, and with the files for compilation laid out flat in a single folder, the FX_INTERFACE macro (the #include argument) can be defined very simply:

 #define FX_INTERFACE(i) <i.h> 

No additional files are required. In embedded development it is quite common to use libraries supplied as source code, because they need to be compiled with specific compiler switches. This flat-file approach makes it easy to use make, add the files to an IDE, and so on.

Second way

Of course, copying something on every build is bad, so there is another way: if what was specified with the -o switch is not a folder, a new file with the given name is created (or an existing one is overwritten) containing a list of absolute paths to the files that need to be compiled. This file can then be used by the build system (including make or MSBuild) to compile those files. This raises a question: if the files contain annotations, and the includes use modules, where will the FX_INTERFACE / FX_METADATA macros come from? From nowhere. To get them, use the -h switch. If it is specified, a so-called common header file is created: the one that contains the mappings of file names to module names, as well as the definitions of FX_INTERFACE and FX_METADATA. With this type of build, this file must be force-included into all compiled files by a compiler switch (for example, -include in GCC or /FI in MSVC).
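What might the generated common header contain? A hypothetical sketch, reconstructed from the article's description rather than from the tool's actual output (all file names are invented):

```c
/* common.h -- generated by the configurator (contents are a guess). */

/* Annotations disappear at compile time. */
#define FX_METADATA(x)

/* One mapping per module: interface name -> actual header file. */
#define MY_INTERFACE_HDR  "src1/my_interface_impl.h"
#define CFG_ASPECTS_HDR   "out/cfg_aspects.h"

/* #include FX_INTERFACE(MY_INTERFACE) pastes the suffix, giving
 * MY_INTERFACE_HDR, which the preprocessor then rescans and
 * expands to #include "src1/my_interface_impl.h". */
#define FX_INTERFACE(i) i##_HDR
```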
That covers the basics, which were implemented before and worked fine on #pragma without universal annotations. Now let us briefly consider the implementation of dependencies, and then move on to matters that actually require structured metadata.

Dependency injection


A few words are in order about the need to support multiple implementations of a single interface. The whole framework originally grew out of a configuration system for an RTOS that had to adjust perfectly to the requirements of the application. So perfectly that a single implementation of each interface might not be enough. The OS did not exist as a single entity; it was built from smaller blocks. Several implementations of a module were made, each tailored to specific requirements. Then, since the interfaces of all the implementations are identical, it was possible to select, for each interface, the implementation ideally suited to the application. In other words, this is static dependency injection in C at the source level. The sources in the project tree contain no information about which modules will actually be used; they import abstract interfaces without specifying implementations. The actual implementations are specified at system assembly time by a file given to the configurator, similar to how this is described in external XML in other languages (and nothing prevents you from writing such XML here if desired). Finally, using the same metadata, you can define additional attributes by which the configurator itself can determine which implementations should be used.

If the sources contain only one implementation of each interface, the configurator understands there is no alternative, and if that interface is imported by someone, it automatically uses the only existing module with the specified interface. If an interface has many implementations, then, in the absence of hints, the configurator selects the first one (and prints a warning to the console). Since for a given system configuration it is usually known which interface implementations should be used, you can give the configurator a hint with the -a <file name> switch. The specified file contains lines of the form INTERFACE = VERSION, where INTERFACE is the interface name and VERSION is the name of the implementation to use. This is all close to what was described in the first part; the syntax is only slightly simplified, so for technical details it is better to turn to the original article.

Aspects


Well, now we are getting to the most interesting part, for which all this was started. There are many definitions of aspects in programming, so I will not try to cover everything in one short note. Here, an aspect is a part of something that should be assembled depending on the modules included in the project. For example, if there are several types of some object, you sometimes really want an enum containing all of those types, and when a new type is added (just by including its file in the project), the enum should expand automatically. You may also need to know the number of certain modules included in the project, and so on.

Obviously, no preprocessor tricks in the code can achieve this, since information about which modules are included in a particular project is not available even to the compiler. Perhaps some background is in order: among the requirements for the OS was the ability to adjust the system call table to the set of services available in the OS; if a module was included in the build and exported functionality to user mode, the system call table had to take that into account. None of this is new; similar methods are used in some operating systems (for example, NuttX), but as a rule it is all done with the preprocessor. If a module is enabled, a certain #define is defined; if it is defined, an #ifdef fires that lists the functions of that module inside the table. The table itself, therefore, consists of #ifdefs. Besides being hard to maintain, this approach has the problem that when a new module appears, the corresponding functions must be added to the table by hand, so the table contains knowledge about everything that can ever appear in the system. Aspects are intended to solve the situation where, to add something to a project, you need to make many small edits in different places: extending enums, adding entries to the build system configuration, and so on.
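The #ifdef-based table described above might look roughly like this (all module and function names are invented for illustration):

```c
/* The build system defines one macro per included module; pretend
 * only the semaphore module made it into this configuration. */
#define CFG_MODULE_SEM

#ifdef CFG_MODULE_SEM
static long sys_sem_wait(void) { return 0; }
static long sys_sem_post(void) { return 0; }
#endif

/* The table itself is a wall of #ifdefs and has to know in advance
 * about every module that could ever exist in the system. */
static long (*const syscall_table[])(void) = {
#ifdef CFG_MODULE_SEM
    sys_sem_wait,
    sys_sem_post,
#endif
#ifdef CFG_MODULE_QUEUE
    sys_queue_send,     /* skipped: that module is not in this build */
#endif
};

#define SYSCALL_COUNT (sizeof syscall_table / sizeof syscall_table[0])
```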

The proposed configurator implements a different approach. Since we have modules, why not have some modules be generated by the configurator itself? After all, once dependencies have been resolved, it has knowledge of the system's structure and of the metadata of all its modules.

The sources contain, under a special annotation key, a set of pairs of the form key / array of values. After the configurator resolves all dependencies, the modules included in the given configuration become known. All these modules are then examined for the presence of the aspects key, and if it is found, a #define is created with the name of each key, listing all its values (gathered from all modules in the system). That is, if one module contains, for example, the following entry:

 FX_METADATA(({ aspects: [ { key: [ val1, val2, val3 ] } ] })) 

And another, like this:

 FX_METADATA(({ aspects: [ { key: [ val4, val5, val6 ] } ] })) 

then a #define of the following form will be generated:

 #define key val1 val2 val3 val4 val5 val6 

Both keys and values are treated as plain text, so you can write, for example, like this:

 FX_METADATA(({ aspects: [ { key: [ "val1," , "val2," , "val3," ] } ] })) 

And then the values inside the #define will be separated by commas. Well, you get the idea.
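This is exactly how the enum from the beginning of the section can be assembled. Suppose two modules contributed comma-terminated type names under a key named device_types (both the key and the values are invented); the generated #define then drops straight into an enum:

```c
/* Pretend this line came from the generated CFG_ASPECTS header,
 * merged from two modules' aspects: one contributed
 * [ "DEV_UART," , "DEV_SPI," ] and another [ "DEV_I2C," ]. */
#define device_types DEV_UART, DEV_SPI, DEV_I2C,

enum device_type {
    device_types      /* expands to DEV_UART, DEV_SPI, DEV_I2C, */
    DEV_TYPE_COUNT    /* the trailing comma makes this legal */
};
```

Adding a module with a new type would grow the enum (and DEV_TYPE_COUNT) automatically, with no manual edits anywhere.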

Where will this #define be generated? In a header file specified to the configurator with the -e <file name> switch. If this switch is given, the configurator adds the generated interface to the metadata store. The interface is called CFG_ASPECTS and is available for inclusion as #include FX_INTERFACE(CFG_ASPECTS); it contains the aspects from all modules. It is important to note that if such an interface already exists (that is, the user has written it by hand and it is present in the source pool after analysis), nothing is generated and the existing module is used.

Options


Options are external #defines that set configuration constants for a given module, similar to Keil's mechanism, but in terms of modules rather than files. Their main purpose is to be set through the GUI, so using options from the console application is rather awkward. As with aspects, options are represented by a generated interface, CFG_OPTIONS, which contains all the options of the given module configuration that were described as annotations inside the source code. The console application generates a file with the options set to their default values, while the graphical application lets you set them.
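A sketch of how this might look from a module's side. The annotation schema for options is not shown in the article, so the format below is purely my guess; the generated #define is the part that matters.

```c
#define FX_METADATA(x)

/* Hypothetical option annotation inside a module's source: */
FX_METADATA(( { options: [ { QUEUE_LEN: { default: 16 } } ] } ))

/* What the generated CFG_OPTIONS header would boil down to:
 * the default value, unless the user changed it in the GUI. */
#ifndef QUEUE_LEN
#define QUEUE_LEN 16
#endif

/* The module then uses the option as an ordinary constant. */
static int queue_storage[QUEUE_LEN];
```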

The intended usage model for options is as follows: the user receives the OS as a set of sources and launches the configurator, which analyzes the sources and shows the available options; the user adjusts them to his needs, and the configurator generates that very CFG_OPTIONS file. After that, the user has a complete set of configured sources. In the context of day-to-day development, options are of little use, so I think it is worth stopping here.

Restrictions


The current version uses a simplified model of metadata extraction: comments are simply stripped from the file (with regular expressions), and then includes and metadata are searched for (also with regular expressions). In 98% of cases this approach works flawlessly, but in some tricky cases, such as macro redefinition and includes affected by it, it will not. In that case you need to use the full-fledged metadata provider, which runs every file through the preprocessor and extracts metadata and includes with a 100% guarantee, even in the presence of any preprocessor tricks. But, of course, with a large number of files this is rather slow (seconds for a project of several hundred files, which is quite tedious when debugging).

The current implementation supports parallel metadata extraction: since this operation parallelizes perfectly, extraction and parsing are done in a Parallel.ForEach loop running on a thread pool, so on a multi-core machine the operation is significantly accelerated.

There is also a small difficulty with IDEs: an IDE assumes that all files in the current project need to be compiled. In the proposed approach, the configurator itself determines, from the list of folder paths, which files should be compiled, so if the IDE does not allow filtering the input files somehow (say, via a plug-in that could tell it at build time what to compile and what not to), using such an IDE will be somewhat problematic. I do not use an IDE myself; I use Sublime Text or Visual Studio Code as an editor, and the build is run from the command line. Besides, the RTOS is supplied as sources for a specific configuration, so there are no problems with it, but those who cannot live without an IDE may be disappointed, yes. In effect, an additional build phase is introduced that runs before compilation, the configuration phase, and not all IDEs are compatible with this approach.

Conclusion


I apologize for the somewhat confused presentation; annotations, aspects, and modularity are large separate topics that are very hard to cover fully in one post, so I view this post not as something complete but as the start of a discussion. If the things described here seemed interesting, you can take the next step and look at usage examples and the configurator's source code, all of which are available on GitHub. That's all for now; thank you all for your attention.

Source: https://habr.com/ru/post/260061/
