
What's wrong with GNU make?

GNU make is a well-known utility for automating project builds. In the UNIX world it is the de facto standard for this task. It is less popular among Windows developers, but even there it has spawned counterparts such as Microsoft's nmake.

Despite its popularity, however, make is in many ways a flawed tool. Its reliability is questionable; its performance is poor, especially on large projects; and the makefile language is abstruse while lacking many basic features found in other programming languages.

Of course, make is not the only tool for automating builds. Many other tools have been created to get around make's limitations. Some of them are clearly better than the original, but this has had little effect on make's popularity. The purpose of this document, put simply, is to describe some of the problems associated with make, so that they do not come as a surprise to you.
Most of the arguments in this article refer to the original UNIX make and GNU make. Since GNU make is likely to be much more common today, when we mention make or “makefiles,” we mean GNU make.

The article also assumes that the reader is already familiar with make at a basic level and understands such concepts as “rules”, “targets” and “dependencies”.

Language design


Anyone who has written a makefile has most likely already stumbled upon a “feature” of its syntax: it uses tabs. Each command line of a rule must begin with a tab character. Spaces will not do, only a tab. Unfortunately, this is just one of the strange aspects of make's language.

Recursive make

“Recursive make” is a common pattern in which a rule starts another make session. Since each make session reads only one top-level makefile, this is a natural way to describe the build of a project consisting of several sub-projects.

“Recursive make” causes so many problems that a well-known paper, “Recursive Make Considered Harmful”, was written to show how bad this solution is. It points out many difficulties (some of them are listed below), but writing makefiles that do not use recursion is genuinely hard.

Parser

Most programming language parsers follow the same model. First, the source text is split into tokens (“scanned”): comments and whitespace are discarded, and the fairly free-form input is turned into a stream of tokens such as symbols, identifiers and reserved words. The token stream is then parsed using the grammar of the language, which determines which combinations and orderings of tokens are valid. Finally, the resulting parse tree is interpreted, compiled, and so on.

The make parser does not follow this standard model. You cannot parse a makefile without executing it at the same time. Variable substitution can occur anywhere, and until you know the value of a variable you cannot continue parsing. As a result, writing a separate tool that can parse makefiles is a very nontrivial task, because you have to implement the entire language.
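As a small illustration of this point, consider the sketch below. The names HOST, TARGET and extra are made up for the example; the idea is only that the set of rules is unknown until the makefile has actually been evaluated.

```makefile
# Which targets does this makefile define? A static parser cannot say:
# the answer depends on the output of a shell command run during parsing.
HOST   := $(shell uname -s)
TARGET := build-$(HOST)

# The target name is known only after variable expansion.
$(TARGET): ; @echo "building $@"

# This rule exists only when the expanded condition holds.
ifeq ($(HOST),Linux)
extra: ; @echo "extra step on Linux"
endif
```

A tool that wants to list the targets here has to run uname, expand the variables and evaluate the conditional, which amounts to reimplementing make.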

The language also has no clear division into tokens. For example, let's look at how a comma is handled.

Sometimes a comma is part of a string and does not have a special status:
X = y,z 


Sometimes a comma separates the strings being compared in an if statement:
 ifeq ($(X),$(Y)) 


Sometimes a comma separates the function arguments:
 $(filter %.c,$(SRC_FILES)) 


But sometimes, even among function arguments, a comma is just part of a string:
 $(filter %.c,a.c b.c c.cpp d,e.c) 

(since filter takes only two parameters, the last comma does not add a new parameter; it becomes just one of the characters of the second argument)
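The flip side can be sketched as follows: to pass a literal comma where it would otherwise be read as an argument separator, the usual workaround is to hide it in a variable. COMMA, LIST and WORDS are names chosen for this example.

```makefile
COMMA := ,
LIST  := a,b,c

# $(subst from,to,text): the literal commas split the call into three
# arguments before expansion; the comma held in $(COMMA) is substituted
# only afterwards, so it is treated as data.
WORDS := $(subst $(COMMA), ,$(LIST))

# Prints: a b c
$(info $(WORDS))

all: ; @:
```

The commas inside $(LIST) are harmless for the same reason: the arguments are separated before any variable is expanded.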

Spaces follow similarly obscure rules. Sometimes they are significant, sometimes not. Strings are not quoted, so it is not visually clear which spaces matter. Because there is no “list” data type (only strings exist), spaces have to serve as the separator between list elements. This greatly complicates the logic when, for example, a file name simply contains a space.

The following example illustrates the intricate handling of spaces. An obscure trick is needed to create a variable that ends with a space. (The parser normally discards leading whitespace in a value, and a literal trailing space is invisible and easily lost; because the discarding happens before variable substitution, not after it, a space protected by variable references survives.)
NOTHING :=
SPACE := $(NOTHING) $(NOTHING)
CC_TARGET_PREFIX := -o$(SPACE)
# used in a rule's command as: $(CC_TARGET_PREFIX)$@


And we have only touched on commas and spaces. Only a few people understand all the intricacies of make's parser.

Uninitialized variables and environment variables

If a makefile refers to an uninitialized variable, make does not report an error. Instead, it takes the value from the environment variable with the same name. If there is no such environment variable, the value is simply assumed to be an empty string.

This leads to two kinds of problems. First, typos are not caught and are not treated as errors (you can ask make to issue warnings in such cases, but this is disabled by default, and uninitialized variables are sometimes used intentionally). Second, environment variables can unexpectedly affect your makefile. You cannot know which variables the user may have set, so to be safe you must initialize every variable before referencing it or appending to it with +=.
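A minimal sketch of the typo problem, with CFLAGZ as a deliberately misspelled name:

```makefile
CFLAGS := -O2

# Typo: CFLAGZ was never assigned, so it silently expands to an empty
# string, or to whatever CFLAGZ happens to be in the caller's environment.
all: ; @echo "flags: [$(CFLAGZ)]"
```

Run as plain make, this prints empty brackets without any complaint. GNU make's --warn-undefined-variables flag turns the reference into a warning, but it has to be requested explicitly.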

There is also a confusing difference between running make as "make FOO=1" and as "export FOO=1 ; make". In the first case, a line FOO = 0 in the makefile has no effect! To change the value you have to write override FOO = 0 instead.
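The difference can be observed with the built-in $(origin) function. The sketch below assumes a makefile containing nothing else; FOO is just an example name.

```makefile
FOO = 0

# Report the value make ended up with and where it came from.
$(info FOO=$(FOO) origin=$(origin FOO))

all: ; @:
```

With "export FOO=1 ; make" this reports FOO=0 with origin "file", because the makefile assignment wins over the environment. With "make FOO=1" it reports FOO=1 with origin "command line", and the assignment in the makefile is ignored unless it is written as override FOO = 0.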

Conditional syntax

One of the main drawbacks of the makefile language is its limited support for conditionals (which are particularly important for writing cross-platform makefiles). Newer versions of make support an "else if" syntax. Even so, the if statement comes in only four basic forms: ifeq, ifneq, ifdef and ifndef. If your condition is more complicated and needs “and”, “or” or “not”, you have to write more cumbersome code.

Suppose we need to detect Linux/x86 as the target platform. The following hack is the usual substitute for an “and” condition:
ifeq ($(TARGET_OS)-$(TARGET_CPU),linux-x86)
foo = bar
endif


An “or” condition is not so simple. Suppose we need to detect x86 or x86_64, and instead of "foo = bar" we have ten or more lines of code that we do not want to duplicate. We have several options, each of them bad:
# Terse, but hard to read
ifneq (,$(filter x86 x86_64,$(TARGET_CPU)))
foo = bar
endif

# Verbose, but easier to understand
ifeq ($(TARGET_CPU),x86)
TARGET_CPU_IS_X86 := 1
else ifeq ($(TARGET_CPU),x86_64)
TARGET_CPU_IS_X86 := 1
else
TARGET_CPU_IS_X86 := 0
endif

ifeq ($(TARGET_CPU_IS_X86),1)
foo = bar
endif


Many places in makefiles could be simplified if the language supported full-fledged conditional expressions.
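For comparison, here is a sketch that combines both checks using the $(and) function available in GNU make 3.81 and later. The defaults for TARGET_OS and TARGET_CPU and the helper names IS_LINUX and IS_X86 are assumptions made for the example.

```makefile
TARGET_OS  ?= linux
TARGET_CPU ?= x86_64

# Each helper is non-empty ("true") only when the value matches.
IS_LINUX := $(filter linux,$(TARGET_OS))
IS_X86   := $(filter x86 x86_64,$(TARGET_CPU))

# $(and ...) yields an empty string as soon as one argument is empty.
ifneq (,$(and $(IS_LINUX),$(IS_X86)))
foo = bar
endif

$(info foo=$(foo))

all: ; @:
```

This is still string manipulation rather than a boolean expression: truth is encoded as "non-empty string", and the outer ifneq against an empty value remains necessary.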

Two kinds of variables

There are two kinds of variable assignment in make. ":=" evaluates the expression on the right immediately. The regular "=" evaluates it later, each time the variable is used. The first kind is what most other programming languages do and is, as a rule, more efficient, particularly when the expression is expensive to evaluate. The second kind, naturally, is the one used in most makefiles.

There are legitimate reasons to use "=" (with its lazy evaluation), but a more careful makefile design can often avoid it. Even leaving performance aside, deferred evaluation makes makefiles harder to read and understand.

Usually you can read a program from beginning to end, in the same order in which it executes, and know exactly what state it is in at any given point. With deferred evaluation, you cannot know the value of a variable without knowing what happens later in the program. A variable's value can change indirectly, without the variable itself being assigned to. If you try to hunt for errors in makefiles using debug output like this:
 $(warning VAR=$(VAR)) 
... you may not get what you need.
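A short sketch of how the debug output can mislead; VAR and LATER are example names.

```makefile
# Recursive assignment: VAR stores the text "$(LATER)", not a value.
VAR = $(LATER)

# Expanded while the makefile is being read: LATER is still empty here,
# so this reports VAR=[].
$(warning VAR=[$(VAR)])

LATER := set afterwards

# Recipes are expanded when they run, after the whole makefile has been
# read, so this prints VAR=[set afterwards].
all: ; @echo "VAR=[$(VAR)]"
```

The warning and the recipe disagree about the "same" variable, and neither line assigns to VAR.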

Template substitutions and file searches

Some rules use the % sign to stand for the stem of a file name (the name without its extension), in order to describe how to generate one kind of file from another. For example, the rule "%.o: %.c" compiles .c files into object files with the .o extension.

Suppose we need to build the object file foo.o, but the source file foo.c is not in the current directory. make has a vpath directive that tells it where to look for such files. Unfortunately, if a file named foo.c occurs more than once in those directories, make can pick the wrong one.

The following standard makefile pattern fails if two source files have the same name (but different extensions) and lie side by side. The problem is that the “source file name => object file name” conversion loses information, yet make's design requires the reverse mapping to be performed.

O_FILES := $(patsubst %.c,%.o,$(notdir $(C_FILES)))
vpath %.c $(sort $(dir $(C_FILES)))
$(LIB): $(O_FILES)
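The loss of information is easy to see with concrete values. The sketch below uses two hypothetical source files that share a base name but live in different directories.

```makefile
# Two distinct sources (hypothetical paths) ...
C_FILES := src/a/util.c src/b/util.c

# ... collapse into one object name once the directory is stripped.
O_FILES := $(patsubst %.c,%.o,$(notdir $(C_FILES)))

# Prints: O_FILES=util.o util.o
$(info O_FILES=$(O_FILES))

all: ; @:
```

From util.o alone, make has no way to tell which util.c was meant; the vpath search simply returns the first match.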


Other missing features

make knows no data types other than strings: there are no booleans, lists or dictionaries.
There is no concept of scope. All variables are global.
Support for loops is limited. $(foreach) evaluates an expression several times and concatenates the results, but $(foreach) by itself cannot be used to create, for example, a group of rules.
User-defined functions exist, but they have the same limitations as foreach. By themselves they can only perform variable substitution; they cannot use the full language syntax or create new dependencies.
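In GNU make 3.80 and later, the usual workaround is to feed the generated text back into the parser with $(eval). The sketch below shows the idiom; PROGRAMS and PROGRAM_RULE are names invented for the example.

```makefile
PROGRAMS := client server

# Template for one rule. $(1) is filled in by $(call); the doubled
# dollar signs survive that expansion and become $(CC), $^ and $@ in
# the text that $(eval) finally parses.
define PROGRAM_RULE
$(1): $(1).o ; $$(CC) $$^ -o $$@
endef

# Generate one rule per program.
$(foreach p,$(PROGRAMS),$(eval $(call PROGRAM_RULE,$(p))))
```

It works, but the two levels of expansion and the $$ escaping show how far this is from an ordinary loop.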

Reliability


make's reliability is low, especially on large projects and with incremental builds. Sometimes the build fails with a strange error, and you have to resort to “magic spells” such as make clean and hope that this fixes everything. Sometimes (which is more dangerous) everything looks fine, but something was not recompiled and your application crashes after launch.

Missing dependencies

You must tell make about all the dependencies of every target. If you do not, it will not rebuild the target when a file it depends on changes. For C/C++, many compilers can generate dependency information in a format make understands. For other tools, however, the situation is considerably worse. Suppose we have a Python script that imports other modules. A change to the script changes its output; this is obvious and easy to express in the makefile. But a change to one of the modules can also change the script's output. Describing all these dependencies completely and keeping them up to date is a non-trivial task.
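For the C/C++ case, a common arrangement with GCC-compatible compilers looks roughly like the sketch below. The target name prog and the use of $(wildcard) to find sources are assumptions for the example.

```makefile
SRCS := $(wildcard *.c)
OBJS := $(SRCS:.c=.o)
DEPS := $(OBJS:.o=.d)

prog: $(OBJS) ; $(CC) $^ -o $@

# -MMD writes foo.d next to foo.o, listing the headers foo.c included;
# -MP adds dummy targets so that a deleted header does not break the build.
%.o: %.c ; $(CC) $(CFLAGS) -MMD -MP -c $< -o $@

# Pull in whatever dependency files exist; the leading "-" ignores
# the ones that have not been generated yet.
-include $(DEPS)
```

Nothing comparable exists out of the box for the Python script in the example above; its imports would have to be listed by hand or extracted by a custom tool.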

Relying on the “last modified” timestamp

make decides that a target needs rebuilding by comparing its “last modified” time with those of its dependencies. File contents are never analyzed; only the times are compared. But this file system information is not always reliable, especially in a networked environment. The system clock may be off, and other programs sometimes force the modification time of a file to a value of their choosing, erasing the “real” one. When this happens, make does not rebuild targets that need rebuilding, and the result is only a partial recompilation.

Dependence on command line parameters

When the command-line parameters of a program change, its output may change too (for example, a change to a -D option passed to the C preprocessor). make will not recompile in this case, which leads to an incorrect incremental build.

You can try to guard against this by making every target depend on the Makefile. However, this approach is unreliable, because it is easy to miss a target. Moreover, a Makefile may include other makefiles, which may in turn include yet others. You would need to list them all and keep the list up to date. In addition, many changes to makefiles are trivial. You probably do not want to recompile the entire project just because you changed a comment in a makefile.
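A finer-grained workaround that some projects use is to record the flags in a file and depend on that file instead. The sketch below is one possible shape of it; the file name .cflags and the FORCE target are conventions assumed for the example, and the quoting breaks if the flags themselves contain single quotes.

```makefile
CFLAGS ?= -O2

# Runs on every build, but rewrites .cflags only when the flags differ,
# so the file's timestamp changes only when the flags do.
.cflags: FORCE ; @echo '$(CFLAGS)' | cmp -s - $@ || echo '$(CFLAGS)' > $@

# No recipe and no file: anything that depends on FORCE is always re-run.
FORCE:

# Objects are rebuilt when the source or the recorded flags change.
%.o: %.c .cflags ; $(CC) $(CFLAGS) -c $< -o $@
```

Invoked as, say, make foo.o CFLAGS=-O0 after a build with the default flags, this recompiles foo.o. The same has to be repeated for every tool and every variable that influences the output, which is exactly the bookkeeping make does not do for you.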

Inheritance of environment variables and dependence on them

Not only does every environment variable become a make variable, but these variables are also passed to every program that make runs. Since each user has their own set of environment variables, two users running the same build can get different results.
Changing any environment variable passed to a child process can change its output. Such a change ought to trigger a rebuild, but make will not do that.

Multiple simultaneous sessions

If you run two instances of make in the same directory at the same time, they will collide when they try to build the same files. Most likely one of them (or even both) will fail.

Editing files during a rebuild

If you edit and save a file while make is running, the result is unpredictable. make may pick up the change correctly, or it may not, in which case you will need to run make again. Or, if you are unlucky, the save may land at just the wrong moment, so that some targets need rebuilding but subsequent runs of make will not detect it.

Removing unneeded files

Suppose your project used to contain the file foo.c, but the file was later removed from the project and from the makefile. The intermediate object file foo.o remains. This is usually harmless, but such files accumulate over time and sometimes cause problems. For example, they may be picked up by mistake during a vpath search. Another example: suppose a file that make used to generate during the build is now checked into version control, and the rule that generated it has been removed from the makefile. Version control systems usually will not overwrite a non-versioned file with the same name if one already exists (for fear of deleting something important). If you overlook the error message, do not delete the file by hand and do not update the source directory again, you will be using an outdated version of the file.

Normalization of file names

The same file can be referred to by different paths. Even leaving hard and symbolic links aside, foo.c, ./foo.c, ../bar/foo.c and /home/user/bar/foo.c can all refer to the same file. make ought to treat them as one file, but it does not.
The problem is even worse on Windows, where the file system is case-insensitive.

Consequences of an interrupted or failed rebuild

If the build fails partway through, subsequent incremental builds may be unreliable. In particular, if a command returns an error, make does not delete its partially written output file! If you run make again, it may decide that the file is up to date and try to use it. make has a special setting that forces it to delete such files, but it is not enabled by default.
Pressing Ctrl-C during a rebuild can also leave your source tree in an indeterminate state.
Every time you run into problems with an incremental rebuild, doubt creeps in: if one file was not rebuilt correctly, who knows how many more there are? In this situation you may need to start over with make clean. The trouble is that make clean gives no guarantees either (see above), so you may have to check out the source tree afresh in another directory.
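In GNU make, the setting in question is the special target .DELETE_ON_ERROR. A minimal sketch, with ./generate standing in for any hypothetical tool that can fail halfway through writing its output:

```makefile
# Opt in: if a recipe exits with an error, delete the target it was
# building instead of leaving a half-written file with a fresh timestamp.
.DELETE_ON_ERROR:

out.txt: in.txt ; ./generate in.txt > $@
```

Without the first line, a failed ./generate leaves behind an out.txt that is newer than in.txt, and the next run of make considers it up to date.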

Performance


make's performance scales poorly (non-linearly) as a project grows.

Incremental build performance

You might hope that rebuilding a project takes time proportional to the number of targets that need rebuilding. Unfortunately, this is not the case.
Because the results of incremental builds do not always inspire confidence, users end up doing a full rebuild fairly regularly, sometimes out of necessity (if something does not build, try make clean; make) and sometimes all the time (out of paranoia). It is better to be safe and wait for a complete rebuild than to risk some part of the build being out of sync with the sources.
A file's “last modified” time can change without its contents changing. This leads to unnecessary recompilation.
A poorly written makefile may declare too many dependencies, so that a target is recompiled even though its (real) dependencies have not changed. Careless use of phony targets is another source of trouble (such targets are always rebuilt).
Even if your makefiles are flawless and your incremental builds are completely reliable, performance is far from ideal. Suppose you edit one .c file (not a header) in a large project. If you type make at the root of the project, make has to parse all the makefiles, calling itself recursively many times, and walk through all the dependencies to find out whether they need rebuilding. The time spent actually running the compiler can be far smaller than the total.

Recursive make and performance

Careless use of recursive make can be dangerous, as in the following scenario. Suppose your project contains the sources of two executables, A and B, both of which depend on a library C. The top-level makefile must, of course, recurse into directories A and B. We would also like to be able to run make inside directory A or B when we want to build only one of the executables. So each of them must in turn invoke make recursively in ../C. And if we run make from the root of the project, we will visit C twice!
In this example that does not look too bad, but in large projects make may visit some directories dozens of times. Each time, the makefile has to be read and parsed and all its dependencies checked. make has no built-in means of preventing this.

Parallel Make

Running make in parallel promises a big speedup, especially on modern multi-core processors. Unfortunately, reality falls far short of the promise.
The text output of a parallel make is hard to read. When several processes run at the same time in the same terminal, it is hard to tell which warning, line, etc. belongs to which command.
Parallel make is especially sensitive to dependencies being specified correctly. If two rules are not related through dependencies, make assumes they can be run in any order. With a serial make the behavior is predictable: if A depends on B and C, then B is built first, then C, then A. Of course, make would be within its rights to build C before B, but in serial mode the order is deterministic.
In parallel mode, B and C may (but need not) be built concurrently. If C in fact depends on B, but this dependency is not written in the makefile, then building C will most likely fail (though not necessarily; it depends on timing).
Parallel make exposes the problem of missing dependencies in makefiles. That in itself is a good thing, because they cause other problems too, and it is great to be able to catch and fix them. But in practice, on large projects, the results of using parallel make are disappointing.
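A sketch of such a hidden dependency, using a hypothetical project in which main.c includes a generated header config.h and ./gen-config is the made-up tool that writes it:

```makefile
all: config.h main.o

config.h: ; ./gen-config > $@

# Bug: main.o really depends on config.h, but the rule does not say so.
main.o: main.c ; $(CC) -c main.c -o $@
```

A serial make builds the prerequisites of all from left to right, so config.h happens to exist by the time main.c is compiled. With make -j2 both recipes may start at once, and the compile can fail or pick up a stale header. The fix is to write main.o: main.c config.h.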
Parallel make interacts awkwardly with recursive make. Each make session is independent: each one tries to parallelize its own work and has no overall picture of the complete dependency tree. We have to find a compromise between reliability and performance. On the one hand, we want to parallelize the build not just within a single makefile but across all of them. But since make does not know about dependencies between makefiles, fully parallelizing the sub-makes does not work.
Some sub-makes can be run in parallel; others must be run sequentially. Specifying these dependencies is inconvenient, and it is very easy to miss some. It is tempting to fall back on a reliable sequential traversal of the makefile tree and parallelize only within one makefile at a time, but this greatly reduces overall performance, particularly for incremental builds.
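In a top-level makefile, the ordering between sub-makes typically has to be spelled out by hand, roughly as in this sketch (the directory names A, B and C follow the earlier example):

```makefile
SUBDIRS := A B C

.PHONY: all $(SUBDIRS)
all: $(SUBDIRS)

# One sub-make per directory; $(MAKE) lets the -j job limit be shared.
$(SUBDIRS): ; $(MAKE) -C $@

# Knowledge that really belongs to A and B is duplicated here:
# both must wait until library C has been built.
A B: C
```

If the last line is forgotten, make -j is free to start A, B and C at the same time, and the link steps in A and B race against the build of the library.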

Automatic dependency generation for Microsoft Visual C ++

Many compilers, such as GCC, can produce dependency information in a format that make understands. Unfortunately, Microsoft Visual C++ does not. It has a special option, /showIncludes, but an additional script is required to translate its output into make's format. That means running a separate script for every C file, and starting, say, the Python interpreter for each file is not an instantaneous operation.

Built-in rules

make contains a huge number of built-in rules. They make it possible to simplify small makefiles slightly, but medium and large projects usually override them. They hurt performance, because make has to wade through all these extra patterns when looking for rules to build files. Many of them are obsolete, for example those for the RCS and SCCS revision control systems. Only a few people use them, yet these rules slow down every build for everyone else.
You can disable them from the command line with make -r, but this is not the default behavior. You can also disable them by adding a special directive to the makefile, but that is not the default either, and many people forget to do it.
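A commonly used pair of lines for switching the built-in rules off from inside a GNU makefile is sketched below; exactly which of the two is needed can depend on the make version, so many projects include both.

```makefile
# Equivalent to passing -r on the command line; also inherited by sub-makes.
MAKEFLAGS += --no-builtin-rules

# An empty .SUFFIXES list removes the old-style suffix rules.
.SUFFIXES:
```

Both lines have to be added to every top-level makefile by hand, which is the author's point: the fast behavior is opt-in.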

Other


There are also other complaints about make that do not fit into the previous categories.

Silence is golden

According to Eric Raymond, “One of Unix's oldest and most persistent design rules is that when a program has nothing interesting or surprising to say, it should shut up. Well-behaved Unix programs do their jobs unobtrusively, with a minimum of fuss and bother. Silence is golden.” make does not follow this rule.
When you run make, its log contains every command it runs and everything those commands write to stdout and stderr. This is too much. Important warnings and errors drown in the stream, and the text often scrolls by so quickly that it becomes unreadable.
You can greatly reduce this output by running make -s, but this is not the default behavior. There is also no intermediate mode in which make shows what it is currently doing without printing full command lines.
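Projects that want such an intermediate mode usually build it themselves. The sketch below shows one common shape of the idiom; the variable names V and Q are conventions assumed for the example, not something make provides.

```makefile
# "make V=1 foo.o" shows full command lines; by default only a short
# summary line is printed for each file.
ifdef V
Q :=
else
Q := @
endif

# With Q set to "@", make does not echo the command line itself.
%.o: %.c ; $(Q)echo "  CC $<" && $(CC) $(CFLAGS) -c $< -o $@
```

Every rule in the project has to follow the same convention for the output to stay consistent.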

Rules with multiple targets

Some tools generate more than one file when they run. But an ordinary make rule can describe only one target at a time. If you try to write a separate dependency on such an extra file, make cannot detect the connection between the two rules.
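One long-standing workaround in GNU make is to express the rule as a pattern rule, because a pattern rule with several targets is understood as a single recipe run that produces all of them. In the sketch below, ./gen is a hypothetical tool that reads foo.def and writes both foo.c and foo.h.

```makefile
# One run of the recipe makes both the .c and the .h file.
# $* is the stem matched by %.
%.c %.h: %.def ; ./gen $< $*.c $*.h
```

This only helps when the outputs share a stem that fits a pattern. GNU make 4.3 added a "grouped targets" syntax (&:) for explicit rules, which is of no use on older installations.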

Warnings that must be errors

make prints a warning, but does not stop, when it detects a circular dependency. This most likely indicates a serious error in the makefile, but make treats it as a minor nuisance.
Similarly, make prints a warning (and carries on) if it finds two rules describing how to build the same target. It simply ignores one of them. Again, this is a serious bug in the makefile, but make does not think so.

Creating directories

It is very convenient to put the output files of different configurations in different directories, so that you do not need to rebuild the entire project when you switch configuration. For example, you can put “debug” binaries in a “debug” directory, and likewise for the “release” configuration. But before you can put files in these directories, you have to create them.
It would be great if make did this automatically (obviously, a target cannot be built if its directory does not yet exist), but it does not.
Calling mkdir -p $(dir $@) in every rule is not very practical. It is inefficient, and without -p you would also have to ignore the error when the directory already exists.
You can try to solve the problem this way:
debug/%.o: %.c debug
	$(CC) -c $< -o $@

debug:
	mkdir $@


The problem is the dependency of debug/foo.o on “debug”. A directory has a “last modified” time too, and it changes whenever a file is created in the directory. Suppose there are two targets, debug/foo.o and debug/bar.o. Creating debug/bar.o updates the modification time of “debug”. The directory is now newer than debug/foo.o, so the next time make runs, debug/foo.o is rebuilt even though nothing it really depends on has changed.
The solution is to depend on a file (for example, debug/dummy.txt) rather than on the directory. This requires extra steps in the makefile (touch debug/dummy.txt) and may interfere with make's ability to delete intermediate files automatically. And if you are not careful to specify this extra dependency (on dummy.txt) for every target, you will run into problems when running make in parallel.
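GNU make 3.80 and later also offer order-only prerequisites, written after a "|", which are a possible alternative to the dummy file. A sketch of the same rule in that style:

```makefile
# Everything after "|" must exist before the recipe runs, but its
# timestamp is never compared with the target's.
debug/%.o: %.c | debug ; $(CC) -c $< -o $@

debug: ; mkdir -p $@
```

This avoids the spurious rebuilds, but the "| debug" part still has to be repeated in every rule that writes into the directory.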

Conclusions


Make is a popular tool, but, as shown above, it has many flaws in its language design, reliability and performance.


Source: https://habr.com/ru/post/138682/
