Internal and external linkage in C++

Good day everyone!

We present a translation of an interesting article, prepared as part of the C++ Developer course. We hope you find it as useful and interesting as our students do.

Let's get started.
Have you ever come across the terms internal and external linkage? Do you want to know what the extern keyword is for, or how declaring something static affects the global scope? Then this article is for you.

In a nutshell

A translation unit consists of an implementation file (.c / .cpp) and all the header files (.h / .hpp) it includes. If an object or function has internal linkage within a translation unit, the symbol is visible to the linker only within that translation unit. If an object or function has external linkage, the linker can also see it when processing other translation units. Using the static keyword in the global namespace gives a symbol internal linkage. The extern keyword gives it external linkage.
By default, the compiler gives symbols the following linkage:

Non-const global variables - external linkage;
Const global variables - internal linkage;
Functions - external linkage.
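
The defaults listed above can be annotated on a small piece of code. This is only a sketch: every name in it (request_count, max_requests, api_version, rejected_count, try_request and so on) is invented for illustration, and the comments state the linkage each declaration receives.

```cpp
// One translation unit; comments state the linkage of each declaration.
int request_count = 0;              // non-const global: external linkage
const int max_requests = 3;         // const global: internal linkage
extern const int api_version = 2;   // extern overrides the const default: external linkage
static int rejected_count = 0;      // static: internal linkage

// Functions have external linkage by default.
bool try_request() {
    if (request_count >= max_requests) {
        ++rejected_count;
        return false;
    }
    ++request_count;
    return true;
}

// static gives a function internal linkage as well.
static int remaining() { return max_requests - request_count; }

int remaining_requests() { return remaining(); }
int rejected_requests() { return rejected_count; }
```

Only request_count, api_version, try_request, remaining_requests and rejected_requests would be visible to the linker from other translation units.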

The basics

We first cover two simple concepts needed to discuss linkage:

The difference between a declaration and a definition;
Translation units.

A note on terminology: we will use the term “symbol” for any “code entity” that the linker works with, such as a variable or a function (or classes / structures, but we will not focus on them).

Declaration vs. definition

Let's briefly discuss the difference between declaring and defining a symbol. A declaration tells the compiler that a particular symbol exists, and allows the symbol to be used wherever its exact memory address or storage is not required. A definition tells the compiler what the body of a function contains or how much memory to allocate for a variable.

In some situations a declaration is not enough for the compiler, for example when a data member of a class has a value type (that is, it is neither a reference nor a pointer). A pointer to a declared (but undefined) type is allowed, since a pointer needs a fixed amount of memory (for example, 8 bytes on 64-bit systems) regardless of the type it points to. Dereferencing that pointer, however, requires the definition. Similarly, to declare a function you only need declarations (not definitions) of all parameter types (whether they are taken by value, by reference, or by pointer) and of the return type. The definitions of the return type and the parameter types are needed only where the function is defined or called.
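
The cases above can be put side by side in a short sketch. The names Engine, Car and car_horsepower are hypothetical and serve only to illustrate where a declaration suffices and where the definition becomes necessary.

```cpp
struct Engine;                      // declaration only: Engine is an incomplete type here

struct Car {
    Engine* engine;                 // OK: a pointer has a fixed size regardless of Engine
    // Engine spare;                // error if uncommented: a value member needs the definition
};

int car_horsepower(const Car& car); // declaration: no definition of Engine needed yet

struct Engine {                     // the definition: size and members are now known
    int horsepower;
};

int car_horsepower(const Car& car) {
    return car.engine->horsepower;  // dereferencing requires the complete type
}
```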

Functions

The difference between a function declaration and a function definition is quite obvious.

 int f();               // declaration
 int f() { return 42; } // definition

Variables

With variables it's a little different: the declaration and the definition are usually not separate. The key point is that:

 int x;

not only declares x, but also defines it. The variable is default-initialized, which for int does nothing. (In C++, unlike Java, fundamental types such as int are not initialized to 0 by default. If x is a local variable, its value is whatever garbage happens to be at the memory address the compiler assigned to it; only variables with static storage duration, such as globals, are zero-initialized.)

But you can explicitly separate the declaration of a variable and its definition using the extern keyword.

 extern int x; // declaration
 int x = 42;   // definition

However, if you initialize a variable in its extern declaration, the declaration becomes a definition and the extern keyword becomes redundant.

 extern int x = 5; // the same as int x = 5;
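
As a small sketch of why the split is useful: code can use a variable that has only been declared, as long as a definition exists somewhere in the program. The names answer and doubled_answer are made up for this example, and both parts are kept in one file only so that it compiles on its own.

```cpp
extern int answer;          // declaration only: no storage is allocated here

int doubled_answer() {
    return answer * 2;      // the declaration is enough to compile this use
}

int answer = 21;            // the single definition; it could live in another .cpp file
```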

Forward declaration

C++ has the concept of forward-declaring a symbol. This means declaring the type and name of a symbol for use in situations that do not require its definition. That way we do not need to include the full definition of the symbol (usually a header file) unless it is really necessary, which reduces the dependency on the file containing the definition. The main advantage is that when the file with the definition changes, the file in which we forward-declare the symbol does not need to be recompiled (and therefore neither do all the files that include it).

Example

Suppose we have a function declaration (called a prototype) for f, which takes an object of type Class by value:

 // file.hpp
 void f(Class object);

The naive approach is to include the definition of Class right away. But since we have only declared f, it is enough to give the compiler a declaration of Class. The compiler can then recognize the function by its prototype, and we remove the dependency of file.hpp on the file containing the definition of Class, say class.hpp:

 // file.hpp
 class Class;
 void f(Class object);

Suppose file.hpp is included in 100 other files, and we change the definition of Class in class.hpp. If file.hpp included class.hpp, then file.hpp and all 100 files that include it would need to be recompiled. Thanks to the forward declaration of Class, the only files that need to be recompiled are those that include class.hpp themselves, such as file.cpp (assuming that f is defined there).
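
The file layout described above can be modeled in a single listing. This is a sketch under assumptions: the three "files" are concatenated so that it compiles on its own, and the members of Class, the last_id variable and the last_seen helper are invented for illustration.

```cpp
// --- file.hpp: a forward declaration of Class is all this header needs ---
class Class;
void f(Class object);           // prototype with a by-value parameter of an incomplete type
int last_seen();                // hypothetical helper, added for this sketch

// --- class.hpp: the full definition ---
class Class {
public:
    explicit Class(int id) : id_(id) {}
    int id() const { return id_; }
private:
    int id_;
};

// --- file.cpp: would include both headers, because defining f needs the full type ---
static int last_id = 0;         // internal to file.cpp
void f(Class object) { last_id = object.id(); }
int last_seen() { return last_id; }
```

A file that only passes f around, or only sees its prototype, never needs class.hpp; a file that defines or calls f does.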

Frequency of use

An important difference between a declaration and a definition is that a symbol can be declared many times but defined only once. You can forward-declare a function or class as often as you like, but there can be only one definition. This is called the One Definition Rule (ODR). In C++, the following works:

 int f();
 int f();
 int f();
 int f();
 int f();
 int f();
 int f() { return 5; }

And this does not:

 int f() { return 6; }
 int f() { return 9; }

Translation units

Programmers typically work with header files and implementation files. Compilers do not: they work with translation units (TU for short), which are sometimes called compilation units. The definition of such a unit is quite simple: it is any file passed to the compiler after it has been preprocessed. To be precise, it is the file that results from the preprocessor expanding macros, including or excluding source code depending on #ifdef and #ifndef directives, and copy-pasting the contents of all #include files.

Consider the following files:

header.hpp:

 #ifndef HEADER_HPP
 #define HEADER_HPP
 #define VALUE 5
 #ifndef VALUE
 struct Foo { private: int ryan; };
 #endif
 int strlen(const char* string);
 #endif /* HEADER_HPP */

program.cpp:

 #include "header.hpp"
 int strlen(const char* string)
 {
     int length = 0;
     while (string[length]) ++length;
     return length + VALUE;
 }

The preprocessor produces the following translation unit, which is then passed to the compiler:

 int strlen(const char* string);
 int strlen(const char* string)
 {
     int length = 0;
     while (string[length]) ++length;
     return length + 5;
 }

Linkage

Now that we have covered the basics, we can move on to linkage. In general, linkage is the visibility of symbols to the linker when it processes files. Linkage can be either external or internal.

External linkage

When a symbol (variable or function) has external linkage, it becomes visible to the linker from other files, that is, “globally” visible and accessible to all translation units. This means you must define such a symbol in one specific place in one translation unit, usually in an implementation file (.c / .cpp), so that it has exactly one visible definition. If you define the symbol at the same time as you declare it, or put the definition in the same file as the declaration, you risk annoying the linker: including that file in more than one implementation file adds the definition to more than one translation unit, and your linker will cry.

The extern keyword in C and C++ (explicitly) declares that a symbol has external linkage.

 #include <string>
 extern int x;
 extern void f(const std::string& argument);

Both symbols have external linkage. As noted above, const global variables have internal linkage by default and non-const global variables have external linkage. So int x; is the same as extern int x;, right? Not quite. int x; is actually equivalent to extern int x{}; (using uniform / brace initialization syntax to avoid the most vexing parse), because int x; not only declares x but also defines it. Therefore, omitting extern from int x; at global scope is just as bad as defining a variable in its extern declaration:

 int x;          // the same as
 extern int x{}; // which is likely to cause a linker error
 extern int x;   // whereas this only declares the variable, which is fine

Bad example

Let's declare a function f with external linkage in file.hpp and define it there as well:

 // file.hpp
 #ifndef FILE_HPP
 #define FILE_HPP
 extern int f(int x);
 /* ... */
 int f(int x) { return x + 1; }
 /* ... */
 #endif /* FILE_HPP */

Note that extern is not needed here, since all functions are implicitly extern. Separating the declaration from the definition is not required either. So let's simply rewrite it like this:

 // file.hpp
 #ifndef FILE_HPP
 #define FILE_HPP
 int f(int x) { return x + 1; }
 #endif /* FILE_HPP */

Code like this might be written before reading this article, or after reading it under the influence of alcohol or heavy substances (for example, cinnamon buns).

Let's see why this is a bad idea. Suppose we have two implementation files, a.cpp and b.cpp, and both include file.hpp:

 // a.cpp
 #include "file.hpp"
 /* ... */

 // b.cpp
 #include "file.hpp"
 /* ... */

Now let the compiler do its work and generate two translation units from the two implementation files above (remember that #include literally means copy / paste):

 // TU A, from a.cpp
 int f(int x) { return x + 1; }
 /* ... */

 // TU B, from b.cpp
 int f(int x) { return x + 1; }
 /* ... */

This is where the linker steps in (linking happens after compilation). The linker takes the symbol f and looks for its definition. Today it is lucky: it finds two! One in translation unit A, the other in B. The linker freezes with joy and tells you something like this:

 duplicate symbol __Z1fv in:
     /path/to/a.o
     /path/to/b.o

The linker finds two definitions for the single symbol f. Since f has external linkage, it is visible to the linker when processing both A and B. This obviously violates the One Definition Rule and causes an error. More precisely, it causes a duplicate symbol error, which you will run into about as often as the undefined symbol error that occurs when you declare a symbol but forget to define it.
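
One way to avoid the duplicate definition is to leave only the declaration in the header and put the single definition in one implementation file. The sketch below models that layout in one listing so that it compiles on its own; file.cpp, call_from_a and call_from_b are hypothetical names that do not appear in the example above.

```cpp
// --- file.hpp: declaration only, safe to include from many files ---
int f(int x);

// --- file.cpp: the one and only definition ---
int f(int x) { return x + 1; }

// --- a.cpp: would include file.hpp and sees just the declaration ---
int call_from_a() { return f(1); }

// --- b.cpp: likewise, no second definition of f is produced ---
int call_from_b() { return f(2); }
```

With this split, each translation unit built from a.cpp or b.cpp contains a reference to f rather than a definition, so the linker finds exactly one.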

Usage

A typical use of extern variable declarations is global variables. Suppose you are working on a self-baking cake. There are surely global variables associated with the cake that need to be available in different parts of your program, say the clock rate of the edible circuit inside your cake. This value is naturally needed in various places to keep all the chocolate electronics in sync. The (evil) C way of declaring such a global variable is a macro:

 #define CLK 1000000

A C++ programmer who is disgusted by macros would rather write real code. For example:

 // global.hpp
 namespace Global
 {
     extern unsigned int clock_rate;
 }

 // global.cpp
 namespace Global
 {
     unsigned int clock_rate = 1000000;
 }

(A modern C++ programmer will want to use digit separators: unsigned int clock_rate = 1'000'000;)
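
To show how other files would use such a global, here is a sketch that models global.hpp, global.cpp and a third, hypothetical file (called oven.cpp here) in one listing. The functions cycles_in and overclock are invented for illustration.

```cpp
// --- global.hpp: declaration, safe to include from any number of files ---
namespace Global {
extern unsigned int clock_rate;
}

// --- global.cpp: the single definition ---
namespace Global {
unsigned int clock_rate = 1000000;
}

// --- oven.cpp (hypothetical): any other file that includes global.hpp ---
unsigned int cycles_in(unsigned int milliseconds) {
    return Global::clock_rate / 1000 * milliseconds;
}

void overclock(unsigned int factor) {
    Global::clock_rate *= factor;   // every translation unit sees the same object
}
```

Because clock_rate has external linkage, a change made through overclock is observed by every file that reads Global::clock_rate.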

Internal linkage

If a symbol has internal linkage, it is visible only within the current translation unit. Do not confuse visibility here with access rights such as private. Visibility means that the linker can use the symbol only while processing the translation unit in which it was declared, and not later (as with symbols that have external linkage). In practice, this means that when you declare a symbol with internal linkage in a header file, every translation unit that includes this file gets its own unique copy of the symbol, as if you had redefined the symbol in each translation unit. For objects, this means that the compiler literally allocates a completely new, unique copy for each translation unit, which can obviously lead to high memory consumption.

To declare a symbol with internal linkage, C and C++ provide the static keyword. This use is different from the use of static in classes and functions (or, more generally, in any block).

Example

Let's give an example:

header.hpp:

 static int variable = 42;

file1.hpp:

 void function1();

file2.hpp:

 void function2();

file1.cpp:

 #include "header.hpp"
 void function1() { variable = 10; }

file2.cpp:

 #include "header.hpp"
 void function2() { variable = 123; }

main.cpp:

 #include "header.hpp"
 #include "file1.hpp"
 #include "file2.hpp"
 #include <iostream>
 auto main() -> int
 {
     function1();
     function2();
     std::cout << variable << std::endl;
 }

Each translation unit that includes header.hpp gets its own unique copy of variable, because it has internal linkage. There are three translation units:

file1.cpp
file2.cpp
main.cpp

When function1 is called, file1.cpp's copy of variable is set to 10. When function2 is called, file2.cpp's copy of variable is set to 123. However, the value printed in main.cpp does not change and remains 42.
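
For readers who want to try this without creating six files, the effect can be imitated in one file. This is only a model, not real linkage: the namespaces tu_file1, tu_file2 and tu_main are hypothetical stand-ins for the three translation units, each containing its own pasted copy of the line from header.hpp, and the value and run functions are added just to observe the result.

```cpp
namespace tu_file1 {                    // stands in for file1.cpp
static int variable = 42;               // pasted contents of header.hpp
void function1() { variable = 10; }
int value() { return variable; }
}

namespace tu_file2 {                    // stands in for file2.cpp
static int variable = 42;               // a second, independent copy
void function2() { variable = 123; }
int value() { return variable; }
}

namespace tu_main {                     // stands in for main.cpp
static int variable = 42;               // a third copy, never modified
int run() {
    tu_file1::function1();
    tu_file2::function2();
    return variable;                    // main's own copy is untouched
}
}
```

In the real program the separation comes from the linker treating each object file's variable as private to that file; the namespaces here merely reproduce the "three separate copies" outcome.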

Anonymous Namespaces

C++ has another way to declare one or more symbols with internal linkage: anonymous namespaces. Such a namespace ensures that the symbols declared inside it are visible only in the current translation unit. In essence, it is just a way to declare several symbols static at once. For a while, using the static keyword to declare a symbol with internal linkage was deprecated in favor of anonymous namespaces. It was later reinstated, however, because it is convenient for declaring a single variable or function with internal linkage. There are a few minor differences that I will not go into.

In any case, this:

 namespace { int variable = 0; }

does (almost) the same thing as:

 static int variable = 0;
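
An anonymous namespace is handy when several helpers should stay private to one implementation file. In this sketch the names call_count, square, sum_of_squares and calls_so_far are hypothetical; only the last two would be visible to other translation units.

```cpp
namespace {                         // everything in here has internal linkage

int call_count = 0;                 // private to this translation unit

int square(int value) {
    ++call_count;
    return value * value;
}

}  // namespace

// Functions outside the anonymous namespace keep their default external linkage.
int sum_of_squares(int a, int b) { return square(a) + square(b); }
int calls_so_far() { return call_count; }
```

Another file could define its own square or call_count without any conflict at link time.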

Usage

So when should you use internal linkage? Using it for objects is a bad idea. The memory consumption of large objects can become very high because they are copied for each translation unit. But mostly it just causes weird, unpredictable behavior. Imagine that you have a singleton (a class of which you create only a single instance) and suddenly several instances of your singleton appear (one per translation unit).

However, internal linkage can be used to hide translation-unit-local helper functions from the global scope. Suppose there is a helper function foo in file1.hpp that you use in file1.cpp. You also have a function foo in file2.hpp that is used in file2.cpp. The first and second foo are different, but you cannot think of other names. So you can declare them static. As long as you do not include both file1.hpp and file2.hpp in the same translation unit, this hides the two foo functions from each other. Without static, they would implicitly have external linkage, and the definition of the first foo would clash with the definition of the second, causing a linker error for violating the One Definition Rule.

THE END


Source: https://habr.com/ru/post/432834/
