Extensions to C and C++. Part 1

This article (and, I hope, a series of articles) is devoted to the non-standard extensions of the C and C++ languages that exist in almost every compiler.

Language extensions are additional language features that are not part of the standard but are nevertheless supported by compilers. They are very interesting to study, first of all because they did not appear out of nowhere: each extension is the result of a pressing need felt by a large number of programmers. I find them doubly interesting: I like programming languages and am developing my own, and it often turns out that many of my ideas are already implemented in language extensions. The C and C++ standards evolve extremely slowly, and sometimes, reading the description of an extension, I just want to exclaim: "Well, that's obvious! Why is this still not in the standard?"

Language extensions are a kind of "gray", shadowy area that is rarely written about and poorly known. But that is exactly what makes it interesting!
I can say in advance that the series will cover the general-purpose compilers gcc, msvs, clang, intel, and embarcadero, the iar and keil compilers for microcontrollers, and, where possible, many others. GCC has the most extensions, which is not surprising: open development encourages implementing all kinds of language features. In addition, the information on GCC extensions is gathered in one place, whereas for the other compilers it has to be collected bit by bit. So let's start with GCC.

C language extensions

Control statements and code blocks as expressions

The most obvious idea, used extensively in modern hybrid (imperative-functional) languages. A block of code, written in parentheses, can serve as a value in an expression. Its value is the value of the last expression in the block.

int q = 100 + ({ int y = foo (); int z; if (y > 0) z = y; else z = - y; z; });
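A typical use of such blocks is a macro that evaluates each argument exactly once. The following is only a sketch, assuming GCC or Clang in GNU mode; the names MAX_ONCE and max_with_side_effect are invented for illustration:

```c
/* MAX_ONCE is a hypothetical macro name used for illustration.
   Each argument is evaluated exactly once because it is copied
   into a local variable inside the statement expression. */
#define MAX_ONCE(a, b) \
    ({ __typeof__(a) a_ = (a); __typeof__(b) b_ = (b); a_ > b_ ? a_ : b_; })

/* Returns the larger of the old *i and limit; the increment happens once. */
int max_with_side_effect(int *i, int limit)
{
    return MAX_ONCE((*i)++, limit);
}
```

With a classic `((a) > (b) ? (a) : (b))` macro the increment could be applied twice; here it cannot.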

Local labels

Labels used by the goto operator have function scope by default. Sometimes, for example when macros are expanded, this is unsafe, and it is desirable to limit the scope of a label to the current block. Such labels must be declared in advance with the __label__ keyword. The label itself is then defined in the usual way, but its scope is now the block rather than the function.
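To make the macro case concrete, here is a rough sketch (GNU C assumed; FIND_INDEX and both_present are hypothetical names). Without __label__, the second expansion of the macro in the same function would fail with a duplicate label error:

```c
#include <stddef.h>

/* FIND_INDEX is a hypothetical macro: it stores in `result` the index of
   `target` in `arr` (length `n`), or -1 if it is absent. The label `found`
   is local to the block, so the macro can be expanded several times
   in one function. */
#define FIND_INDEX(arr, n, target, result)              \
    do {                                                \
        __label__ found;                                \
        ptrdiff_t i_;                                   \
        (result) = -1;                                  \
        for (i_ = 0; i_ < (ptrdiff_t)(n); i_++)         \
            if ((arr)[i_] == (target)) {                \
                (result) = i_;                          \
                goto found;                             \
            }                                           \
    found:;                                             \
    } while (0)

/* Expands the macro twice in the same function. */
int both_present(const int *arr, size_t n, int x, int y)
{
    ptrdiff_t ix, iy;
    FIND_INDEX(arr, n, x, ix);
    FIND_INDEX(arr, n, y, iy);
    return ix >= 0 && iy >= 0;
}
```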

Labels as values

Another interesting and powerful low-level feature associated with the goto operator is the use of labels as values. Otherwise this capability exists only in assembler, where a label is simply an address in the code. GCC chose not to introduce a special label type; instead, for some reason, it introduced the unary operator && to convert a label to the type void *. It looks very nice and hacker-like:

 static void *array[] = { &&foo, &&bar, &&hack }; goto *array[i];
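The usual application is a dispatch table in an interpreter. Below is a minimal sketch, assuming GNU C; the three opcodes and the name run_program are made up, and there is no bounds checking of opcodes:

```c
/* A tiny bytecode interpreter; the opcodes and the function name
   run_program are invented for this illustration.
   Opcodes: 0 = halt, 1 = increment, 2 = double. */
int run_program(const unsigned char *code)
{
    /* Table of label addresses, indexed by opcode. */
    static void *dispatch[] = { &&op_halt, &&op_inc, &&op_dbl };
    int acc = 0;

    goto *dispatch[*code++];

op_inc:
    acc += 1;
    goto *dispatch[*code++];
op_dbl:
    acc *= 2;
    goto *dispatch[*code++];
op_halt:
    return acc;
}
```

Each handler jumps straight to the next one, so there is no central switch to return to.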

I must say that, thanks to Dijkstra, the goto operator is out of favor with most programmers. In many cases this is indeed justified, but one should not forget that C is a hacker's language, which means its ideology prefers capabilities to restrictions. If you need goto in some specific place, for example in an operating system kernel, it is better to use it than to resort to inline assembly. And there are plenty of ways to ruin code or make it unreadable, among which goto is far from first place.

Nested functions

Lambda functions appeared in C++ only in C++11. Meanwhile, Turbo Pascal already allowed functions to be nested inside other functions. With the advent of C++ and classes nothing changed: classes could be nested in functions and in other classes, but functions still could not be nested in functions. GCC corrects this annoying asymmetry in the language.

Nested functions can access the variables of the enclosing function, but unlike C++ lambdas they do not require an explicit capture list, and unlike the lambdas of high-level languages they do not create closures automatically. Another interesting feature is a goto from the nested function into the enclosing one, which looks more like throwing an exception.
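A small sketch of access to the enclosing function's variables. This assumes GCC specifically (Clang does not implement nested functions), and sum_above is a hypothetical name:

```c
/* GCC-only sketch; sum_above is a hypothetical helper name. */
int sum_above(const int *values, int n, int threshold)
{
    int total = 0;

    /* The nested function reads `threshold` and updates `total`
       from the enclosing function; no capture list is needed. */
    void add_if_above(int v)
    {
        if (v > threshold)
            total += v;
    }

    for (int i = 0; i < n; i++)
        add_if_above(values[i]);
    return total;
}
```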

Redirecting a call with a variable number of arguments to another function

GCC provides special constructs for passing a function's variable arguments on to another variadic function without needing to know how many arguments there are. As is well known, the standard way of working with a variable number of arguments in C is the macros va_start(), va_arg(), va_end() and the type va_list. Traditionally this relies on the fact that C function arguments are pushed onto the stack in reverse order, and the macros simply provide access to that stack memory. But in this extension we clearly see something new. What is it?

void * __builtin_apply_args () - allocates a block of memory on the stack and copies the arguments of the calling function into it.

void * __builtin_apply (void (* function) (), void * arguments, size_t size) - takes a function pointer, a data block created with __builtin_apply_args, and the size of the stack argument data; internally it calls function with the saved arguments. It returns a data block on the stack that holds the value returned by function.

void __builtin_return (void * result) - replaces the usual return (that is, no code is executed after this builtin) and returns the value packed in result.

Thus, the mechanism is completely different from va_list and can be applied when there is a function with a variable number of arguments that does not have a v-version (that is, a version that accepts va_list — such as vprintf).

More recently, two more builtins have appeared. They can be used only in inline functions that are always inlined (rather than inlined at the compiler's discretion, as with ordinary inline functions).

__builtin_va_arg_pack () represents the entire list of unnamed arguments; this builtin is substituted directly in place of the variable-length argument list.
__builtin_va_arg_pack_len () returns the number of unnamed arguments.

As you can guess from the mandatory inlining requirement, these builtins work at compile time: no stack manipulation or the like is performed at run time.

The typeof operator

A compile-time operator that yields the type of an expression. A similar operator, decltype, appeared in C++ not so long ago. However, let me remind you that for now we are looking at extensions of C, not C++ (although they are of course also available in GCC's C++).
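One place where this pays off is type-generic macros. A sketch under the assumption of GNU C (SWAP and order_pair are hypothetical names):

```c
/* SWAP is a hypothetical macro name; __typeof__ lets one definition
   work for any assignable type. */
#define SWAP(x, y)                  \
    do {                            \
        __typeof__(x) tmp_ = (x);   \
        (x) = (y);                  \
        (y) = tmp_;                 \
    } while (0)

/* Reorders two doubles in place so that *lo <= *hi. */
void order_pair(double *lo, double *hi)
{
    if (*lo > *hi)
        SWAP(*lo, *hi);
}
```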

Short Conditional Operator

Expression:

 x ? x : y

can be shortened to:

 x ? : y

This is a convenient shorthand, especially when x itself is a long expression; in addition, x is evaluated only once. By the way, this form is known as the Elvis operator. It differs from the null coalescing operator (found, for example, in C#) in that the Elvis operator converts the first operand to bool and tests it against false, whereas null coalescing compares the operand strictly with the special value null.
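A short sketch of the short form in use, assuming GNU C; all function names here are hypothetical. The counter shows that the first operand is computed once even though its value is used twice:

```c
/* Hypothetical helper: returns `name`, or a fallback when name is null. */
const char *name_or_default(const char *name)
{
    return name ?: "(anonymous)";
}

/* Counts calls, to show that the operand is evaluated only once. */
static int calls;
static int next_id(void) { return ++calls; }

int id_or_minus_one(void)
{
    return next_id() ?: -1;   /* next_id() runs once, not twice */
}
```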

Types __int128 and long long

Another obvious extension: 128-bit and 64-bit integers. The long long type has since been standardized in both C and C++, but there is still no standard type for 128-bit numbers. I wonder, if one does appear, what will it be called? long long long and unsigned long long long?
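A common practical use of the 128-bit type is getting the full product of two 64-bit numbers. A sketch, assuming a target where GCC provides __int128 (typically 64-bit platforms); mul_high is a hypothetical name:

```c
#include <stdint.h>

/* Hypothetical helper: high 64 bits of a 64x64-bit product.
   Assumes a target where the compiler provides __int128. */
uint64_t mul_high(uint64_t a, uint64_t b)
{
    unsigned __int128 product = (unsigned __int128)a * b;
    return (uint64_t)(product >> 64);
}
```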

Complex numbers

Language-level support for complex numbers over any underlying numeric type. I'm not sure it makes sense to build such types into the language, but remember that this is C: there are no objects, constructors, templates, and so on (and this is essentially a template type). The language gains the suffixes 'i' and 'j' (they mean the same thing), the operators __real__ and __imag__, and a set of auxiliary functions.
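Roughly, the suffixes and the two operators look like this in code. This is a sketch assuming GNU C; complex_demo is a hypothetical name:

```c
/* Hypothetical demo: multiplies two complex constants and returns the
   real and imaginary parts of the product through out-parameters. */
void complex_demo(double *re, double *im)
{
    _Complex double a = 1.0 + 2.0i;   /* GNU imaginary suffix 'i' */
    _Complex double b = 3.0 + 4.0j;   /* 'j' means the same thing */
    _Complex double c = a * b;        /* (1+2i)(3+4i) = -5+10i */

    *re = __real__ c;
    *im = __imag__ c;
}
```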

Such deep language support makes you think about what a language needs in order to implement and use special types like this comfortably, without building them directly into the compiler.

Additional floating types, half precision

Additional floating-point types: __float80, __float128, __fp16.
In fact, if you open the IEEE 754 standard, it turns out that there are rather more types than the well-known float and double (and long double, if anyone remembers it).

Decimal float

Another interesting floating-point format uses base 10 rather than base 2 (IEEE 754 covers some of these formats too). Let me remind you that the classic float and double sometimes produce amusing errors because the internal exponent base is 2 while numbers are written textually in decimal (that is, base 10). For example, 0.1 + 0.2 != 0.3.

Base-10 floating-point numbers are used in financial calculations, where such errors must not accumulate and lead to money leaking away.

Hex floats

This is a way to write floating-point numbers in hexadecimal (also motivated by the fact that some numbers cannot be written exactly in decimal notation). Because the letter 'e' is itself a hexadecimal digit, the letter 'p' is used to introduce the exponent. How do you like this number: 0x12c0a34.f09de78p3? In my opinion, it looks very hacker-like.
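A few smaller constants make the notation easier to read: the mantissa is hexadecimal, and the number after 'p' is a decimal power of two. The function names below are hypothetical and serve only as illustration:

```c
/* The mantissa is written in hexadecimal; the exponent after 'p' is a
   decimal power of two. Function names are hypothetical. */
double three(void)        { return 0x1.8p1; }  /* 1.5  * 2^1  = 3.0   */
double one_eighth(void)   { return 0x1p-3; }   /* 1.0  * 2^-3 = 0.125 */
double ten_and_half(void) { return 0xA.8p0; }  /* 10.5 * 2^0  = 10.5  */
```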

Fixed point

Fixed-point numbers are another useful extension in GCC. Some platforms have no FPU, and sometimes fixed-point calculations are simply faster or more convenient. At the low level these are ordinary integers whose bits are assigned weights different from the usual ones. In theory any split between the integer and fractional parts could be allowed, but GCC adopts a few specific splits for the main word sizes (2, 4 and 8 bytes), implemented in the _Fract and _Accum type specifiers. For some reason this feature is not enabled in all builds of the compiler, so I did not manage to try it in practice.

Another specifier, _Sat, is used for saturating arithmetic. This is a special way of handling overflow: if the result of a calculation does not fit into the range of the type, the maximum or minimum value possible for that type is stored in the variable. Accuracy is lost, but the value never wraps around to the opposite sign, which may be preferable in some cases (color, sound, etc.).

Named address spaces

A very useful thing for architectures with several address spaces, such as various microcontrollers. There may be RAM, flash, and EEPROM, each with several banks, and each address space has its own independent addressing scheme.

Zero Length Arrays

They are used as the last member of a structure that serves as the header of a variable-length object. This is very convenient for low-level code. Where the extension is unavailable (in other compilers), one has to declare an array of one element instead, which is not quite right: the variable part of the object may have zero length, and the extra size can lead to unnecessary memory allocation, and so on.
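The header pattern can be sketched as follows, assuming GNU C; struct message and message_new are hypothetical names invented for the example:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable-length message: a header followed directly
   by the payload bytes in the same allocation. */
struct message {
    size_t len;
    char   data[0];   /* GNU zero-length array: adds nothing to sizeof */
};

/* Allocates header and payload together; the caller releases it with free(). */
struct message *message_new(const char *text)
{
    size_t len = strlen(text);
    struct message *m = malloc(sizeof *m + len + 1);
    if (m == NULL)
        return NULL;
    m->len = len;
    memcpy(m->data, text, len + 1);   /* copy including the terminator */
    return m;
}
```

Because the array contributes nothing to sizeof, the allocation size is simply the header size plus the payload size.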

Empty structures

Unlike C++, where such structures are officially allowed, in C they are an extension. And in C their size (sizeof) really is zero, unlike C++, where for some reason it is 1 byte.

Arrays whose size is determined at runtime

The obvious thing. There is an alloca() function that allocates memory on the stack; that memory does not need to be freed. GCC adds the ability to declare arrays at the language level in the same way:

 void foo(int n) { int arr[n]; }

Moreover, GCC allows you to declare nested structures with variable-length array fields!

 void foo (int n) { struct S { int x[n]; }; }

And also functions with arrays of variable length (where the length is indicated in the function argument list):

 void foo(int len, char data[len][len]) { /* ... */ }

And if you want to specify the length after the array, then this is possible! GCC introduces a special syntax for preliminary declaration of a variable in the function argument list, which is by the way extremely interesting for many other applications (but this is already a separate topic):

 void foo (int len; char data[len][len], int len) { /* ... */ }

Variable Argument Macros

Such macros became standard in C99 and C++11; in GCC they appeared earlier. GCC also supports some improvements over the standard version. A macro with a variable number of parameters is, in essence, syntax that lets you pass a variable number of arguments to a macro and forward the whole pack of arguments to other language entities that accept a variable number of arguments (functions, other macros, and in C++ also templates). In the macro declaration the argument pack is denoted by three dots "...", and in the body by the identifier __VA_ARGS__.

Now for the extensions. The first is that instead of the three dots and __VA_ARGS__ you can use ordinary names, which are declared with three dots and used without them. This improves the readability of the code and is a very elegant idea in itself.

 #define LOG(args...) fprintf (stderr, args)

The second is correct handling of "trailing commas". With any code generation (and macros are code generation too), situations inevitably arise where a comma ends up at the end of a list. Ideally, programming languages would treat this as normal, but unfortunately most languages (including C) regard it as an error. Hence the workaround: the special syntax ##__VA_ARGS__, which removes the preceding comma when the argument pack is empty.
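Both extensions can be combined in one macro. The sketch below assumes GNU C; LOG_TO and demo_log are hypothetical names, and the output goes to a buffer only so that the result is easy to inspect:

```c
#include <stdio.h>

/* LOG_TO is a hypothetical macro. It uses the GNU named-parameter form
   (args...) and ##args, which drops the preceding comma when no
   variable arguments are supplied. */
#define LOG_TO(buf, size, fmt, args...) \
    snprintf((buf), (size), "[log] " fmt, ##args)

/* Formats two messages, one without and one with extra arguments,
   and returns the total number of characters produced. */
int demo_log(char *plain, char *with_args, size_t size)
{
    int n1 = LOG_TO(plain, size, "started");
    int n2 = LOG_TO(with_args, size, "x=%d y=%d", 1, 2);
    return n1 + n2;
}
```

Without the `##`, the first call would expand to `snprintf(..., "[log] " "started", )` and fail to compile.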

Relaxed rules for line continuation in the preprocessor

The preprocessor itself is a very ugly and dangerous thing (as I regularly point out in comments to various articles), but since it exists, it is quite logical to relax some of its strict requirements. In particular, to implement multi-line macros the C preprocessor uses a very strange and silly syntax with backslashes. This extension permits whitespace characters after the backslash (the characters are invisible, so it is easy to type them accidentally while editing and not notice).

Indexing non-lvalue arrays

Now it seems obvious, but in C90 for some reason it was impossible to index non-lvalue arrays. Fortunately, in both C99 and C++ this is possible.

Arithmetic with void * pointers and function pointers

Arithmetic operations on such pointers are allowed. The size of the addressed object is taken to be 1 byte (with a strange consequence: sizeof (void) and sizeof of a function type both equal 1, which is not good).
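In practice this mostly saves a cast to char *. A minimal sketch assuming GNU C; byte_offset is a hypothetical helper:

```c
#include <stddef.h>

/* Hypothetical helper: returns a pointer `offset` bytes past `base`.
   In GNU C, void * arithmetic advances in units of one byte, so no
   cast to char * is needed. */
void *byte_offset(void *base, size_t offset)
{
    return base + offset;
}
```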

Pointers to arrays with qualifiers

Subtleties in how GNU C handles pointers to arrays with qualifiers (const and others), and how this differs from the standard.

Non-constant initializers

An obvious thing, but according to the C90 standard non-constant expressions cannot be used in initialization lists (in curly brackets). This extension makes it possible:

 void foo (float f, float g) { float beat_freqs[2] = { f-g, f+g }; /* ... */ }

Compound Literals

One more obvious thing that everyone approaches from different sides, but nobody manages to implement completely, once and for all, and (most importantly) correctly. Compound literals can be used as objects of structure, array, and union types, not only for initialization but also directly in code: in assignments and as function arguments.

 obj = ((struct foo) {x + y, 'a', 0}); char **tbl = (char *[]) { "x", "y", "z" };

Temporary objects of the appropriate type are created for such literals and take part in expressions; therefore, for example, the following is possible (it would seem impossible, because a constant is not an lvalue):

 int i = ++(int) { 1 };

Designated elements in initialization lists

One more beautiful extension of initialization lists: inside a list you can specify not only all elements in order, but also particular elements using designator syntax. For arrays, the designator is a pair of square brackets containing the element index. So

 int a[6] = { [4] = 29, [2] = 15 };

equivalent to:

 int a[6] = { 0, 0, 15, 0, 29, 0 };

You can use ranges:

 int widths[] = { [0 ... 9] = 1, [10 ... 99] = 2, [100] = 3 };

For structures, a similar syntax with a unary point is used:

 struct point p = { .y = yvalue, .x = xvalue };

You can mix both types of designators, and in the same initialization list you can use both designators and just elements:

 struct point ptarray[10] = { [2].y = yv2, {33,44}, [2].x = xv2, [0].x = xv0 };

By the way, at the time of writing this extension is not implemented in C++ and has not made it into the C++ standard. A pity: it is one of the most beautiful extensions, and one of the things that C has and C++ does not.

Ranges in case

The ability to use ranges (written with an ellipsis) as case labels in a switch statement:

 switch(c) { case 'A' ... 'Z': /* ... */ break; }

Funnily enough, the GCC authors recommend surrounding the ellipsis with spaces, because otherwise there may be problems parsing integers (they are probably afraid that the numbers will be read as floating-point values). With proper parsing this should not happen: longer operators should take precedence over shorter ones that start with the same characters. Anyway.
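A complete function built on case ranges might look like this. It is only a sketch: it assumes GNU C and an ASCII-compatible character set, and classify is a hypothetical name:

```c
/* Hypothetical classifier built on GNU case ranges.
   Assumes an ASCII-compatible character set. */
char classify(char c)
{
    switch (c) {
    case '0' ... '9':
        return 'd';            /* digit */
    case 'a' ... 'z':
    case 'A' ... 'Z':
        return 'l';            /* letter */
    default:
        return '?';
    }
}
```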

Casting to a union type from any type that is a member of the union

Given a union:

 union foo { int i; double d; };

objects of type int and double can be explicitly cast to the type union foo:

  union foo u; int x; double y; u = (union foo) x; u = (union foo) y;

Similarly, when passing arguments to a function:

 void hack (union foo); hack ((union foo) x);

Mixing variable declarations and code

Something entirely familiar from C++ is also an extension in C90 (it entered the standard in C99).

Attributes of functions, variables, types, labels, enumerations, control statements

The special keyword __attribute__ lets you attach compiler-defined attributes (meta-information) to various language constructs. The attribute name is given in double parentheses after the keyword. Attributes vary widely: some are general, others are specific to a particular architecture. Attributes may also take arguments, written in parentheses after the attribute name. Here are some attributes (there are in fact a great many of them, and the topic probably deserves a separate article).

Attributes of functions:

noreturn - the function never returns control;
pure - a function without side effects (its value depends only on the arguments and on the memory it reads);
format - the function takes arguments in the style of a printf format string.
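The three function attributes above could be applied as in the following sketch (GNU C assumed; format_into, sum_array, and fatal are hypothetical functions written only for this illustration):

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* All function names here are hypothetical. */

/* format: the compiler checks the arguments against the format string,
   as it does for printf (format is argument 3, varargs start at 4). */
__attribute__((format(printf, 3, 4)))
int format_into(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    return n;
}

/* pure: no side effects; the result depends only on the arguments
   and on memory that the function reads. */
__attribute__((pure))
int sum_array(const int *values, int n)
{
    int total = 0;
    for (int i = 0; i < n; i++)
        total += values[i];
    return total;
}

/* noreturn: control never comes back to the caller. */
__attribute__((noreturn))
void fatal(const char *msg)
{
    fputs(msg, stderr);
    exit(EXIT_FAILURE);
}
```

With the format attribute in place, a call such as `format_into(buf, n, "%d", "text")` should draw a warning under -Wformat.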

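Label attributes:

unused - the label is not used as a goto target;
hot - the code path following the label is likely to be executed;
cold - the code path following the label is unlikely to be executed.

Enumerator attributes:

deprecated - using the enumerator is discouraged and produces a warning.

Statement attributes:

fallthrough - placed in a switch/case where a break would go, to show that the break is omitted on purpose.

Variable attributes:

aligned (N) - specifies a minimum alignment of N bytes for the variable.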

, , ( , ).

C++

, — , . , .

( ), — , .

Escape character

The sequence '\e' in a string or character constant stands for the ESC character.

The __alignof__ keyword returns the alignment required for a field in some type or just for some type. Alignment 1 - byte boundary (the lowest possible), 2 - by word boundary, 4 - by double word boundary, etc.
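A few alignment queries as a sketch, assuming GNU C; struct sample and the function names are hypothetical, and the concrete numbers other than the one for char depend on the platform:

```c
#include <stddef.h>

/* Hypothetical struct used only to show alignment queries. */
struct sample {
    char   tag;
    double value;
};

/* __alignof__ works on types and on lvalues such as struct fields. */
size_t align_of_char(void)
{
    return __alignof__(char);
}

size_t align_of_value_field(void)
{
    struct sample s;
    return __alignof__(s.value);
}

size_t align_of_sample(void)
{
    return __alignof__(struct sample);
}
```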

inline functions

This is a well-known C++ feature carried over to C.

Using volatile

Some peculiarities of how GCC treats volatile. One curiosity: if the code contains

 volatile int *ptr; /*...*/ *ptr;

then GCC interprets this as a read from the memory pointed to by ptr and generates the corresponding code.

Using assembly inserts

GCC; , , - , . , — ; , . GCC , .

Alternate keywords. Double-underscore spellings such as __const__ and __asm__ can be used in place of the usual keywords.

enum ; , . , .

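Function names as strings. The identifiers __FUNCTION__ (the same as __func__) and __PRETTY_FUNCTION__ hold the name of the current function. In C++, __PRETTY_FUNCTION__ also includes the function's signature.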

. (built-in') ( , ..), .

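Vector extensions (SIMD: single instruction, multiple data)

Vector types are declared with a special attribute (vector_size). Arithmetic and comparison operators apply to such types element by element.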

 typedef int v4si __attribute__ ((vector_size (16)));
 v4si a = {1,2,3,4};
 v4si b = {3,2,1,4};
 v4si c;
 c = a > b;   // c = {0, 0,-1, 0}
 c = a == b;  // c = {0,-1, 0,-1}
 a = b + 1;   // a = b + {1,1,1,1}
 a = 2 * b;   // a = {2,2,2,2} * b

Vector elements can be rearranged with __builtin_shuffle:

 v4si a = {1,2,3,4};
 v4si b = {5,6,7,8};
 v4si mask1 = {0,1,1,3};
 v4si mask2 = {0,4,2,5};
 v4si res;
 res = __builtin_shuffle (a, mask1);    /* res is {1,2,2,4} */
 res = __builtin_shuffle (a, b, mask2); /* res is {1,5,3,6} */

offsetof

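The offsetof macro is traditionally defined with a null-pointer trick like this:

 #define offsetof(s, m) (size_t)&(((s *)0)->m)

In GCC, offsetof is implemented through the built-in __builtin_offsetof.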

Built-in functions (builtins)

This concept is rarely distinguished as an independent entity - but in vain. Built-in functions occupy an intermediate place between the keywords of a language and ordinary functions and are used everywhere, and most programmers do not even think about their nature.

For example sine sin () . , ( ) FPU, ( FPU , ). (builtin) , , . , , Cilk Plus, , ..

Pragmas are directives intended, broadly, for fine control of the compilation process directly from the source code; they can be attributed both to the preprocessor and to the language itself (in fact, I find it hard to place them unambiguously, and the preprocessor has long since merged with the language). GCC supports both general-purpose and platform-specific pragmas. The topic is big and interesting, like builtins, so it may be covered in a separate part.

Unnamed fields of structures and unions

In structures and unions you can declare nested unnamed structures and unions. The fields of these nested structures and unions are directly accessible:

 struct { int a; union { int b; float c; }; int d; } foo; foo.b = 10;

, ; , , .

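With the Plan 9 extensions enabled ("-fplan9-extensions"), a structure can also be embedded as an unnamed field through a previously declared typedef name, similar to embedding in Go: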

 typedef struct { int a; } s1;
 typedef struct { int x; s1; int y; } s2;
 s2 obj;
 obj.a = 10;

Thread-Local

, thread-local storage. TLS, .

Binary integer constants are written with the '0b' prefix.

, '0o' . .

GCC C++ extensions

volatile

volatile GCC C++, .

Restricted pointers (from C99)

restrict , , . , , . .

GCC also allows the restrict qualifier to be applied to the this pointer.

Vague linkage

Some constructs in C++ require space in object files and can appear in several translation units at once. These are inline functions, virtual function tables (vtables), type_info objects, and the results of template instantiation. GCC supports placing such objects in COMDAT sections of the object file, which makes it possible to eliminate duplicates at link time.

Interface and implementation pragmas

Such pragmas let you tell the compiler explicitly whether an object is an interface or an implementation. An additional crutch for vague linkage.

Instantiation of templates

Ways of instantiating templates in GCC: methods to ensure that only one copy of each template instance is generated for a given set of template parameters. The topic is big, so I only mention it here.

Retrieving a function pointer from a pointer to a class member function

An obvious extension related to the '->*' and '.*' operators. While a pointer to a data member is, at the low level, a byte offset of the field within the class, a pointer to a member function is a full-fledged function pointer, and GCC adds the ability to convert a member function pointer to an ordinary function pointer.

C++ attributes

Some attributes (set via the __attribute__ keyword) are applicable only to C ++. A few examples: abi_tag - a way of specifying the mangling of variable and function names; init_priority is the initialization priority for global objects.

Declaring multiple versions of a function

. target — ( , ..). - , , .

inline namespace ( GCC ).

Type traits

, . , ( , , D). — :

 __is_abstract (type)
 __is_base_of (base_type, derived_type)
 __is_class (type)
 __is_empty (type)
 __is_enum (type)
 __is_literal_type (type)

++

, (++17) . (.. ), .

Also interesting. , .

G++ void* .
'<?', '>?', '<?=' '>?=' ( , )
( )
new
float complex ( )
implicit typename extension ( )
, typedef' ,
. , ?
; .

Backward compatibility

++ . .

, for, ; -fpermissive .
extern «C»; .

As you can see, some extensions can hardly be called extensions: they are either well-known features or, even worse, crutches designed to keep compatibility with ancient legacy standards, to work around unfortunate language design decisions, and so on.

At the same time, others are true pearls among language features, and it is a pity that they are not in the standard.

Source: https://habr.com/ru/post/315676/
