Tips on how to write in C in 2016


If the C language were a weapon

From the author: I drafted this article in early 2015, but it never reached publication. Having decided that the draft does no good sitting in my desk drawer, I present it here in its original form. The only thing that has changed in the text is the year, from 2015 to 2016.

I am always happy to hear about necessary corrections, clarifications, or even your complaints.

So, the article ...

The first rule of C programming: do not use it if another tool will do the job.

When there is no alternative, it is time to recall the modern rules of the craft.

The C programming language has been around since the early 1970s. People have “learned C” at different stages of its evolution, and their knowledge often froze at that point. As a result, different programmers hold different beliefs about C, depending on the years in which they first learned it.

When programming in C, it is very important not to stay stuck at the level of “truths learned in the 80s/90s”.

If you are reading this article, you are most likely working on modern platforms and following current standards, so I do not need to cover the endless conventions of legacy software. It makes no sense to perpetuate ancient standards only because some companies have not bothered to update systems that are, at best, 20 years old.

Introduction


Standard C99 (C99 means “the C standard from 1999”; C11 means “the C standard from 2011”, so 11 > 99).

clang, default


Optimization


-O2, -O3

Usually -O2 is what you want, but sometimes you need -O3. Test both levels (including with different compilers), then keep the best-performing executables.

-Os

-Os helps when cache efficiency is a concern (as it should be).

Warnings


-Wall -Wextra -pedantic
Recent compiler versions offer -Wpedantic, but they still accept the older -pedantic as well, which helps backward compatibility.

During testing, add -Werror and -Wshadow on all platforms.

Shipping production source with -Werror can be tricky, because different platforms, compilers and libraries emit different warnings. You probably do not want to break a user's entire build just because their version of GCC, on a platform you have never seen, complains in new and inventive ways.

Additional useful options include -Wstrict-overflow and -fno-strict-aliasing.

Either enable -fno-strict-aliasing, or make sure you only access objects as the type they were created with. Since a great deal of existing C code aliases across types, -fno-strict-aliasing is the safer choice unless you control the entire source tree.
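As a hedged illustration of "accessing objects as the type they were created with": one way to read the bits of a value as another type without aliasing across types is to copy the bytes with memcpy. The helper name float_bits is hypothetical, and the sketch assumes a 32-bit IEEE 754 float.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns the raw bit pattern of a float.
 * memcpy copies the object representation byte by byte, so the float
 * is never accessed through a pointer to an incompatible type. */
uint32_t float_bits(float value) {
    uint32_t bits;
    _Static_assert(sizeof(uint32_t) == sizeof(float),
                   "this sketch assumes a 32-bit float");
    memcpy(&bits, &value, sizeof(bits));
    return bits;
}
```

Code written this way does not depend on -fno-strict-aliasing; the flag matters for code that casts a pointer to one type into a pointer to another and dereferences it.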

Clang currently warns about some valid syntax; to silence those warnings, add -Wno-missing-field-initializers.

GCC removed this unnecessary warning in version 4.7.0 and later.

Development


Compilation units

The most common way to build a C project is to compile each source file into an object file and then link all the objects together. This scheme works well for incremental development, but it is suboptimal for performance and optimization: the compiler cannot spot optimization opportunities that cross file boundaries.

LTO - Link Time Optimization

LTO solves the problem of source analysis and optimization across compilation units by annotating object files with an intermediate representation, so that source-aware optimizations can be applied across compilation units at link time.

LTO can slow down linking noticeably. make -j helps, but only if the build consists of several independent final targets (.a, .so, .dylib, test executables, application executables, etc.).

clang LTO
GCC LTO

As of 2016, clang and gcc both support LTO: add -flto to the command line when compiling objects and when linking the final library or program. LTO still needs supervision, though. Sometimes, if your program contains code that is not called directly but is used by additional libraries, LTO can drop those functions, because its whole-program analysis at link time concludes that they are unused and need not appear in the final result.

Arch
-march=native

Let the compiler use the full feature set of your CPU. Again, performance testing and regression testing are important (followed by comparing results across compilers and compiler versions), because they confirm that the enabled optimizations have no adverse side effects.

-msse2 and -msse4.2 may be useful if you need to target features of machines other than your build machine.

Code creation


Types

If you find yourself typing char, int, short, long or unsigned in new code, you are doing it wrong.
In modern programs, #include <stdint.h> and then use the standard types.
For details, see the stdint.h specification.
The most common standard types are the following:
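As a sketch, these are the exact-width integer types that <stdint.h> provides, shown in use. The struct and its field names are hypothetical and exist only to illustrate the types.

```c
#include <stdint.h>

/* Illustrative record; the struct and field names are hypothetical. */
struct sample {
    int8_t   delta;    /* signed, exactly 8 bits */
    uint8_t  byte;     /* unsigned, exactly 8 bits */
    int16_t  level;    /* signed, exactly 16 bits */
    uint16_t port;     /* unsigned, exactly 16 bits */
    int32_t  balance;  /* signed, exactly 32 bits */
    uint32_t flags;    /* unsigned, exactly 32 bits */
    int64_t  offset;   /* signed, exactly 64 bits */
    uint64_t id;       /* unsigned, exactly 64 bits */
};

/* The widths come from the type name, not from the platform. */
_Static_assert(sizeof(uint8_t) == 1, "uint8_t is one byte");
_Static_assert(sizeof(uint16_t) == 2, "uint16_t is two bytes");
_Static_assert(sizeof(uint32_t) == 4, "uint32_t is four bytes");
_Static_assert(sizeof(uint64_t) == 8, "uint64_t is eight bytes");
```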


Note that char is gone. In C, char is both misnamed and misused.

Developers routinely use char to mean "byte", even when they are manipulating unsigned bytes. It is much cleaner to use uint8_t for a single unsigned byte/octet value and uint8_t * for a sequence of unsigned byte/octet values.

Should you ever use int?

Some readers have said that they truly love int and that it will have to be pried from their cold, dead fingers. It is worth pointing out that it is technically impossible to program correctly if the sizes of your types change underneath you.

Also read the rationale included with inttypes.h, which explains why types of non-fixed width are unsafe. If you are able to keep in mind throughout development that int is 16 bits on some platforms and 32 bits on others, and you test the 16-bit and 32-bit edge cases for every use of int, feel free to keep using it.

The rest of us, who cannot hold entire multi-level trees of platform specifications in our heads while writing something trivial, can use fixed-width types and automatically get more correct code, with far fewer conceptual pitfalls and much less testing overhead. Or, as the specification puts it more succinctly: “The ISO C standard integer promotion rule can produce silent changes unexpectedly.”

Good luck with that.

Exception to the rule “never use char”

The only acceptable uses of char in 2016 are when an existing API requires it (for example, strncat or printf with "%s") or when you initialize a read-only string (for example, const char *hello = "hello";), because in C the type of a string literal ("hello") is char [].
ALSO: C11 has native Unicode support, and the type of UTF-8 string literals is still char [], even for multibyte sequences, as in const char *abcgrr = u8"abc";.

Exception to the rule “never use {int,long,etc}”

If you call functions with native return types or native parameters, use the types given by the function prototype or the API specification.

Signedness

You should never type the word unsigned in your code. We can now write code without the awkward C convention of multi-word types, which hurts both readability and usability. Who wants to type unsigned long long int when uint64_t will do? The <stdint.h> types are more explicit and more exact in meaning, they convey the author's intent better, and they are more compact, which matters for typing and for readability.

Pointers as integers

You may object: "But I have to cast pointers to long, otherwise my pointer math falls apart!"

You may say that, but it does not make it true.

The correct type for pointer math is uintptr_t, defined in <stdint.h>. The equally useful ptrdiff_t is defined in stddef.h.

Instead of:
long diff = (long)ptrOld - (long)ptrNew;

Use:
ptrdiff_t diff = (uintptr_t)ptrOld - (uintptr_t)ptrNew;

And:
printf("%p is unaligned by %" PRIuPTR " bytes.\n", (void *)p, ((uintptr_t)p & (sizeof(void *) - 1)));
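The two expressions above can be wrapped in small functions. This is only a sketch: the names byte_distance and misalignment are hypothetical, and the alignment mask assumes sizeof(void *) is a power of two, which holds on common platforms.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: distance in bytes from 'from' to 'to',
 * computed on integer values of the addresses. */
ptrdiff_t byte_distance(const void *from, const void *to) {
    return (ptrdiff_t)((uintptr_t)to - (uintptr_t)from);
}

/* Hypothetical helper: how many bytes p lies past the previous
 * pointer-sized boundary (0 means pointer-aligned).
 * Assumes sizeof(void *) is a power of two. */
uintptr_t misalignment(const void *p) {
    return (uintptr_t)p & (sizeof(void *) - 1);
}
```

For example, given uint32_t a[4], byte_distance(&a[0], &a[3]) is 12, since three 4-byte elements separate the two addresses.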

System-dependent data types


You may still argue: “on a 32-bit platform I want a 32-bit long, and on a 64-bit platform a 64-bit one!”

Leaving aside that deliberately using two different sizes depending on the platform makes code hard to reason about, you still do not want to use long for system-dependent types.

In such situations, use intptr_t, the integer type defined to hold a pointer value on your platform.

On modern 32-bit platforms, intptr_t is int32_t.

On modern 64-bit platforms, intptr_t is int64_t.

intptr_t also comes in an unsigned variant, uintptr_t.

To store pointer offsets, use ptrdiff_t, the proper type for holding the result of subtracting two pointers.

Maximum Value


Do you need an integer type capable of holding any integer value on your system?

People tend to reach for the largest type they know, such as uint64_t, but there is a more technically correct way to guarantee that a variable can hold any other value. The safest container for any integer is intmax_t (or uintmax_t). You can assign or cast any signed integer to intmax_t with no loss of precision, and likewise any unsigned integer to uintmax_t.

Other data type


The most widely used system-dependent type is size_t, provided by stddef.h.

In essence, size_t is “an integer capable of holding the largest array index”, which means it can also hold the largest memory offset in your program.

In practice, size_t is the result type of the sizeof operator.

Either way, on modern platforms size_t is practically the same as uintptr_t, so on 32-bit platforms size_t is uint32_t and on 64-bit platforms it is uint64_t.

There is also ssize_t, a signed size_t used as the return type of library functions that return -1 on error. (Note: ssize_t is part of POSIX and is not available in Windows interfaces.)

So should you use size_t for arbitrary system-dependent sizes in the parameters of your own functions? Technically, size_t is the result type of sizeof, so any function that accepts a size expressed as a number of bytes may take a size_t.

Other uses: size_t is the argument type of malloc, and ssize_t is the return type of read() and write() (except in Windows interfaces, where ssize_t does not exist and the return values are plain int).

Printing types



Never cast types when printing. Always use the proper type specifiers, as defined in inttypes.h.

This list includes (of course, this is only a brief excerpt):


64-bit types should be printed only with the PRI[udixXo]64 style macros.
Why?

On some platforms a 64-bit value is a long, on others a long long. These macros supply the correct underlying format specifier on each platform.

Without these format macros it is practically impossible to write a format string that is correct on all platforms, because the types change underneath you (and remember, casting values before printing is neither safe nor logical).

intptr_t - "%" PRIdPTR
uintptr_t - "%" PRIuPTR
intmax_t - "%" PRIdMAX
uintmax_t - "%" PRIuMAX

A note on the PRI* format specifiers: they are macros, and they expand to the appropriate printf specifiers on a per-platform basis. This means you cannot write:

printf("Local number: %PRIdPTR\n\n", someIntPtr);

Instead, because they are macros, you write:

printf("Local number: %" PRIdPTR "\n\n", someIntPtr);

Note that the % goes inside the format string literal while the type specifier stays outside it; adjacent string literals are concatenated into one final string literal during translation.
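A minimal sketch that puts several of these specifiers together. The helper name format_values and the sample values are hypothetical; %zu (size_t) and %td (ptrdiff_t) are standard C99 printf length modifiers, and the PRI* macros come from <inttypes.h>.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: formats one value of each kind into buf and
 * returns what snprintf returns. No value is cast before printing;
 * each one is paired with the specifier that matches its type. */
int format_values(char *buf, size_t bufLen) {
    uint64_t big = UINT64_MAX;
    int64_t negative = INT64_MIN;
    size_t length = 42;
    ptrdiff_t delta = -7;
    intptr_t word = 1234;

    return snprintf(buf, bufLen,
                    "%" PRIu64 " %" PRId64 " %zu %td %" PRIdPTR,
                    big, negative, length, delta, word);
}
```

The format string is assembled from adjacent string literals, so the same source line is correct whether uint64_t is a long or a long long on the target.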

C99 allows variable declarations anywhere.

Do NOT do this:
void test(uint8_t input) {
    uint32_t b;
    if (input > 3) {
        return;
    }
    b = input;
}

Do this instead:
void test(uint8_t input) {
    if (input > 3) {
        return;
    }
    uint32_t b = input;
}

Caveat: if you have tight loops, test the placement of your initializers. Sometimes scattered declarations cause unexpected slowdowns. For regular code that is not on a fast path (which is most code), it is best to be as clear as possible, and declaring variables right where they are initialized is a big readability improvement.

C99 allows for loops to declare their counters inline.

Never write:
uint32_t i;
for (i = 0; i < 10; i++)

Write this instead:
for (uint32_t i = 0; i < 10; i++)

One exception: if you need the value of the counter after the loop exits, do not declare the counter in the loop header.
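To make the rule and its exception concrete, here is a small sketch with two hypothetical helpers: sum_bytes keeps its counter scoped to the loop, while first_zero declares the counter outside because the value is needed after the loop ends.

```c
#include <stddef.h>
#include <stdint.h>

/* Counter scoped to the loop: it is not needed afterwards. */
uint32_t sum_bytes(const uint8_t *bytes, size_t len) {
    uint32_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum += bytes[i];
    }
    return sum;
}

/* Counter declared outside the loop: its final value is the result.
 * Returns the index of the first zero byte, or len if there is none. */
size_t first_zero(const uint8_t *bytes, size_t len) {
    size_t i;
    for (i = 0; i < len; i++) {
        if (bytes[i] == 0) {
            break;
        }
    }
    return i;
}
```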

Modern compilers support #pragma once.

Do NOT do this:
#ifndef PROJECT_HEADERNAME
#define PROJECT_HEADERNAME
.
.
.
#endif /* PROJECT_HEADERNAME */

Instead, use:
#pragma once

#pragma once tells the compiler to include the header only once, so you no longer need three lines of header guards. The pragma is non-standard but widely supported across compilers and platforms, and it is recommended over manually naming header guards.
For details, see the list of compilers that support pragma once.

C allows static initialization of automatically allocated arrays.

Do NOT do this:
uint32_t numbers[64];
memset(numbers, 0, sizeof(numbers));

Do this instead:
uint32_t numbers[64] = {0};

C also allows static initialization of automatically allocated structs.

Do NOT do this:
struct thing {
    uint64_t index;
    uint32_t counter;
};

struct thing localThing;

void initThing(void) {
    memset(&localThing, 0, sizeof(localThing));
}

Do this instead:
struct thing {
    uint64_t index;
    uint32_t counter;
};

struct thing localThing = {0};


IMPORTANT: If your struct has padding, the {0} method does not zero the padding bytes. For example, struct thing has 4 bytes of padding after counter (on a 64-bit platform), because structs are padded to word-sized increments. If you need to zero the entire struct, including unused padding, use memset(&localThing, 0, sizeof(localThing)), because sizeof(localThing) == 16 bytes even though the addressable contents are only 8 + 4 = 12 bytes.
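A sketch of the padding arithmetic and the memset fallback. The helper names are hypothetical; the exact amount of padding is platform-dependent, so the code computes it rather than assuming 4.

```c
#include <stdint.h>
#include <string.h>

struct thing {
    uint64_t index;
    uint32_t counter;
};

/* Hypothetical helper: bytes of the struct that belong to no field.
 * Typically 4 on 64-bit platforms (16 - 12); it may differ elsewhere. */
size_t thing_padding_bytes(void) {
    return sizeof(struct thing) - (sizeof(uint64_t) + sizeof(uint32_t));
}

/* Hypothetical helper: zero every byte of the struct, padding included.
 * Useful before hashing or comparing structs with memcmp. */
void thing_clear_all_bytes(struct thing *t) {
    memset(t, 0, sizeof(*t));
}
```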

If you need to reinitialize already allocated structs, declare a global zero struct for later assignment:
struct thing {
    uint64_t index;
    uint32_t counter;
};

static const struct thing localThingNull = {0};
.
.
.
struct thing localThing = {.counter = 3};
.
.
.
localThing = localThingNull;

If you are lucky enough to work in a C99 (or newer) environment, you can use compound literals instead of keeping a global “zero struct” around (see The New C: Compound Literals, 2001).

Compound literals let the compiler create a temporary anonymous struct and then copy it to the target:
localThing = (struct thing){0};
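A short sketch of compound literals used for resetting. The function names thing_reset and thing_restart are hypothetical; fields not named in a designated initializer are set to zero.

```c
#include <stdint.h>

struct thing {
    uint64_t index;
    uint32_t counter;
};

/* Reset every field to zero by assigning a compound literal. */
void thing_reset(struct thing *t) {
    *t = (struct thing){0};
}

/* Reset and set one field in a single assignment; 'index' is not
 * named in the initializer, so it becomes zero. */
void thing_restart(struct thing *t, uint32_t counter) {
    *t = (struct thing){.counter = counter};
}
```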

C99 added variable-length arrays (C11 made them optional).

So do NOT do this (if you know your array is tiny, or you are just testing something quickly):
uintmax_t arrayLength = strtoumax(argv[1], NULL, 10);
void **array = malloc(sizeof(*array) * arrayLength);

/* remember to free(array) when you are done with it */

Do this instead:
uintmax_t arrayLength = strtoumax(argv[1], NULL, 10);
void *array[arrayLength];

/* no need to free array */


IMPORTANT: variable-length arrays are (usually) allocated on the stack, just like regular arrays. If you would not create a regular array of 3 million elements statically, do not attempt to create a 3-million-element array at runtime with this syntax. These are not the auto-growing lists of Python or Ruby. If you specify a runtime array length that is too big for your stack, your program will do awful things (crashes, security problems). Variable-length arrays are convenient for small, single-purpose situations, but they should not be relied on for all kinds of software. If you sometimes need a 3-element array and other times a 3-million-element array, variable-length arrays are the wrong tool.

It is good to know the VLA syntax in case you encounter it (or want it for a quick one-off test). At the same time, it often ends badly: a whole program can crash because someone forgot a bounds check on the element count, or forgot that the code runs on an unfamiliar target platform with very little stack space.
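One way to keep the convenience of a VLA while bounding stack usage is to check the length first and fall back to the heap for large sizes. This is a sketch under stated assumptions: MAX_STACK_ELEMENTS, fill_and_sum and sum_indices are hypothetical names, the cap of 256 elements is arbitrary, and the compiler is assumed to support VLAs (they are optional in C11).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical cap on how many elements may live on the stack. */
#define MAX_STACK_ELEMENTS 256

/* Fills scratch with 0..count-1 and returns the sum of the elements. */
static uint64_t fill_and_sum(uint32_t *scratch, size_t count) {
    uint64_t sum = 0;
    for (size_t i = 0; i < count; i++) {
        scratch[i] = (uint32_t)i;
        sum += scratch[i];
    }
    return sum;
}

/* Returns false for count == 0 or on allocation failure. */
bool sum_indices(size_t count, uint64_t *result) {
    if (count == 0) {
        return false; /* a zero-length VLA is undefined behavior */
    }

    if (count <= MAX_STACK_ELEMENTS) {
        uint32_t scratch[count]; /* VLA, bounded by the check above */
        *result = fill_and_sum(scratch, count);
        return true;
    }

    /* Too large for the stack: use zeroed heap memory instead. */
    uint32_t *scratch = calloc(count, sizeof(*scratch));
    if (!scratch) {
        return false;
    }
    *result = fill_and_sum(scratch, count);
    free(scratch);
    return true;
}
```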

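NOTE: you must be certain that arrayLength is a reasonable size in this situation (no more than a few KB; on unusual platforms the stack can be as small as 4 KB). You cannot stack-allocate huge arrays (millions of entries), but when you know the count is small, the C99 VLA syntax is much easier than manually requesting heap memory from malloc.

DOUBLE NOTE: the example above does not check user input, so a user can easily crash the program by requesting a giant VLA. Some people call VLAs an anti-pattern outright, but with tight bounds they can be a small win in certain situations.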

C99 .


If a function accepts arbitrary input data and a length to process, do not restrict the type of the parameter.

Do NOT do this:
void processAddBytesOverflow(uint8_t *bytes, uint32_t len) {
    for (uint32_t i = 0; i < len; i++) {
        bytes[0] += bytes[i];
    }
}

Do this instead:
void processAddBytesOverflow(void *input, uint32_t len) {
    uint8_t *bytes = input;
    for (uint32_t i = 0; i < len; i++) {
        bytes[0] += bytes[i];
    }
}


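The parameter types of a function describe the interface to your code, not what the code does with the parameters. The interface above means “accept a byte array and a length”, so there is no reason to restrict callers to uint8_t byte streams. Callers may want to pass old-style char * values or something else unexpected. By declaring the parameter as void * and converting inside the function, you spare users from thinking about the abstractions inside your library.

Some readers have pointed out possible alignment problems in this example, but it accesses single-byte elements of the input, so everything is fine. If the input were converted to a wider type, alignment would need attention. For more on cross-platform alignment issues, see Unaligned Memory Access.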
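The effect of the void * parameter shows at the call site. In this sketch the function body is the one from the example above; the caller add_both_ways is hypothetical and passes both a uint8_t buffer and a char buffer without any cast.

```c
#include <stdint.h>

/* Same shape as the void * version shown above. */
void processAddBytesOverflow(void *input, uint32_t len) {
    uint8_t *bytes = input;
    for (uint32_t i = 0; i < len; i++) {
        bytes[0] += bytes[i];
    }
}

/* Hypothetical caller: either buffer type converts to void *
 * implicitly, so no cast is needed at the call. Returns 1 when
 * both buffers end up with the same first byte. */
int add_both_ways(void) {
    uint8_t raw[] = {1, 2, 3};
    char legacy[] = {1, 2, 3};

    processAddBytesOverflow(raw, (uint32_t)sizeof(raw));
    processAddBytesOverflow(legacy, (uint32_t)sizeof(legacy));

    return raw[0] == (uint8_t)legacy[0];
}
```

With the uint8_t * signature, the second call would need an explicit cast or draw a pointer-type warning.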


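C99 gives us <stdbool.h>, which defines true as 1 and false as 0.
For success/failure results, functions should return true or false, not an int32_t with manually assigned 1 and 0 (or, worse, 1 and -1; or is it 0 for success and 1 for failure? or 0 for success and -1 for failure?).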

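If a function modifies an input parameter to the point of invalidating it, then instead of returning the changed pointer, the whole API should take double pointers wherever an input can be invalidated. A convention of “for some calls, the return value invalidates the input” is too error-prone for wide use.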

So do NOT do this:
void *growthOptional(void *grow, size_t currentLen, size_t newLen) {
    if (newLen > currentLen) {
        void *newGrow = realloc(grow, newLen);
        if (newGrow) {
            /* resize succeeded */
            grow = newGrow;
        } else {
            /* resize failed: free the existing allocation and signal failure with NULL */
            free(grow);
            grow = NULL;
        }
    }
    return grow;
}



Do this instead:
/* Return value:
 *  - 'true' if newLen > currentLen and a grow was attempted
 *  - 'true' does not mean success here; the outcome is still in '*_grow'
 *  - 'false' if newLen <= currentLen */
bool growthOptional(void **_grow, size_t currentLen, size_t newLen) {
    void *grow = *_grow;
    if (newLen > currentLen) {
        void *newGrow = realloc(grow, newLen);
        if (newGrow) {
            /* resize succeeded */
            *_grow = newGrow;
            return true;
        }
        /* resize failed */
        free(grow);
        *_grow = NULL;
        /* for this function, 'true' does not mean success,
         * it means a grow was attempted */
        return true;
    }
    return false;
}


Or, even better, do this:
typedef enum growthResult {
    GROWTH_RESULT_SUCCESS = 1,
    GROWTH_RESULT_FAILURE_GROW_NOT_NECESSARY,
    GROWTH_RESULT_FAILURE_ALLOCATION_FAILED
} growthResult;

growthResult growthOptional(void **_grow, size_t currentLen, size_t newLen) {
    void *grow = *_grow;
    if (newLen > currentLen) {
        void *newGrow = realloc(grow, newLen);
        if (newGrow) {
            /* resize succeeded */
            *_grow = newGrow;
            return GROWTH_RESULT_SUCCESS;
        }
        /* resize failed: keep the existing data, because the error is reported through the return value */
        return GROWTH_RESULT_FAILURE_ALLOCATION_FAILED;
    }
    return GROWTH_RESULT_FAILURE_GROW_NOT_NECESSARY;
}
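A sketch of how a caller might consume the enum-returning version. growthOptional is repeated from above so the block is self-contained; ensureCapacity is a hypothetical wrapper added only for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

typedef enum growthResult {
    GROWTH_RESULT_SUCCESS = 1,
    GROWTH_RESULT_FAILURE_GROW_NOT_NECESSARY,
    GROWTH_RESULT_FAILURE_ALLOCATION_FAILED
} growthResult;

growthResult growthOptional(void **_grow, size_t currentLen, size_t newLen) {
    void *grow = *_grow;
    if (newLen > currentLen) {
        void *newGrow = realloc(grow, newLen);
        if (newGrow) {
            *_grow = newGrow;
            return GROWTH_RESULT_SUCCESS;
        }
        /* the old allocation is still valid and still in *_grow */
        return GROWTH_RESULT_FAILURE_ALLOCATION_FAILED;
    }
    return GROWTH_RESULT_FAILURE_GROW_NOT_NECESSARY;
}

/* Hypothetical caller: returns true when *buffer holds at least
 * 'wanted' bytes afterwards, and keeps *len in sync. */
bool ensureCapacity(void **buffer, size_t *len, size_t wanted) {
    switch (growthOptional(buffer, *len, wanted)) {
    case GROWTH_RESULT_SUCCESS:
        *len = wanted;
        return true;
    case GROWTH_RESULT_FAILURE_GROW_NOT_NECESSARY:
        return true; /* already large enough */
    case GROWTH_RESULT_FAILURE_ALLOCATION_FAILED:
        return false; /* *buffer is unchanged and still usable */
    }
    return false;
}
```

Each outcome has its own name, so the caller never has to guess whether a given return value means success, failure, or "nothing happened".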



, .

50 , - . , .

– .

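As of 2016, the one usable automatic formatter for C is clang-format. It has the best defaults of any automatic C formatter and is still actively developed.

A script for running clang-format with these parameters:
#!/usr/bin/env bash

clang-format -style="{BasedOnStyle: llvm, IndentWidth: 4, AllowShortFunctionsOnASingleLine: None, KeepEmptyLinesAtTheStartOfBlocks: false}" "$@"

Then call the script (here named cleanup-format):
matt@foo:~/repos/badcode% cleanup-format -i *.{c,h,cc,cpp,hpp,cxx}

The -i option makes clang-format overwrite the files in place.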

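If you have many files, you can process an entire source tree recursively and in parallel:
#!/usr/bin/env bash

# clang-tidy is invoked on one file at a time (-n1),
# with up to four processes running in parallel (-P4).
find . \( -name \*.c -or -name \*.cpp -or -name \*.cc \) |xargs -n1 -P4 cleanup-tidy

# clang-format can take several files per run;
# pass at most 12 files per invocation (-n12).
find . \( -name \*.c -or -name \*.cpp -or -name \*.cc -or -name \*.h \) |xargs -n12 -P4 cleanup-format -i


The script above calls a second script, cleanup-tidy. Its contents:
#!/usr/bin/env bash

clang-tidy \
    -fix \
    -fix-errors \
    -header-filter=.* \
    --checks=readability-braces-around-statements,misc-macro-parentheses \
    $1 \
    -- -I.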


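clang-tidy is a policy-driven code refactoring tool. The options above enable two fixes:

readability-braces-around-statements: forces the bodies of all if/while/for statements to be enclosed in braces;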

, « » . . , , « !», – , - . , , , , , .

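misc-macro-parentheses: automatically adds parentheses around all parameters used in macro bodies.

clang-tidy is great when it works, but on some complex code bases it can get stuck. Also, clang-tidy does not format, so run clang-format afterwards to align the new braces and reflow the macros.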


, , …

Comments


.


1000 (1500 ). ( ..), .


malloc

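Always use calloc. There is no performance penalty for getting zeroed memory. If you do not like the calloc(object count, size per object) prototype, you can wrap it with #define mycalloc(N) calloc(1, N).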

:



, , , .

For background on how calloc() can hand out zeroed memory cheaply, see:

Benchmarking fun with calloc() and zero pages (2007)
Copy-on-write in virtual memory management

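For the common scenarios of 2016, I still recommend always using calloc() (assuming 64-bit target platforms and human-sized data). Anything outside the “expected” case drags us into “domain knowledge”, which is beyond the scope of this article.

Note: the zeroed memory delivered by calloc() is a one-time deal. If you grow a calloc() allocation with realloc(), the added memory is not zeroed. To keep the whole block zeroed after realloc(), call memset() on the new extent yourself.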
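A sketch of growing a calloc() allocation while keeping the new tail zeroed. The helper name grow_zeroed is hypothetical; it follows the double-pointer convention discussed earlier and leaves the original block untouched when it returns false.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: grows *items from oldCount to newCount elements
 * and zeroes only the newly added tail. Returns false (and leaves
 * *items unchanged) if newCount is not larger than oldCount, if the
 * byte size would overflow, or if realloc fails. */
bool grow_zeroed(uint32_t **items, size_t oldCount, size_t newCount) {
    if (newCount <= oldCount || newCount > SIZE_MAX / sizeof(**items)) {
        return false;
    }

    uint32_t *grown = realloc(*items, newCount * sizeof(**items));
    if (!grown) {
        return false;
    }

    /* realloc does not zero the extension, so do it manually */
    memset(grown + oldCount, 0, (newCount - oldCount) * sizeof(*grown));
    *items = grown;
    return true;
}
```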

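Never memset (if you can avoid it)

Never call memset(ptr, 0, len) when you can statically initialize a struct (or array) to zero, or reset it by assigning a compound literal or a global zero struct.
However, memset() is your only choice when you need to zero a struct including its padding bytes (because {0} only sets the defined fields, not the padding).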

Conclusion


, , . , , , , , RAM «».

, — , , - .

Source: https://habr.com/ru/post/275685/

