
Criticism of the article "How to C in 2016"


From the translator:

This publication is the third and final article in a series that arose spontaneously after the translation of "How to C in 2016" appeared on the Inoventica Services blog. It criticizes some of the theses of the original and completes the overall "picture" of opinions on the questions raised by the author of the first publication and on approaches to writing C code. Special thanks to ImpureThought, who provided the link to the English-language original. The second publication, the link to which was suggested, I believe, by the well-known user CodeRush, can be found here.

Matt (whose last name is not given on his website, at least as far as I can tell) published the article "How to C in 2016", which later appeared on Reddit and on Hacker News; it was on the latter that I discovered it.
Yes, one can discuss C programming endlessly, but there are points here with which I clearly disagree. This critique is written in the spirit of constructive discussion. It is quite possible that in some cases Matt is right and I am mistaken.

I am not quoting Matt's entire publication. In particular, I decided to drop some points with which I agree. Let's start.

The first rule of C programming: do not use C if you can do the job with other tools.

I do not agree with this statement, but this is too broad a topic for discussion.

When programming in C, clang defaults to C99, so no additional options are required.

It depends on the clang version: clang 3.5 defaults to C99, clang 3.6 to C11. I am not sure how strictly either mode conforms out of the box.

If you need a specific standard for gcc or clang, do not overcomplicate things: use -std=cNN -pedantic.

By default, gcc-5 uses -std=gnu11, but in practice you should specify c99 or c11 without the GNU extensions.

Well, unless you actually want to use gcc-specific extensions, which are, in principle, perfectly legitimate for such purposes.

If you find yourself typing char, int, short, long, or unsigned into new code, that is a bug.

You will excuse me, but this is nonsense. int is the most natural integer type for the current platform. If you need a fast signed integer of at least 16 bits, there is nothing wrong with using int (or you can use int_fast16_t, which will do the same job, but IMHO is more verbose than it is worth).
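
To make the comparison concrete, here is a minimal sketch (the variable names are mine); on most platforms int and int_fast16_t turn out to be the same type, so plain int loses nothing:

 #include <stdint.h>
 #include <stdio.h>

 int main(void) {
     int          a = 0;  /* idiomatic: fast, at least 16 bits */
     int_fast16_t b = 0;  /* the same guarantee, spelled more verbosely */

     /* On typical platforms both print the same size. */
     printf("sizeof(int) = %zu, sizeof(int_fast16_t) = %zu\n",
            sizeof a, sizeof b);
     return 0;
 }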

In modern programs you should write #include <stdint.h> and then use the standard data types it provides.

The fact that int does not have "std" in its name does not mean we are dealing with something non-standard. Types such as int, long, and the rest are built into the C language. The typedefs defined in <stdint.h> were added to the language later. That does not make them any less "standard" than the built-in types, though in a sense they are secondary to them.

float is the standard 32-bit floating-point type
double is the standard 64-bit floating-point type

float and double are indeed quite commonly IEEE 32-bit and 64-bit floating-point types on modern systems, but C does not guarantee this, and you should not rely on it. I have worked on systems where float was 64 bits.

Please note: no more char. Usually in the C programming language, the char type is not only misnamed but also misused.

Unfortunately, the conflation of characters and bytes in C is unavoidable, and here we are simply stuck with it. The char type is exactly one byte, where a "byte" is at least 8 bits.

Software developers continually use the char type to mean "byte", even when performing unsigned byte operations. It is much cleaner to use uint8_t for individual unsigned byte/octet values and uint8_t * for sequences of them.

If you mean bytes, use unsigned char. If you mean octets, use uint8_t. When CHAR_BIT > 8, uint8_t cannot exist, which means such code will not compile (perhaps that is exactly what you want). If you work with quantities of at least 8 bits, use uint_least8_t. If you assume that bytes are octets, add something like this to the code:

 #include <limits.h>

 #if CHAR_BIT != 8
 #error "This program assumes 8-bit bytes"
 #endif

Note: POSIX requires CHAR_BIT == 8 .

In the C programming language, string literals ("hello") have the type char * .

No, string literals have array type: char []. In particular, for "hello" it is char [6]. Arrays are not pointers.
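
A short illustration of the distinction (a snippet of my own): sizeof sees the array type char[6], and the literal decays to a pointer only in most other expressions:

 #include <stdio.h>

 int main(void) {
     /* "hello" has type char[6]: five characters plus the terminating '\0'. */
     printf("sizeof \"hello\" = %zu\n", sizeof "hello");  /* 6, not sizeof(char*) */

     const char *p = "hello";  /* here the array decays to a pointer */
     printf("first char: %c\n", *p);
     return 0;
 }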

Never type the word unsigned into your code. Now you know how to write decent code without the clumsy C convention of multi-word types, which not only make the content less readable but also call into question its ease of use.

Many C types have names consisting of several words, and there is nothing wrong with that. Being too lazy to type a few extra characters is no reason to stuff the code with all sorts of abbreviations.

Who would want to type unsigned long long int when you can limit yourself to a simple uint64_t ?

For one thing, you can simply write unsigned long long, dropping the int. For another, these are different things: unsigned long long is at least 64 bits wide and may contain padding bits, while uint64_t is exactly 64 bits, has no padding bits, and is not required to exist in every implementation.

unsigned long long is a built-in C type. Anyone familiar with the language knows it.

Or use uint_least64_t , which may or may not be the same type as unsigned long long .

The types from <stdint.h> are much more specific and precise in meaning; they convey the author's intent better and are more compact, which matters for usage and readability.

Of course, the intN_t and uintN_t types are more specific. But that specificity is not always important. Do not specify what does not matter to you. Choose uint64_t only when you really need exactly 64 bits, no more and no less.

Sometimes exact-width types are required, for example when you need to match a specific externally imposed format (and even then <stdint.h> says nothing about byte order, padding, alignment, and so on). Most often it is enough to specify a range of values, for which the built-in types or [u]int_leastN_t and [u]int_fastN_t are suitable.
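
To make the division of labor concrete, here is a hedged sketch (the variable names are illustrative, not from either article):

 #include <stdint.h>

 int32_t       wire_field;   /* exactly 32 bits: matching an external format */
 int_least32_t counter;      /* any type of at least 32 bits: a value range  */
 int_fast32_t  loop_index;   /* at least 32 bits, chosen for speed           */
 int           everyday;     /* often the simplest adequate choice of all    */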

The correct type for pointer arithmetic is uintptr_t , defined in <stddef.h> .

What a terrible mistake.

Let's start with the small error: uintptr_t is defined in <stdint.h> , not <stddef.h> .

Now for the details. uintptr_t is optional: an implementation on which void* cannot be converted to any integer type without data loss will simply not define uintptr_t (such implementations are extremely rare, if they exist at all).

Incorrect:

 long diff = (long)ptrOld - (long)ptrNew; 


Yes, things are not done that way.

Correct:

 ptrdiff_t diff = (uintptr_t)ptrOld - (uintptr_t)ptrNew; 


But this option is no better.

If what you want is the difference in elements, write:

 ptrdiff_t diff = ptrOld - ptrNew; 

If you need the difference in bytes, write something like:

 ptrdiff_t diff = (char*)ptrOld - (char*)ptrNew; 

If ptrOld and ptrNew do not point into the same object, or just past its end, the behavior of pointer subtraction is undefined. Converting to uintptr_t at least guarantees some result, although it can hardly be called meaningful. Pointer subtraction and relational comparisons are valid only when both pointers refer to the same object or just past its end (exception: == and != work fine even for pointers to different objects).
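
Here is a small sketch of the well-defined cases: both pointers go into the same array, so subtraction in elements and in bytes is valid, while subtracting pointers to unrelated objects would be undefined behavior:

 #include <stddef.h>
 #include <stdio.h>

 int main(void) {
     int arr[10];
     int *first = &arr[2];
     int *last  = &arr[7];

     ptrdiff_t elems = last - first;                 /* 5 elements */
     ptrdiff_t bytes = (char *)last - (char *)first; /* 5 * sizeof(int) bytes */

     printf("elems = %td, bytes = %td\n", elems, bytes);
     /* int other; last - &other;  <- undefined: different objects */
     return 0;
 }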

In such situations, it makes sense to use intptr_t, the integer type defined to match the word size on your platform.

And no. The concept of "word size" is too vague. intptr_t is a signed integer type to which void* can be converted and then converted back without loss of data. It may well be wider than void* .
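
The only actual guarantee, sketched below (and assuming the implementation defines intptr_t at all): the round trip from void* to intptr_t and back yields a pointer equal to the original:

 #include <stdint.h>
 #include <stdio.h>

 int main(void) {
     int x = 42;
     void *p = &x;

     intptr_t n = (intptr_t)p;  /* fits by definition of intptr_t */
     void *q = (void *)n;       /* converts back without loss     */

     printf("round trip ok: %d\n", p == q);  /* prints 1 */
     return 0;
 }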

On 32-bit platforms, intptr_t is int32_t .

It happens, but not always.

On 64-bit platforms, intptr_t is int64_t .

And again: likely, but not guaranteed.

Essentially, size_t is "an integer capable of holding the largest array index",

Nooo.

and, therefore, it can also hold the largest memory offset in the program being created.

Yes, this type can hold the size of the largest object the implementation supports (there is an opinion that even this is not strictly required, but for practical purposes we may assume it is). And it can hold a memory offset as long as all offsets are made within a single object.

In any case, on modern platforms size_t has practically the same characteristics as uintptr_t , so on 32-bit systems size_t is uint32_t , and on 64-bit systems it is uint64_t .

Most likely, but not necessarily.

More precisely, size_t can hold the size of any single object, while uintptr_t can hold any pointer value, so with it you will never confuse the byte addresses of different objects. Most modern systems have a single monolithic address space, so in practice the maximum object size equals the total memory capacity; but nothing in the C standard requires this. You may well encounter a 64-bit system on which no single object may exceed 2^32 bytes.

And by emphasizing the word "modern" we automatically set aside both older alternatives (such as x86 with its segmented addressing and near and far pointers) and possible future systems that may conform to the C standard while falling outside the definition of "modern".

Never cast types when printing. Always use the proper format specifiers.

That is one option, but not the only valid one (and you will surely agree that you still have to cast to void * for "%p").

Raw pointer value: %p (modern compilers print it in hexadecimal; cast the pointer to void * first)

Excellent advice, except that the output format is implementation-defined. It is usually hexadecimal, but do not assume there is no other.

  printf("Local number: %" PRIdPTR "\n\n", someIntPtr); 

The name someIntPtr suggests the type int* , while the actual type is intptr_t .

There is a variation on this theme that frees you from learning endless combinations of macro names:

 some_signed_type n;
 some_unsigned_type u;
 printf("n = %jd, u = %ju\n", (intmax_t)n, (uintmax_t)u);

intmax_t and uintmax_t are, as a rule, 64-bit. The conversions to them are much cheaper than the physical I/O.

Note: the % goes into the body of the format string, while the type specifier remains outside it.

Actually, it is all part of the format string. The macros expand to string literals, which are concatenated with the adjacent string literals.
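
For example (the exact expansion is implementation-specific, so treat this as an assumption): if PRIdPTR expands to the string literal "ld", then after concatenation the compiler sees one ordinary format string:

 #include <inttypes.h>
 #include <stdio.h>

 int main(void) {
     intptr_t n = 123;
     /* If PRIdPTR expands to "ld", the line below is identical to
        printf("n = %ld\n", n); after string-literal concatenation. */
     printf("n = %" PRIdPTR "\n", n);
     return 0;
 }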

Modern compilers support #pragma once

But no one says you are obliged to use this directive. Even the compiler manuals make no such recommendation. The section "Once-Only Headers" in the gcc preprocessor manual says not a word about #pragma once, describing #ifndef instead. The next section, "Alternatives to Wrapper #ifndef", does mention #pragma once, but only to note that it is not a portable option.

This feature is supported by all compilers on all platforms and is a much more efficient mechanism than manually writing header guards.

And who gives such recommendations? The #ifndef idiom may not be perfect, but it is reliable and portable.
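
For reference, the idiom in question (the guard macro and function names are illustrative):

 /* foo.h */
 #ifndef FOO_H_INCLUDED
 #define FOO_H_INCLUDED

 int foo_add(int a, int b);

 #endif /* FOO_H_INCLUDED */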

IMPORTANT: if your structure contains padding, the {0} method does not zero the extra padding bytes. For example, struct thing has 4 bytes of padding after counter (on a 64-bit platform), because structures are padded out to word-sized increments. If you need to zero an entire structure, including the unused padding bytes, use memset(&localThing, 0, sizeof(localThing)) , since sizeof(localThing) == 16 bytes even though only 8 + 4 = 12 bytes are actually used.

It is more subtle than that. Usually there is no reason to pay any attention to the padding bytes. If you do want to devote your precious time to them, use memset to zero them. But note that clearing a structure with memset , while it does set integer members to zero, does not guarantee the same effect for floating-point members or pointers: those must equal 0.0 and NULL respectively (although on most systems it works just fine).
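
A sketch of the two techniques side by side, using the struct thing layout described above and assuming a 64-bit platform with 4 padding bytes after counter:

 #include <stdint.h>
 #include <string.h>

 struct thing {
     uint64_t index;
     uint32_t counter;
     /* typically 4 bytes of padding here on a 64-bit platform */
 };

 int main(void) {
     struct thing a = {0};     /* members zeroed; padding bytes unspecified */

     struct thing b;
     memset(&b, 0, sizeof b);  /* every byte zeroed, padding included */

     (void)a;
     (void)b;
     return 0;
 }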

C99 gave us variable-length array initializers

No, C99 does not provide initializers for VLAs (variable-length arrays). But what Matt actually describes are not VLA initializers, just VLAs themselves.

Variable-length arrays are controversial. Unlike malloc, they provide no way to detect an allocation failure. So if you need to allocate N bytes of data, this:

 {
     unsigned char *buf = malloc(N);
     if (buf == NULL) {
         /* allocation failed */
     }
     /* ... */
     free(buf);
 }

is, at least in general, safer than:

 {
     unsigned char buf[N];
     /* ... */
 }

Yes, mistakes with VLAs can lead to serious problems. But the same can be said of practically every feature in any programming language.

And old fixed-length arrays raise similar questions. As long as you check the size before creating the array, a VLA with variable N is as harmless as a fixed-length array of the same size. Moreover, for a fixed-length array you typically pick a size larger than the number of expected elements, only part of which ends up holding actual data. With a VLA you can allocate exactly as much space as the contents require. And here I agree with Matt's recommendation.

With one caveat: in C11, VLAs are optional. I doubt that many C11 compilers will actually treat variable-length arrays as optional, except on small embedded systems. Still, this feature is worth remembering if you plan to write maximally portable code.
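
A hedged sketch of the safe pattern (MAX_ELEMS and demo are names of my choosing): bound N first, and the VLA is no more dangerous than a fixed array of the maximum size:

 #define MAX_ELEMS 1024

 /* Returns the last value written, or -1 if n is out of range. */
 int demo(unsigned n) {
     if (n == 0 || n > MAX_ELEMS) {
         return -1;   /* check the size first, as with any allocation */
     }
     int buf[n];      /* VLA: exactly n elements, no over-allocation */
     for (unsigned i = 0; i < n; i++) {
         buf[i] = (int)i;
     }
     return buf[n - 1];
 }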

If a function works with arbitrary source data and a length to process, do not restrict the type of that parameter.

Wrong:

 void processAddBytesOverflow(uint8_t *bytes, uint32_t len) {
     for (uint32_t i = 0; i < len; i++) {
         bytes[0] += bytes[i];
     }
 }

Instead, use:

 void processAddBytesOverflow(void *input, uint32_t len) {
     uint8_t *bytes = input;
     for (uint32_t i = 0; i < len; i++) {
         bytes[0] += bytes[i];
     }
 }

I agree: void* is the ideal type for parameters pointing at an arbitrary chunk of memory. Consider the mem* functions in the standard library. (But len should be size_t , not uint32_t .)

By declaring the parameter type as void * and re-casting to the actual type needed right in the function body, you protect the users of your library: they do not have to think about what happens inside it.

A small note: there is no cast in Matt's function. What we see is an implicit conversion from void* to uint8_t* .

Some readers have pointed out alignment problems with this example.

And they are wrong. Accessing a chunk of memory as a sequence of bytes is always safe.

C99 gives us <stdbool.h> , which defines true as 1 and false as 0 .

Yes, and in addition it defines bool as an alias for the built-in type _Bool .

For success/failure return values, functions should return true or false , not an int32_t with hand-entered 1 and 0 (or, even worse, 1 and -1; how do you figure out whether 0 is success and 1 failure, or 0 is success and -1 failure?).

There is a widespread convention, particularly on Unix-like systems, by which a function returns 0 on success and some non-zero value (often -1) on failure. In many cases, different non-zero results indicate different kinds of errors. When adding new functions to existing interfaces, it is important to follow this convention (0 means success, since in general there is only one way for a function to succeed, but there may be many ways for it to fail).

A function that tests a condition should return true or false . Just do not confuse these with the success/failure outcomes of running some code.

A bool function should be given a name in the form of an assertion: in English, a phrase that answers a yes/no question, for example is_foo() or has_widget() . A function that performs an action, where what matters to you is whether it succeeded, should be named by another kind of statement. In some languages it is reasonable to throw and catch exceptions. In C you have to follow the established unwritten rules, including returning zero for a successful result.
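
A sketch of both naming conventions (is_empty and widget_init are illustrative names, not from Matt's article):

 #include <stdbool.h>
 #include <stddef.h>

 /* Predicate: reads as a yes/no question and returns bool. */
 bool is_empty(const char *s) {
     return s == NULL || s[0] == '\0';
 }

 /* Action: Unix-style status code, 0 for success, -1 for failure. */
 int widget_init(void *w) {
     if (w == NULL) {
         return -1;   /* failure */
     }
     /* ... actual initialization ... */
     return 0;        /* success */
 }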

The only tool for formatting C code in 2016 is clang-format. clang-format's default settings are an order of magnitude better than those of any other automatic C code formatter.

I have not used clang-format myself; I will have to give it a try.

As for formatting C code in general, I will say only one key thing: I rarely turn to automatic formatting tools at all. Maybe I am missing something?

Never use malloc.
Always use calloc.

Oh, come on. Zeroing all the bits of allocated memory is a rather arbitrary action, and as a rule not the best idea. In correctly written code, you never read an object without first assigning it a value. With calloc , any buggy read in the code will see zeros, which makes it easy to mistake an error for valid data. Does that sound like an improvement?

Zeroed memory often causes buggy program logic to run in a consistent manner; by definition, that still cannot be a correct run, but inconsistent failures are much harder to track down.

Yes, if the code was written without errors. But if you pursue a defensive strategy when writing code, you may prefer to fill allocated memory with some recognizably invalid value.

On the other hand, if all-bits-zero happens to be exactly the value you need, then by all means use calloc .
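
The three options discussed above, sketched side by side (the 0xAA fill value is an arbitrary "poison" byte of my choosing):

 #include <stdlib.h>
 #include <string.h>

 int main(void) {
     size_t n = 64;

     unsigned char *a = malloc(n);     /* contents indeterminate */
     unsigned char *b = calloc(n, 1);  /* all bits zero */

     unsigned char *c = malloc(n);     /* defensive: distinctive fill */
     if (c != NULL) {
         memset(c, 0xAA, n);  /* uninitialized reads stand out in a debugger */
     }

     free(a);
     free(b);
     free(c);
     return 0;
 }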



PS
We also invite readers to take a guided tour of our cloud data center next week.

Source: https://habr.com/ru/post/276611/

