C++ language traps

It would be nice to have a series of articles describing the various non-obvious "features" of programming languages. First, forewarned is forearmed. Second, knowing them gives a deeper understanding of the language and lets you explain, if need be, why they are dangerous. Even if you never use such constructs in your own code, you can run into these traps when reading someone else's code or working in a team.

So let us start with C++ and the type char.

The main sources of problems are:

the absence of a specialized integer type in the language for 8-bit values. Because of this, char has to take on the role of a byte;
the presence in the language of two completely different kinds of strings: std::string ("C++ strings") and const char* (C strings, which must be supported for compatibility).

More on the first item. Since there is no "native" byte type, it is built on top of char. The Standard makes a special point that char, signed char and unsigned char are three distinct types. Other integral types do not have this property: for example, int and signed int name the same type. An additional pitfall is that plain char itself must be either signed or unsigned, and which one depends on the platform (roughly speaking, on the compiler and its options). Even so, the compiler is still obliged to treat all three as different types.

Accordingly, the following set of declarations:

    void foo(char c);
    void foo(signed char c);
    void foo(unsigned char c);

declares three different functions. The problem caused by the "mixing" of these types' properties can be shown, for example, with this piece of code:

    #include <iostream>
    #include <stdint.h>

    int main()
    {
        uint8_t b = 42;
        std::cout << b << std::endl; // prints *, not 42
    }

Summarizing: in some situations, a "byte" integer type can reveal its "character" nature, with non-obvious consequences.

Now for the second item. C has no special type for working with strings. By convention, a char* (or const char*) pointer is assumed to most likely be a string, and it can be passed to the corresponding functions. Plain C even allows such amazing things as:

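    int main(void)
    {
        char* ptr = "hello"; /* allowed in C: the literal is not const-qualified */
        ptr[1] = 'q';        /* compiles, but modifying a string literal is undefined behavior */
        "abcd"[1] = '2';     /* the same thing: literals are often placed in read-only memory, so this may crash at run time */
        return 0;
    }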

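The good news is that C++ did not inherit this feature: there a string literal is an array of const char, so writing through it does not compile, and the implicit conversion of a literal to char* was deprecated in C++03 and removed in C++11.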

But the other problems have not gone away. For example, a string literal may contain null characters (for example, "abc\0\123"), yet the functions designed to work with C strings (strlen, etc.) do not support such strings. That is, the decision that "every string is a sequence of non-zero characters terminated by a zero" immediately produced a situation where not every string can be handled, along with funny effects such as O(n) complexity for an operation like "get the length of a given string".

Further, since the compiler automatically appends '\0' to every string literal, this leads to the following consequences:

    char str1[] = "1234";                 // the array size is 5, not 4
    char str2[4] = "1234";                // error in C++: no room for the terminating '\0'
    char str3[4] = {'1', '2', '3', '4'};  // ...

It would seem that all is well. But the last line hides a pitfall: str3 has no terminating '\0', yet it looks like an ordinary char*, so you can pass it to puts, strlen, etc. and get undefined behavior.

Summarizing: whenever possible, avoid "old" C-style strings in your C++ programs.

Source: https://habr.com/ru/post/154033/
