
We decided to take a short break from the subject of static code analysis. After all, this C++ blog is also read by those who do not use that technology yet. Meanwhile, things are happening in the C++ world that affect such an "established" topic as 64-bit software development. This article is about how the C++11 standard affects and helps (if it helps at all) in developing correct 64-bit programs.
64-bit computers have long been in successful use. Most applications are now 64-bit, which lets them use larger amounts of memory and gain performance from the architectural capabilities of 64-bit processors. Creating 64-bit C/C++ programs requires attentiveness from the programmer: there are many reasons why the code of a 32-bit program refuses to work correctly after being recompiled for a 64-bit system. Many articles have been written about this. What interests us now is a different question: let's see whether the new features that appeared in C++11 make life easier for programmers who create 64-bit programs.
The world of 64-bit errors
There are many traps a programmer can fall into when creating 64-bit C/C++ applications. A large number of articles have been written about this, so we will not repeat them. To those who are not familiar with the nuances of developing 64-bit programs, or who want to refresh their memory, we can recommend the following resources:
Time does not stand still, and programmers now use an updated version of the C++ language called C++11. Most of the innovations described in the C++11 standard are already supported by modern compilers. Let's see whether these innovations can help programmers avoid 64-bit errors.
The article is structured as follows: a brief description of a typical 64-bit error is given, and then ways to avoid it with the help of C++11 are proposed. Note right away that C++11 cannot always help. Only careful programming protects against errors; the new standard merely assists in this and does not solve all the programmer's problems.
Magic numbers
We are talking about the use of numbers such as 4, 32, 0x7FFFFFFF, 0xFFFFFFFF. It is bad if the programmer assumed that the pointer size is always 4 bytes and wrote the following code:
int **array = (int **)malloc(n * 4);
The C++11 standard cannot help us here. Magic numbers are evil, and the only way to avoid such mistakes is to try not to use them.
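For reference, the classic fix requires no C++11 at all: replace the magic 4 with a sizeof expression.
int **array = (int **)malloc(n * sizeof(int *)); // 4 bytes in a 32-bit build, 8 in a 64-bit one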
Note. Yes, malloc() is not C++ but good old C; it is much better to use the operator new or the std::vector container. But that is beside the point now; we are talking about magic numbers. Still, C++11 can sometimes help reduce their number. Some magic numbers appear in a program out of fear (often unjustified) that the compiler will optimize the code poorly. In such cases, you should pay attention to generalized constant expressions (constexpr).
The constexpr mechanism guarantees evaluation of expressions at compile time. You can declare functions that are guaranteed to unfold into a constant at the compilation stage. Example:
constexpr int Formula(int a) { return a * 2 + 55; } // C++11: a single return statement
constexpr int n = Formula(1); // guaranteed to be computed at compile time
The call Formula(1) turns into the number 57 at compile time. This explanation is, of course, very brief; more information about constexpr and other innovations can be found via the links at the end of the article.
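A small illustration of this guarantee, reusing the Formula() function from above:
// Formula(1) is evaluated at compile time, so it can appear
// wherever a constant expression is required, e.g. as an array size.
int buffer[Formula(1)]; // an array of 57 elements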
Functions with a variable number of arguments
We are talking about the misuse of functions such as printf and scanf. Example:
size_t value = ....;
printf("%u", value);
This code works correctly in a 32-bit program but can print incorrect values when the program becomes 64-bit.
Functions with a variable number of arguments are a relic of the C language. Their disadvantage is the lack of control over the types of the actual arguments. In C++ it is time to abandon them; there are plenty of other ways to format strings. For example, you can replace printf with cout, and sprintf with boost::format or std::stringstream.
With C++11, life got even better: it introduced templates with a variable number of parameters (variadic templates). They make it possible to implement a safe version of the printf function:
#include <iostream>
#include <stdexcept>

void printf(const char* s)
{
  while (s && *s)
  {
    // A lone '%' means a format specifier with no argument left for it.
    if (*s == '%' && *++s != '%')
      throw std::runtime_error("invalid format: missing arguments");
    std::cout << *s++;
  }
}

template <typename T, typename... Args>
void printf(const char* s, T value, Args... args)
{
  while (s && *s)
  {
    if (*s == '%' && *++s != '%')
    {
      std::cout << value;          // print the first non-format argument
      return printf(++s, args...); // recurse with the remaining arguments
    }
    std::cout << *s++;
  }
}
This code simply "peels off" the first argument that is not the format string and then calls itself recursively. When no such arguments are left, the first (simpler) version of printf() is called.
The type Args... defines a so-called "parameter pack". In essence, it is a sequence of type/value pairs from which arguments can be "peeled off", starting with the first one. When printf() is called with one argument, the first overload (printf(const char*)) is selected. When printf() is called with two or more arguments, the second overload (printf(const char*, T value, Args... args)) is selected: the first parameter becomes s, the second becomes value, and the remaining parameters (if any) are packed into the parameter pack args for later use. In the call:
printf(++s, args...);
the parameter pack args is shortened by one element, and the next parameter is processed as value. This continues until args is empty and the first version of printf() is called.
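A quick usage sketch: the character after '%' is ignored by this implementation, and the actual argument type is used via operator<<, so no truncation occurs.
int main()
{
  size_t value = 1024;
  printf("value = %d\n", value); // prints "value = 1024" correctly even in 64-bit mode
}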
Incorrect shift operations
The numeric literal 1 is of type int, which means it cannot be shifted by more than 31 bits. This is often forgotten, and you can find code like this in programs:
ptrdiff_t mask = 1 << bitNum;
If the value of bitNum equals, say, 40, the result is unpredictable. Formally, it is undefined behavior.
Can C++11 help us here? Unfortunately, no.
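A common fix, which requires no C++11 at all, is to widen the shifted value before the shift; a minimal sketch (bitNum must still be less than 64):
#include <cstddef>

ptrdiff_t MaskForBit(unsigned bitNum)
{
  // On a 64-bit system the 64-bit left operand makes shifts
  // by 32..63 bits well defined.
  return static_cast<ptrdiff_t>(1) << bitNum;
}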
Desynchronization of virtual functions
Suppose a virtual function is declared in a base class:
virtual int A(DWORD_PTR x);
And the derived class contains the function:
int A(DWORD x);
In a 32-bit program the DWORD_PTR and DWORD types coincide, but in a 64-bit program they are two different types. As a result, calling the function A through the base class produces different results in the 32-bit and 64-bit versions of the program.
New keywords that appeared in C++11 can help fight such errors.
Now we have the override keyword, which lets the programmer explicitly express the intention to override a function. A function declaration with the override keyword is valid only if there actually is a function to override.
This code will not compile in 64-bit mode, and the error will thus be caught:
struct X
{
  virtual int A(DWORD_PTR) { return 1; }
};
struct Y : public X
{
  // In a 64-bit build DWORD differs from DWORD_PTR, so there is
  // nothing to override and the compiler reports an error.
  int A(DWORD x) override { return 2; }
};
Mixed arithmetic
This is quite an important and extensive topic. I suggest looking at the corresponding lesson of the "64-bit lessons": Mixed arithmetic.
Quite briefly:
- Programmers often forget that the result of multiplying or adding two variables of type 'int' is also of type 'int'. An overflow may occur, and it does not matter how the result of the multiplication or addition is used afterwards.
- It is dangerous to mix 32-bit and 64-bit data types. The consequences: incorrect conditions and infinite loops.
Let's look at some simple examples of overflow.
char *p = new char[1024*1024*1024*5];
The programmer is trying to allocate 5 gigabytes of memory for the array but actually allocates much less. The expression "1024 * 1024 * 1024 * 5" is of type int, so an overflow occurs and the expression evaluates to 1073741824 (1 gigabyte). When the value is passed to the 'new' operator, it is extended to the size_t type, but by then it is too late.
If the problem is not clear, then here is another similar example:
unsigned a = 1024, b = 1024, c = 1024, d = 5;
size_t n = a * b * c * d;
The result of the expression is stored in a variable of type 'size_t', which is capable of holding values larger than UINT_MAX. However, the multiplication is performed on 'unsigned' values, an overflow occurs, and the result is incorrect.
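A sketch of the usual fix (variable names follow the example above): force the calculation into the size_t type starting from the first operand.
// The first operand is widened, so the whole chain of
// multiplications is carried out in 64 bits on a 64-bit system.
size_t n = static_cast<size_t>(a) * b * c * d;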
Why do we call all these 64-bit errors? Because in a 32-bit program it is impossible to allocate an array larger than 2 GB, so such overflows simply never arise. These errors reveal themselves only in 64-bit programs that start working with large amounts of memory.
Now a couple of examples involving comparisons:
size_t Count = BigValue;
for (unsigned Index = 0; Index < Count; ++Index)
{ ... }
This is an example of an infinite loop that occurs if Count > UINT_MAX. Suppose that on 32-bit systems this code always ran fewer than UINT_MAX iterations. The 64-bit version of the program can process more data, so it may need more iterations. Since the values of the Index variable lie in the range [0..UINT_MAX], the condition "Index < Count" is always true, which leads to an infinite loop.
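A minimal sketch of the fix: give the counter the same capacity as Count.
size_t Count = BigValue;
// A size_t counter can reach any value that Count can hold.
for (size_t Index = 0; Index < Count; ++Index)
{ ... }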
One more example:
string str = .....;
unsigned n = str.find("ABC");
if (n != string::npos)
This code is incorrect. The find() function returns a value of type string::size_type. Everything works fine on a 32-bit system, but let's see what happens in a 64-bit program.
In a 64-bit program, string::size_type and unsigned no longer coincide. If the substring is not found, find() returns the value string::npos, which equals 0xFFFFFFFFFFFFFFFFui64. This value is truncated to 0xFFFFFFFFu and stored in a 32-bit variable. Then the expression 0xFFFFFFFFu != 0xFFFFFFFFFFFFFFFFui64 is evaluated, so the condition (n != string::npos) is always true!
Can C++11 help here? The answer is both yes and no.
In some cases the new auto keyword can help; in others it can only confuse the programmer. So let's consider it carefully.
If you declare "auto a = .....", the type of the variable is deduced automatically. It is very important not to get confused and not to write incorrect code such as "auto n = 1024 * 1024 * 1024 * 5;".
A few words about the auto keyword. Consider the following example:
auto x = 7;
In this case, the type of the variable 'x' will be 'int', because that is the type of its initializer. In general, we can write:
auto x = expression;
And the type of the variable 'x' will be equal to the type of the value obtained by evaluating the expression.
Using 'auto' to infer the type of a variable from its initializer is most useful when the exact type of the expression is unknown or tedious to write. Consider an example:
template <class T>
void printall(const vector<T>& v)
{
  for (auto p = v.begin(); p != v.end(); ++p)
    cout << *p << "\n";
}
In C++98, you would have to write much longer code:
template <class T>
void printall(const vector<T>& v)
{
  for (typename vector<T>::const_iterator p = v.begin(); p != v.end(); ++p)
    cout << *p << "\n";
}
A very useful C++11 innovation.
Let's return to our problem. The expression "1024 * 1024 * 1024 * 5" has type 'int', so here 'auto' will not help us.
Nor does 'auto' help us in the case of the loop:
size_t Count = BigValue;
for (auto Index = 0; Index < Count; ++Index)
Did it get better? No. The numeric literal 0 is of type 'int', so the Index variable will now be of type 'int' rather than 'unsigned'. It probably got worse.
So is 'auto' of any use to us? Yes, it is. For example, here:
string str = .....;
auto n = str.find("ABC");
if (n != string::npos)
The variable 'n' will have the type string::size_type, and now everything is fine.
So the new 'auto' keyword finally came in handy. However, be careful: you need to understand what you are doing and why. Do not hope to defeat all the errors related to mixed arithmetic by using 'auto' everywhere. It is just one of the tools, not a panacea.
By the way, there is another way to protect against the type truncation in the example discussed above:
unsigned n = str.find("ABC");
You can use the new variable initialization format, which prevents narrowing conversions. The problem is that the C and C++ languages implicitly truncate some types:
int x = 7.3; // x silently becomes 7
However, C++11 initialization lists do not allow narrowing:
int x0 {7.3}; // error: narrowing conversion
These examples are more interesting for us now:
size_t A = 1;
unsigned X = A;     // compiles; the value may be silently truncated
unsigned Y(A);      // compiles as well
unsigned Q = { A }; // error in 64-bit mode: narrowing conversion
Imagine that the code is written like this:
unsigned n = { str.find("ABC") };
unsigned n{ str.find("ABC") };
This code compiles in 32-bit mode but will stop compiling in 64-bit mode.
Again, this is not a panacea for all errors. Just another way to write more reliable programs.
Address arithmetic
The problem is largely similar to the one discussed in the "Mixed arithmetic" section; the only difference is that the overflow occurs when working with pointers. Consider an example:
float Region::GetCell(int x, int y, int z) const
{
  return array[x + y * Width + z * Width * Height];
}
This code is taken from a real mathematical-modeling program in which the amount of RAM is an important resource. To save memory, programs of this class often use one-dimensional arrays and work with them as with three-dimensional arrays; functions similar to GetCell provide access to the necessary elements. However, the code above only works correctly with arrays containing fewer than INT_MAX elements, because 32-bit int types are used to calculate the element's index.
Can C++11 help here? No.
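The classic fix, which again needs no C++11, is to carry out the index calculation in a memsize type such as ptrdiff_t; a sketch based on the function above:
float Region::GetCell(int x, int y, int z) const
{
  // ptrdiff_t is 64-bit on a 64-bit system, so the index no
  // longer overflows for arrays with more than INT_MAX elements.
  return array[static_cast<ptrdiff_t>(x) +
               static_cast<ptrdiff_t>(y) * Width +
               static_cast<ptrdiff_t>(z) * Width * Height];
}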
Changing an array's type and packing pointers
Sometimes it is necessary (or simply convenient) to treat the elements of an array as elements of another type. It can also be convenient to store pointers in integer variables.
Errors occur here because of incorrect explicit type conversions. The new C++11 standard has no bearing on this: explicit type conversions have always been done at the programmer's own risk.
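That said, if a pointer really must be stored in an integer, the <cstdint> header of C++11 at least provides the right types for it: intptr_t and uintptr_t (optional in the standard, but widely available). A minimal sketch:
#include <cstdint>

int main()
{
  int value = 42;
  // uintptr_t is large enough for a pointer on both 32-bit and
  // 64-bit systems, unlike 'int' or even 'long' on Win64.
  std::uintptr_t raw = reinterpret_cast<std::uintptr_t>(&value);
  int* back = reinterpret_cast<int*>(raw);
  return (*back == value) ? 0 : 1;
}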
Working with data stored in unions should also be mentioned. Such work is low-level, and its correctness, again, depends only on the skills and knowledge of the programmer.
Serialization and data exchange
A project may need to maintain a compatible data format, so that the same data set can be processed by both the 32-bit and the 64-bit version of the program. The difficulty is that the sizes of some data types change.
The C++11 standard made life a little easier by introducing fixed-size types. Previously, programmers declared such types themselves or used types declared in one of the system libraries. The following fixed-size types are now available (a usage sketch follows the list):
- int8_t
- int16_t
- int32_t
- int64_t
- uint8_t
- uint16_t
- uint32_t
- uint64_t
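With these types, a serialization-friendly structure can be sketched as follows (the Record type and its fields are hypothetical):
#include <cstdint>

// The sizes of these fields no longer depend on whether the
// program is built as 32-bit or 64-bit (padding aside).
struct Record
{
  std::int32_t  id;
  std::uint64_t offset;
  std::int16_t  flags;
};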
Besides sizes, data alignment also changes between platforms, and this too can cause certain difficulties.
Regarding this topic, it is worth mentioning the new C++11 keyword 'alignas'. Now you can specify the alignment explicitly; for example (a minimal sketch):
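// Request that the buffer be aligned at least as strictly as a double.
alignas(double) unsigned char buffer[1024];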
There is also the 'alignof' operator, which returns the alignment of the specified argument (the argument must be a type). Example:
constexpr int n = alignof(int);
Overloaded functions
When porting a 32-bit program to a 64-bit platform, its logic may change because of the use of overloaded functions. If a function is overloaded for 32-bit and 64-bit values, a call to it with an argument of, say, type size_t resolves to different functions on different systems.
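A sketch of the problem, assuming MSVC-like type sizes (size_t is 'unsigned' in a 32-bit build and 'unsigned long long' in a 64-bit one); the function names are purely illustrative:
#include <cstddef>
#include <iostream>

void f(unsigned)           { std::cout << "32-bit overload\n"; }
void f(unsigned long long) { std::cout << "64-bit overload\n"; }

int main()
{
  size_t n = 0;
  f(n); // picks a different overload in 32-bit and 64-bit builds
}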
It is hard for me to say whether any of the new language features can be used to fight such errors.
Type size checks
There are cases when you need to check the sizes of data types, so as not to end up with a buggy program after recompiling the code for a new system.
This is often done in the wrong way. For example:
assert(sizeof(unsigned) < sizeof(size_t));
assert(sizeof(short) == 2);
This is a bad approach. First, the program compiles anyway. Second, these checks show themselves only in the debug version, where assert is enabled.
It is much better to stop compilation when the necessary conditions are not met. There are many solutions for this; for example, you can use the _STATIC_ASSERT macro available to developers using Visual Studio. Usage example:
_STATIC_ASSERT(sizeof(int) == sizeof(long));
C++11 standardized a way to stop compilation when something goes wrong by introducing compile-time assertions (static assertions). A static assertion contains a constant expression and a string literal:
static_assert(expression, string);
The compiler evaluates the expression, and if the result is false (i.e., the assertion is violated), it displays the string as an error message. Examples:
static_assert(sizeof(long) >= 8,
              "64-bit code generation required for this library.");

struct S { X m1; Y m2; };
static_assert(sizeof(S) == sizeof(X) + sizeof(Y),
              "unexpected padding in S");
Conclusion
Writing code that makes maximum use of the new C++11 constructs does not guarantee the absence of 64-bit errors. However, the language does provide several new features that make code shorter and more reliable.
Additional resources
This article did not attempt to acquaint the reader with as many C++11 innovations as possible. For a first acquaintance with the new standard, we can recommend the following resources:
- Bjarne Stroustrup. C++11 - the new ISO C++ standard.
- Wikipedia. C++11.
- Scott Meyers. An Effective C++11/14 Sampler.