Myths and legends about overflowing integers in Rust

Primitive integer types supported by processors are a limited approximation to an infinite set of integers that we used to operate in real life. This limited representation does not always coincide with "real" numbers, for example 255_u8 + 1 == 0 . Often, the programmer forgets about this difference, which can easily lead to bugs.

Rust is a programming language whose purpose is to protect against bugs, it focuses on preventing the most insidious of them - memory errors, but also tries to help the programmer avoid other problems: memory leaks , ignoring errors, and, as we will see, overflowing integers .

Rust overflow

The policy of detecting and preventing overflow in Rust has changed several times on the way to the release 1.0.0, held last year. As a result, there is a misunderstanding of how overflow is handled and what the consequences are.
')
Prior to version 1.0.0-alpha, the overflow was cyclical and the result was consistent with the use of the add-on to two (as most modern processors do). However, this solution is not optimal: an unexpected and unnoticed overflow often leads to errors. This is especially bad in C and C ++ due to the fact that overflow of signed integers is undefined behavior, which, together with inadequate protection against memory security violations, can easily lead to its damage. However, in more safety-conscious languages like Rust, this still causes problems: there are many examples of overflow, they are found in video games ( economics , health indicators , etc.), binary search implementations , and even software for aviation . Simply put, a code like max(x - y, z) occurs periodically and can give incorrect results if the numbers are unsigned and x - y causes overflow. As a result, there was a desire to make Rust more secure in terms of overflowing integers.

The current state of affairs is defined in RFC 560 :

In the debug build, arithmetic operations ( + , - , etc.) are checked for overflow and, if present, cause panic.
In the release, checks for the result of overflow are absent, and the result guarantees cyclicity.

Overflow checks can be manually turned on or off regardless of the type of assembly, both globally and at the level of individual operations.
However, you cannot influence some checks: / 0 and MIN / -1 (for signed integers) and similarly for % . These calculations are indefinite behavior in C and LLVM, which is what caused rustc behavior, although it seems to me that Rust could theoretically consider MIN / -1 as a normal overflow and return MIN with checks disabled.

Hopefully, due to checks in debug mode, overflow errors in the Rust code will be detected earlier. Moreover, if you really expect overflow, you will have to explicitly indicate this in the code, which reduces the number of false positives for both future static analyzers and for code that includes checking for overflow in all modes.

Myth: Overflow result undefined

The result of the overflow could be indefinite behavior, but one of the key objectives of Rust is to ensure the safety of working with memory, and such uncertainty (similar to indefinite behavior in C) clearly contradicts this goal. A variable containing an undefined value is not required to maintain the same value between uses:

 // -Rust let x = undefined; let y = x; let z = x; assert_eq!(y, z); //

This will lead to catastrophic consequences if security depends on such a value. For example, when checking the output of the array foo[x] :

 let x = undefined; // let y = foo[x]; //   : let y = if x < foo.len() { unsafe { *foo.get_unchecked(x) } } else { panic!("index out of bounds") };

If the value of the variable different when comparing x < foo.len() and with direct access to the array, then the guarantees may be violated: the comparison may be 0 < foo.len() , while accessing by index will foo.get_unchecked(123456789) . Disorder!

Thus, unlike signed integers in C, overflow in Rust cannot be undefined. In other words, the compiler must assume that overflow can occur, unless it can prove otherwise. This has unobvious consequences: x + 1 > x not always true, while C compilers assume that this condition is always met if is a signed integer.

"But what about performance?" I already have a presentiment of this question. Indeed, undefined behavior simplifies optimizations by allowing the compiler to make assumptions. Therefore, the rejection of such behavior can affect the speed. Uncertainty overflow of integers is especially useful in C because such numbers are often used as inductive variables in cycles, respectively, the ability to make assumptions allows a more accurate analysis of the number of loop iterations: for (int i = 0; i < n; i++) will be performed n times since it can be assumed that n does not contain a negative value. Rust avoids most of these problems using positive numbers as indices ( 0..n always gives n steps) and allowing lightweight iterators to directly bypass data structures, for example, for x in some_array { ... } . These iterators can use knowledge about the internal structure of data structures, without forcing the user to deal with undefined behavior.

Also Rust, unlike C, cannot reduce x * 2 / 2 simply to x if x is a signed integer. This optimization does not apply (unless you manually write x instead of a complex arithmetic expression), but in my practice these expressions are most often encountered when x is known at compile time, which means the whole expression will be replaced by a constant.

Myth: Overflow result is unspecified

The result of the overflow could be unspecified, in which case the compiler would have to assume that it could happen, but would have the right to return any value as a result (or not to return it at all). Indeed, in the first version of RFC 560 for checking overflow of integers, it was proposed:

Change the behavior to an unspecified return or panic call, depending on whether the overflow is being checked.
[...]
Theoretically, the implementation returns an unspecified value. In practice, however, the result will be similar to cyclic overflow. Implementations should avoid unnecessary unpredictability and unexpected behavior, so as not to provoke errors.
And the most important: this is not indefinite behavior as understood by C. Unspecified will only be the result of the operation, and not the behavior of the entire program as a whole. As in C. The programmer cannot rely on any particular value during the overflow, but the compiler has no right to use the assumption about that no overflow occurs, for optimization.

The RFC and and the "unspecified" result of the overflow (that is, 127_i8 + 1 can return -128 or 0 or 127 or any other value) has been the subject of lively discussions that caused it to change.

Thanks to the efforts of individuals, the RFC has taken on a modern look: as a result of overflow, the value will either not return at all (for example, panic will happen), or a cyclical result will be returned, corresponding to the use of the two addition. Now the wording looks like this:

Operations +, -, * can lead to overflow (overflow) or the disappearance of the order (underflow). If checks are enabled, then panic will occur. Otherwise, the result will be a cyclic overflow.

A fixed overflow result is a protective measure: errors may have no effect, even if no overflow was detected. The expression x - y + z calculated as (x - y) + z and, therefore, subtraction can lead to overflow (for example, x = 0 and y = 1 , both unsigned), but if z is large enough (in our example, z >= 1 ), the result will be similar to the use of "numbers from the real world."

The change came closer to the end of the discussion, consisting of 160 comments, so it was easy to miss, which is why people can continue to think that the result of overflow is unspecified.

Myth: A programmer cannot control overflow handling.

One of the arguments against the introduction of overflow checking was the existence of programs and algorithms that rely on cyclical overflow, such as hash calculation algorithms, some data structures (for example, a ring buffer), and even codecs. For these algorithms, using + in the debug mode will be incorrect: panic will occur, although this use of overflow has been conscious. In addition, in some cases, there may be a desire to enable checks not only for the debug build.

RFC and the standard library provide four sets of methods, in addition to the usual operators:

This should cover all "special cases":

wrapping_... return the result of the addition to two.
saturating_... returns the highest / lowest value when an overflow occurs.
overflowing_... returns the result of the complement to two, along with a boolean value signaling the overflow that occurred.
checked_... return Option , which takes the value None in case of overflow.

All these operations could be implemented in terms of overflowing_... , but the standard library tries to simplify the solution of the most frequently occurring problems.

If you really want to use cyclic overflow, then you can write something like x.wrapping_sub(y).wrapping_add(z) . This will give the expected result, and verbosity can be reduced by using the type from the standard Wrapping library.

Perhaps this is not the final state: potential improvements are also mentioned in the RFC. In the future, operators like cyclic &+ from Swift may be added to Rust. This was not done immediately, as Rust tries to be conservative and, to a reasonable degree, minimalistic, and also because of the potential to turn off overflow checks (for example, a separate function may be explicitly marked and there will be no checks in its code in all modes) . In particular, the most active (potential) users of Servo and Gecko are interested in the latter.

If you want to have overflow checks in all the code, then you have to either use checked_add everywhere (not very convenient!) Or explicitly enable them. Although they by default only work in debug mode, overflow checks can be enabled by passing -C debug-assertions=on rustc (to the Rust compiler) or by setting the debug-assertions field in the cargo profile . Work is also being carried out whenever possible to enable them independently of other debug checks (at the moment rustc supports the unstable -Z force-overflow-checks flag option).

Myth: the selected approach to overflow checks slows down the code

Rust strives to be as fast as possible, and in designing overflow checks, performance issues were taken very seriously. Performance is one of the main reasons why checks have been disabled for default release builds. Of course, this means that speed was not sacrificed for the convenience of detecting errors during development.

Unfortunately, overflow checks require more code and instructions:

 [no_mangle] pub fn unchecked(x: i32, y: i32) -> i32 { x.wrapping_add(y) } #[no_mangle] pub fn checked(x: i32, y: i32) -> i32 { x + y }

With -O -Z force-overflow-checks on x86 (on 32-bit ARM LLVM currently generates redundant comparisons and register manipulations, so the performance loss is even more!) It compiles into the following (with some changes for clarity) :

 unchecked: leal (%rdi,%rsi), %eax retq checked: pushq %rax addl %esi, %edi jo .overflow_occurred movl %edi, %eax popq %rcx retq .overflow_occurred: leaq panic_loc2994(%rip), %rdi callq _ZN9panicking5panic20h4265c0105caa1121SaME@PLT

There are more instructions than there should be, provided that checked built in (which should be the case): in this case, working with registers with pushq / pop / movl not necessary. I suppose that even without embedding, stack management via pushq / popq not required, but unfortunately, Rust is now using a version of LLVM that contains an error . Of course, all these additional instructions are annoying, as is the need to use add instead of lea .

On x86, using lea (load effective address) for arithmetic can be very useful: it allows you to perform relatively complex calculations and, as a rule, is calculated in a separate part of the CPU and its pipeline, in contrast to add , which contributes to higher parallelism at the instruction level. x86 ISA allows dereferencing of the result of complex calculations with pointers: the general form is A(r1, r2, B) (in AT & T syntax), which is equivalent to r1 + B * r2 + A where r1 and r2 are registers and A and B are constants. As a rule, this is used directly in memory instructions such as mov (for example, let y = array_of_u32[x]; can be compiled into something other than mov (array_of_u32.as_ptr(), x, 4), y , since each element has a size of 4), but lea allows you to perform arithmetic without affecting memory. In general, the ability to use lea for arithmetic is quite useful. The disadvantage is that lea does not integrate directly with overflow checks: it does not set the processor status flag to indicate this.

However, an even greater impact on performance comes from the fact that overflow checks hinder other optimizations. First, the checks themselves order the code (preventing things like the deployment, reordering, and vectoring of loops). Secondly, the panic and unwinding of the stack causes the compiler to behave more conservatively .

All of these considerations explain why overflow checks are not included in the release build, where, as a rule, the highest possible performance is important.

At the same time, even if overflow checks are included in the release mode, the performance loss can be reduced, as is the case with out-of-array checks. On the one hand, compilers can perform range analysis and prove that individual operations never lead to overflow. And indeed , this topic has received much attention . On the other hand, problems caused by the use of panic can be partially resolved by replacing panic with abnormal termination of the program , if the subject area permits.

The overflow RFC has the option of additional optimizations: “ delayed panic ” is allowed, that is, implementations can perform the sequence of operations a + b + c + d and panic once at the very end if any of the calculations led to overflow, instead of checking each separate operation tmp = a + b , then tmp + c , etc. Although at the moment it is not implemented, this possibility exists.

Myth: checks do not detect errors

All efforts to develop, discuss, and implement this integer overflow handling scheme would be in vain if they did not help to detect errors in practice. Personally, I found several problems in my code with expressions like cmp::max(x - y, z) (they did not hit the Internet, so there will be no links) almost immediately after writing, especially in conjunction with testing infrastructure, such as quickcheck .

Overflow checks found errors in our ecosystem, for example, such (the list is not complete!):

Outside of Rust, there are many other examples of the danger of overflow errors. In 2011, they hit the list of the 25 most frequent CWE / SANS errors . Some languages, such as Swift, always perform overflow checks, while others, such as Python 3 and Haskell, avoid overflowing, using arbitrary precision numbers by default. Moreover, some C compilers support options that replace undefined behavior with cyclic overflow ( -fwrapv ) and help detect overflow ( -fsanitize=signed-integer-overflow ).

Source: https://habr.com/ru/post/282958/

All Articles