Unsigned arithmetic in Java

As you know, there are no unsigned types in Java. If in C you could write unsigned int ( char , long ), then in Java it would not work. However, it is often necessary to perform arithmetic operations with numbers without a sign. At first glance, it seems that unsigned types are not really necessary in principle (think, MaxInt for numbers with a sign less than twice, if you need numbers more, I just take a long and then BigInteger ). But the main difference is not really in how many different non-negative numbers can be put in a signed or unsigned int, but in how arithmetic operations and comparisons are performed on them. If you are working with binary protocols or with binary arithmetic, where every bit used is important, you need to be able to perform all the basic operations in unsigned mode. Consider these operations in order:

Convert byte to short (int, long)

Regular caste (int) myByte will perform an extension to 32 bits with a sign - this means that if the high bit of a byte was set to 1, then the result will be the same negative number, but written in 32-bit format:

0xff -> 0xffffffff (-1)

Often this is not what we would like. In order to expand to 32 bits unsigned and get 0x000000ff , in Java you can write:

')

 int myInt = myByte & 0xff; short myShort = myByte & 0xff;

Comparison without sign

For unsigned comparison, there is a concise formula:

 int compareUnsigned(int a, int b) { return Integer.compare( a ^ 0x80000000, b ^ 0x80000000 ); }

For byte, short and long, respectively, the constants will be 0x80 , 0x8000 and 0x8000000000000000L .

Addition, Subtraction and Multiplication

And here is a pleasant surprise - these operations are performed correctly in any case. But in expressions it is necessary to carefully ensure that operations are performed with numbers of the same type, since any implicit conversions are performed with an extension of the sign, and can lead to results that are different from those expected. The insidiousness of such bugs is that an erroneous scenario can be executed very rarely.

Division

Dividing -256 by 256 will give us -1. And we would like for 0xffffff00 / 0x100 give 0x00ffffff , not 0xffffffff (-1) . For byte , short and int solution is to switch to numbers of higher bit depth:

 int a = 0xffffff00; int b = 0x100; int c = (int) ((a & 0xffffffffL) / b); // convert a to long before division

But what to do with long ? Switching to BigInteger in such cases is usually not an option - too slow. It remains only to take everything into their own hands and implement the division manually. Fortunately, everything has been stolen before us - in Google Guava there is an implementation of unsigned division for long , and quite smart. If you do not use this library, the easiest way is to tear out a piece of code directly from the UnsignedLongs.java file:

  /** * Returns dividend / divisor, where the dividend and divisor are treated as unsigned 64-bit * quantities. * * @param dividend the dividend (numerator) * @param divisor the divisor (denominator) * @throws ArithmeticException if divisor is 0 */ public static long divide(long dividend, long divisor) { if (divisor < 0) { // ie, divisor >= 2^63: if (compare(dividend, divisor) < 0) { return 0; // dividend < divisor } else { return 1; // dividend >= divisor } } // Optimization - use signed division if dividend < 2^63 if (dividend >= 0) { return dividend / divisor; } /* * Otherwise, approximate the quotient, check, and correct if necessary. Our approximation is * guaranteed to be either exact or one less than the correct value. This follows from fact * that floor(floor(x)/i) == floor(x/i) for any real x and integer i != 0. The proof is not * quite trivial. */ long quotient = ((dividend >>> 1) / divisor) << 1; long rem = dividend - quotient * divisor; return quotient + (compare(rem, divisor) >= 0 ? 1 : 0); }

In order for the code to compile, you will also have to borrow the implementation of compare(long, long) :

  /** * Compares the two specified {@code long} values, treating them as unsigned values between * {@code 0} and {@code 2^64 - 1} inclusive. * * @param a the first unsigned {@code long} to compare * @param b the second unsigned {@code long} to compare * @return a negative value if {@code a} is less than {@code b}; a positive value if {@code a} is * greater than {@code b}; or zero if they are equal */ public static int compare(long a, long b) { return Longs.compare(flip(a), flip(b)); }

and Longs.compare(long, long) + flip(long) :

  /** * A (self-inverse) bijection which converts the ordering on unsigned longs to the ordering on * longs, that is, {@code a <= b} as unsigned longs if and only if {@code flip(a) <= flip(b)} * as signed longs. */ private static long flip(long a) { return a ^ Long.MIN_VALUE; } /** * Compares the two specified {@code long} values. The sign of the value * returned is the same as that of {@code ((Long) a).compareTo(b)}. * * @param a the first {@code long} to compare * @param b the second {@code long} to compare * @return a negative value if {@code a} is less than {@code b}; a positive * value if {@code a} is greater than {@code b}; or zero if they are equal */ public static int compare(long a, long b) { return (a < b) ? -1 : ((a > b) ? 1 : 0); }

Bitwise shifts

To finally cover the topic of bit operations, we also recall the shifts. In the x86 assembler, there is a whole bunch of various commands that make bitwise shifts - SHL, SHR, SAL, SAR, ROR, ROL, RCR, RCL. The last 4 perform cyclic shifts, their equivalents in Java are not. But the logical and arithmetic shifts are present. The logical shift (does not take into account the signs) - SHL (shift left) and SHR (shift right) - is implemented in Java by the operators << and >>> respectively. With the help of logical shifts, you can quickly perform integer multiplication and division by numbers of powers of two. The arithmetic shift (takes into account the sign) to the right - SAR - is implemented by the >> operator. An arithmetic left shift is equivalent to a logical one, and therefore there is no special operator for it. It may seem strange that the assembler has a special opcode for this operation, but in fact it does the same, that is, SAL completely repeats the behavior of SHL, and this is clearly indicated by the documentation from Intel:

The shift arithmetic left (SAL) and shift logical left (SHL) instructions perform the same operation; they shift the bits in the destination operand to the left (toward more significant bit locations). The bit is cleared (see Figure 7-7 in the Intel® 64 and IA-32 Architects Software Developer’s Manual, Volume 1). ).

That is, SAL was added just for symmetry, given that there is a division into logical and arithmetic to shift to the right. Well, Gosling decided not to bother (and, I think, did the right thing).

So, we have the following:

 a << 1; //   ,    2 a >> 1; //      (   2) a >>> 1; //      (    2)

Final Recommendations

When performing arithmetic operations that may lead to overflow in the selected bit grid, it is always necessary to accurately understand what range of acceptable values the variables may have, and to track these invariants by placing assertions. For example, it is obvious that when multiplying two arbitrary 32-bit unsigned numbers, the result may not fit in 32 bits, and if you need to avoid overflow, you must either make sure that there will never be a situation in which there is no fit in 32 bits or it is necessary to convert both operands into long (by performing a & 0xffffffffL ). Here, by the way, you can easily make a mistake by converting only one of the operands. No, you need to convert both to long, because if the second operand is negative, it will be implicitly converted to long with an extension of the sign, and the result of the multiplication will be incorrect.
Generously place brackets in expressions that use bitwise operations. The fact is that the priority of bitwise operators in Java is somewhat strange, and often behaves in an unobvious way. It is better to add a pair of parentheses than a few hours later to look for subtle mistakes.
If you need a constant of type long , do not forget to add the suffix L to the end of the literal constant. If we do not do this, it will be not long , but an int , and with an implicit coercion to long , an unpleasant signed extension will occur again.

Source: https://habr.com/ru/post/225901/

All Articles