Before performing the multiplication, C ++ leads the factors to one type not shorter than int, and the bitness of the result coincides with the bit depth of the factors mentioned. In order not to lose accuracy, it is sometimes required to perform additional operations for multiplication.
Consider the task. The system determines the time since the launch of the program in ticks of the microprocessor by calling the function:
unsigned long long getTickCount();
The length of unsigned long long is 64 bits. To convert to physical units of time in the system there is a constant:
const unsigned long long TICKS_PER_SECOND = 1999000001ULL;
It is required to define the function of converting ticks to nanoseconds getNsec (unsigned long long ticks) with semantics:
unsigned long long getNsecNaive(unsigned long long ticks) { static const unsigned long long NSEC_PER_SECOND = 1000000000ULL; unsigned long long nsec = NSEC_PER_SECOND * ticks / TICKS_PER_SECOND; return nsec; }
For the getNsec () function, it is necessary to ensure the highest possible accuracy. The ticks parameter can be both large and small. For small ticks (say, up to 2 ^ 34) you need to perform calculations in the following order:
(NSEC_PER_SECOND * ticks) / TICKS_PER_SECOND
That is, first multiply, then divide. Since NSEC_PER_SECOND <2 ^ 30, multiplication will not cause an overflow, since its result will be less than 2 ^ 64.
For large ticks, since multiplication can cause overflow, it is better to perform operations in the following order:
NSEC_PER_SECOND * (ticks / TICKS_PER_SECOND)
The problem in this case is that the second factor is always integer, the result in the decimal number system will always end with nine zeros, that is, in fact, second accuracy is provided and the getNsec () function should be renamed to getSec (). On the other hand, the initial data allow
for greater accuracy.
Taking into account the non-associative nature of machine operations of multiplication and division of everything, 4 orders of computation are possible, that is, 2 more in addition to the two considered:
(NSEC_PER_SECOND / TICKS_PER_SECOND) * ticks
and
ticks / (TICKS_PER_SECOND / NSEC_PER_SECOND)
The first one always gives zero, and the second one - ticks, that is, almost 50% error (which in this case can be reduced to 0.1%, but this error is still not the smallest possible).
So, at best, we get second precision. The problem of increasing accuracy can be solved in the following ways (in order of increasing preference):
- Make calculations with real (double) types
- Arguments to 128-bit integers
- Use the MultDiv () function
These approaches may not be applicable for reasons of limitations of the platform (processor), programming environment and libraries, project requirements.
We use the following approach. Let, as a result of dividing ticks by TICKS_PER_SECOND, we get an incomplete quotient of seconds and a remainder of r:
unsigned long long seconds = ticks / TICKS_PER_SECOND; unsigned long long r = ticks % TICKS_PER_SECOND;
Then ticks = seconds * TICKS_PER_SECOND + r. Substitute in the formula for nsec:
nsec = NSEC_PER_SECOND * (seconds * TICKS_PER_SECOND + r) / TICKS_PER_SECOND = NSEC_PER_SECOND * seconds + (NSEC_PER_SECOND * r) / TICKS_PER_SECOND. Because r <TICKS_PER_SECOND, (NSEC_PER_SECOND * r) will never overflow. Summary function:
unsigned long long getNsec(unsigned long long ticks) { static const unsigned long long NSEC_PER_SECOND = 1000000000ULL; return NSEC_PER_SECOND * (ticks / TICKS_PER_SECOND) + ((NSEC_PER_SECOND)*(ticks % TICKS_PER_SECOND))/TICKS_PER_SECOND; }
The result is exactly the same as if we counted the NSEC_PER_SECOND * ticks as a 128-bit value, and then divided it into TICKS_PER_SECOND, that is, it provides maximum accuracy for the given initial values and a given bitness of the result. The formula in the return statement is not simplified: neither NSEC_PER_SECOND nor / TICKS_PER_SECOND is bracketed.
In fact, the solution of the problem is reduced to the implementation of the MultDiv (a, b, c) function, which calculates a * b / c, where b and c are large constants, and the fraction b / c is not contractible.