Fixed-point arithmetic in C++

Today I will explain what fixed-point arithmetic is, why it is needed, and how it can be used.

Application performance can degrade significantly because of the way floating-point calculations are carried out. As a rule, a CPU is optimized for integer operations, and its FPU (floating-point unit) can be as much as an order of magnitude slower. Some platforms have no FPU at all, and emulating floating-point operations in software takes a lot of time. For example, with an FPU, a floating-point multiplication is a single fmul instruction; without one, it is a call to an emulation function such as __mulsf3. That function performs the calculation with integer operations, which increases both the size of the machine code and the time needed to execute it, whereas fmul does the same work quickly in hardware.

This problem has a solution: perform the calculations in fixed-point form on an integer type.

The principle is a fixed shift of the number by N bits, so that a fractional number can be stored as an integer with a resolution of 2^-N, that is, N binary digits after the point. The examples below use N = 10 bits (2^10 = 1024).

Here is an example of converting a floating-point number to a fixed-point number:

Fixed(12345.6789) = 1024 * 12345.6789 = 12641975 (the fractional part, .1936, is discarded)

This number has a resolution of 2^-10 (1/1024) after the point.

An example of the reverse conversion, from a fixed-point number back to a floating-point number:

Float(12641975) = 12641975 / 1024 = 12345.6787109375

In this case the number after the reverse conversion is 12345.6787109375, which matches the original to 3 digits after the point. The resolution is 1/2^10 = 1/1024, or about 0.001.
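The two conversions above can be sketched as a pair of free functions. This is only an illustration under the article's 10-bit scale; the names kFracBits, kScale, to_fixed and to_double are hypothetical and are not part of the class shown later.

```cpp
#include <cstdint>

// Hypothetical names, chosen for this sketch only.
constexpr int kFracBits = 10;                    // N fractional bits
constexpr std::int32_t kScale = 1 << kFracBits;  // 2^10 = 1024

// Real -> fixed: multiply by the scale and discard what is left after the point.
std::int32_t to_fixed(double value) {
    return static_cast<std::int32_t>(value * kScale);
}

// Fixed -> real: divide the raw integer by the scale.
double to_double(std::int32_t raw) {
    return static_cast<double>(raw) / kScale;
}
```

With these definitions, to_fixed(12345.6789) yields the raw value 12641975, and to_double(12641975) yields 12345.6787109375, as in the worked example.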

How are calculations performed on a fixed-point type?

Addition and subtraction are equivalent to the ordinary integer operations, for any number of fractional bits:

Fixed(x) + Fixed(y) = (1024 * x) + (1024 * y)
Fixed(x) - Fixed(y) = (1024 * x) - (1024 * y)
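As a small sketch of this, assuming raw values scaled by 1024, addition and subtraction need no scale correction at all. The function names fixed_add and fixed_sub are hypothetical.

```cpp
#include <cstdint>

// Raw values are real numbers scaled by 1024 (10 fractional bits).
// Both operands carry the same scale, so the result keeps that scale.
std::int32_t fixed_add(std::int32_t a, std::int32_t b) { return a + b; }
std::int32_t fixed_sub(std::int32_t a, std::int32_t b) { return a - b; }
```

For instance, 1.5 is stored as 1536 and 2.25 as 2304; their raw sum is 3840, which is 3.75 * 1024.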

Multiplication takes the following form, where p is the scale factor:
(Fixed(x) * Fixed(y)) / p
With 10 fractional bits (p = 1024), this is equivalent to:
((1024 * x) * (1024 * y)) / 1024
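A minimal sketch of this formula, assuming p = 1024 and operands small enough that the intermediate product fits in 32 bits. The names kScale and fixed_mul are hypothetical.

```cpp
#include <cstdint>

constexpr std::int32_t kScale = 1024;  // p = 2^10

// The product of two raw values carries the scale twice (1024 * 1024),
// so one factor of the scale is divided out again.
// Assumption: a * b fits in 32 bits; larger operands need overflow handling.
std::int32_t fixed_mul(std::int32_t a, std::int32_t b) {
    return (a * b) / kScale;
}
```

For example, 1.5 * 2.25 becomes (1536 * 2304) / 1024 = 3456, which is 3.375 * 1024.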

Division takes this form:
(Fixed(x) * p) / Fixed(y)
With 10 fractional bits (p = 1024), this is equivalent to:
(1024 * 1024 * x) / (1024 * y)
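The division formula can be sketched the same way, again assuming p = 1024. The names kScale and fixed_div are hypothetical, and the sketch leaves overflow and division by zero to the caller.

```cpp
#include <cstdint>

constexpr std::int32_t kScale = 1024;  // p = 2^10

// Dividing raw values directly would cancel the scale:
// (1024 * x) / (1024 * y) = x / y. Pre-multiplying the dividend
// by the scale keeps the quotient in fixed-point form.
// Assumptions: a * kScale fits in 32 bits, and b is not zero.
std::int32_t fixed_div(std::int32_t a, std::int32_t b) {
    return (a * kScale) / b;
}
```

For example, 3.375 / 1.5 becomes (3456 * 1024) / 1536 = 2304, which is 2.25 * 1024.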

Overflow

Multiplication and division can overflow, which produces an incorrect result. For example, if a 32-bit integer type is used and an intermediate value exceeds its range, the high-order bits are lost. There are two ways to avoid the overflow:

Perform the calculations in a 64-bit integer type.
Perform the calculations in decomposed form. For multiplication, for example, (xi + xf) * (yi + yf) = xi * yi + xf * yf + xi * yf + yi * xf, where the suffixes i and f denote the integer part and the fractional part.
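Both options can be sketched for multiplication as follows. This is an illustration under the 10-bit scale used in this article, not the article's own class; the names kScale, mul_wide and mul_split are hypothetical.

```cpp
#include <cstdint>

constexpr std::int32_t kScale = 1024;  // 2^10, the scale used in this article

// Option 1: form the intermediate product in 64 bits, then rescale.
std::int32_t mul_wide(std::int32_t a, std::int32_t b) {
    std::int64_t product = static_cast<std::int64_t>(a) * b;
    return static_cast<std::int32_t>(product / kScale);
}

// Option 2: split each raw value into an integer part and a fractional part,
// a = ai * kScale + af, and combine the four partial products.
// Using / and % together keeps both parts the same sign as the operand.
// Assumption: the final result itself fits in 32 bits.
std::int32_t mul_split(std::int32_t a, std::int32_t b) {
    std::int32_t ai = a / kScale, af = a % kScale;
    std::int32_t bi = b / kScale, bf = b % kScale;
    return ai * bi * kScale + ai * bf + bi * af + (af * bf) / kScale;
}
```

For 3000.5 * 3.0 the raw operands are 3072512 and 3072. Their direct product does not fit in 32 bits, yet both functions return 9217536, which is 9001.5 * 1024. The split form can differ from the wide form by one unit in the last place, because it truncates only the af * bf term.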

Class for working with fixed-point numbers in C++

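    #include <climits>

    #define DIGITS 1024 // scale factor: 2^10, i.e. 10 fractional bits
    #define EPS 20      // sqrt tolerance in raw units (1/1024 each)

    class Fixed {
        signed int x; // raw value: the real number multiplied by DIGITS

        Fixed(signed int a) { x = a; }

    public:
        Fixed() { x = 0; }

        static Fixed fromInt(signed int val) { return Fixed(val * DIGITS); }
        static Fixed fromFloat(float val) { return Fixed((signed int)(val * DIGITS)); }
        float fixed2float() { return ((float)x) / DIGITS; }

        static Fixed sum(Fixed a, Fixed b) { return Fixed(a.x + b.x); }
        static Fixed diff(Fixed a, Fixed b) { return Fixed(a.x - b.x); }

        static Fixed mul(Fixed a, Fixed b) {
            // Widened only to detect 32-bit overflow without undefined behavior.
            long long c = (long long)a.x * b.x;
            if (c > INT_MAX || c < INT_MIN) {
                // Overflow! Multiply the integer and fractional parts separately.
                signed int i1 = a.x / DIGITS, f1 = a.x % DIGITS;
                signed int i2 = b.x / DIGITS, f2 = b.x % DIGITS;
                return Fixed((i1 * i2) * DIGITS + (f1 * f2) / DIGITS + i1 * f2 + i2 * f1);
            } else {
                return Fixed((signed int)c / DIGITS);
            }
        }

        // The caller must make sure that b is not zero.
        static Fixed div(Fixed a, Fixed b) {
            if (a.x >= (1 << 21) || a.x <= -(1 << 21)) {
                // Overflow! a.x * DIGITS would not fit, so divide the parts separately.
                // The quotient of the integer part is truncated, which costs precision.
                signed int i = a.x / DIGITS;
                signed int f = a.x % DIGITS;
                return Fixed(((i * DIGITS) / b.x) * DIGITS + (f * DIGITS) / b.x);
            } else {
                return Fixed((a.x * DIGITS) / b.x);
            }
        }

        static Fixed sqrt(Fixed k) {
            if (k.x <= 0) return Fixed(0); // no real root for negative input
            signed int min = 0;
            signed int max = k.x > DIGITS ? k.x : DIGITS; // sqrt(k) > k when k < 1
            // Bisection; it stops when the bracket collapses, so it always terminates.
            while (max - min > 1) {
                signed int mid = min + (max - min) / 2;
                long long quick = (long long)mid * mid / DIGITS; // mid squared, in 64 bits
                long long d = quick - k.x;
                if (d > -EPS && d < EPS) return Fixed(mid);
                if (quick > k.x) { max = mid; } else { min = mid; }
            }
            return Fixed(min);
        }
    };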

Source: https://habr.com/ru/post/451922/
