Floating point calculations: can you trust the results?

Anyone who works with applied numerical computation knows the trouble that the finite precision of representing real numbers in a computer can cause. The best-known problems of this kind are the solution of perturbation-sensitive (so-called ill-conditioned) systems of linear equations and the computation of the eigenvalues of nonsymmetric matrices.

When it comes to everyday arithmetic operations, the problems of finite-precision calculation do not look so frightening. And a popular sanity check that a result is correct is to compare the values obtained at different precisions.

If, for example, the results obtained in single and double precision coincide, one gains confidence in the result, at least to an accuracy comparable to single precision. Here I would like to give an interesting example demonstrating that, even in a relatively simple arithmetic problem, such stability under varying precision of number representation cannot justify that confidence.

It is necessary to calculate the value of a function of two variables at certain values (given below) of its arguments:

    f(a, b) = (333.75 - a^2) * b^6 + a^2 * (11*a^2*b^2 - 121*b^4 - 2) + 5.5*b^8 + a / (2*b)

This is an example of a computationally unstable function.

I came across this example while working with the C-XSC library (a C++ class library for verified scientific computing, based mainly on interval arithmetic). That library is well suited for the practical study of the computational stability of various numerical algorithms.

To emulate floating-point calculations of varying precision in Python, install the mpmath package. In principle, if we confined ourselves to IEEE 754 precision, we could use NumPy, but we need to show that the results obtained within the framework of IEEE 754 are incorrect.

Let us calculate the value of the function f(a, b) with a = 77617 and b = 33096.

# coding: utf-8
from mpmath import mp, mpf

def f(a, b):
    '''From: Rump, S.M. Algorithms for verified inclusions:
    theory and practice. Academic Press, NY, 1988.'''
    return (333.75 - a*a) * b**6 \
        + a*a * (11*a*a*b*b - 121*b**4 - 2) \
        + 5.5 * b**8 + a / (2*b)

# roughly single precision: 8 significant decimal digits
mp.dps = 8
a = mpf(77617)
b = mpf(33096)
print('f(a, b):', f(a, b))

# roughly double precision: 16 significant decimal digits
mp.dps = 16
a = mpf(77617)
b = mpf(33096)
print('f(a, b):', f(a, b))


The reader may notice that setting precision via dps in mpmath is not quite the right approach for emulating real double precision. When speaking of double and/or single precision in the IEEE 754 sense, it would be more correct to describe them through their binary characteristics. Here, however, this is not essential, and mp.dps is easier to interpret: it is simply the number of significant decimal digits in the representation of a number.
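For reference, a minimal sketch of that relationship: in mpmath the decimal setting mp.dps and the binary setting mp.prec are coupled, so a double-like mantissa can be requested in binary terms directly (this mimics only the 53-bit mantissa width, not the full IEEE 754 semantics such as the fixed exponent range):

```python
from mpmath import mp

mp.prec = 53             # 53-bit mantissa, as in an IEEE 754 double
dps_for_double = mp.dps  # equivalent number of decimal digits
print(dps_for_double)    # -> 15
```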

Running the code, we get:

f(a, b) : 1.1726039
f(a, b) : 1.172603940053179


Quite convincing, right? Except the value is wrong! The correct value for the given a and b is actually less than zero!

Let's complete the example with the following calculations:

for i in range(8, 40):
    mp.dps = i
    a = mpf(77617)
    b = mpf(33096)
    print('dps={0}, f(a,b)={1}'.format(mp.dps, f(a, b)))


We get (some lines are omitted):

dps=8, f(a,b)=1.1726039
dps=9, f(a,b)=1.17260394
dps=10, f(a,b)=-7.737125246e+25
...
dps=13, f(a,b)=1.172603940053
dps=14, f(a,b)=1.1726039400532
dps=15, f(a,b)=1.17260394005318
dps=16, f(a,b)=1.172603940053179
dps=17, f(a,b)=-9.2233720368547758e+18
...
dps=28, f(a,b)=1.172603940053178631858834905
dps=29, f(a,b)=1.1726039400531786318588349045
dps=30, f(a,b)=1.17260394005317863185883490452
...
dps=36, f(a,b)=-0.827396059946821368141165095479816292
dps=37, f(a,b)=-0.827396059946821368141165095479816292
dps=38, f(a,b)=-0.827396059946821368141165095479816292
dps=39, f(a,b)=-0.827396059946821368141165095479816291999


It can be seen that, despite the stability over long runs of precision settings, some values still differ drastically from the familiar one, which already gives pause.
I should say right away that the values obtained at dps = 36 and higher are correct. But how can we know that no further jump occurs as the precision is increased even more? As we saw with the value 1.17260..., even the stability of a result across different precisions cannot guarantee its correctness.

Fortunately, this question can also be answered with the mpmath package. It supports interval calculations, which make it possible to estimate the range of possible values of a function when its arguments are represented with a given precision.
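A minimal sketch of how mpmath's interval type behaves (my illustration, not from the original article): every arithmetic operation returns an interval guaranteed to enclose the true result, although dependencies between occurrences of the same variable can make the enclosure wider than the true range:

```python
from mpmath import iv

iv.dps = 15
x = iv.mpf([1, 2])  # the interval [1, 2]
y = x*x - x         # encloses t*t - t for every t in [1, 2]
print(y)            # [-1.0, 3.0]: wider than the true range [0, 2],
                    # because the two occurrences of x are treated as
                    # independent (the "dependency problem")
```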

Let us perform the calculations using the mpmath interval arithmetic module, iv:

from mpmath import iv

for j in range(6, 40):
    iv.dps = j
    a = iv.mpf(77617)
    b = iv.mpf(33096)
    print('dps={0}: f(a, b)={1}'.format(iv.dps, f(a, b)))


We get the following:

dps=6: f(a, b)=[-5.0706024009e+30, 6.3382542101e+30]
dps=7: f(a, b)=[-3.16912650057e+29, 3.16912654779e+29]
dps=8: f(a, b)=[-5.94211218857e+28, 2.971056097974e+28]
dps=9: f(a, b)=[-3.7138201178561e+27, 4.9517601582944e+27]
dps=10: f(a, b)=[-1.54742504910673e+26, 3.09485009825849e+26]
dps=11: f(a, b)=[-1.934281311383407e+25, 3.86856262277385e+25]
dps=12: f(a, b)=[-4.8357032784585167e+24, 3.6267774588444373e+24]
dps=13: f(a, b)=[-3.02231454903657294e+23, 2.26673591177745118e+23]
dps=14: f(a, b)=[-2.833419889721787128e+22, 2.833419889721790484e+22]
dps=15: f(a, b)=[-3.5417748621522339103e+21, 3.5417748621522344346e+21]
dps=16: f(a, b)=[-442721857769029238784.0, 442721857769029246976.0]
dps=17: f(a, b)=[-27670116110564327424.0, 27670116110564327456.0]
dps=18: f(a, b)=[-3458764513820540927.0, 2305843009213693953.5]
dps=19: f(a, b)=[-432345564227567614.828125, 288230376151711745.179688]
dps=20: f(a, b)=[-36028797018963966.8274231, 18014398509481985.17260742]
dps=21: f(a, b)=[-3377699720527870.8273963928, 1125899906842625.172604084]
dps=22: f(a, b)=[-140737488355326.827396061271, 422212465065985.172603942454]
dps=23: f(a, b)=[-17592186044414.8273960599472, 17592186044417.17260394006735]
dps=24: f(a, b)=[-1099511627774.8273960599468637, 3298534883329.17260394005325]
dps=25: f(a, b)=[-137438953470.827396059946822859, 412316860417.172603940053178917]
dps=26: f(a, b)=[-17179869182.82739605994682137446, 17179869185.17260394005317863941]
dps=27: f(a, b)=[-2147483646.8273960599468213681761, 2147483649.1726039400531786320407]
dps=28: f(a, b)=[-268435454.827396059946821368142245, 268435457.172603940053178631864532]
dps=29: f(a, b)=[-8388606.827396059946821368141165871, 8388609.172603940053178631858840746]
dps=30: f(a, b)=[-1048574.8273960599468213681411651475, 1048577.1726039400531786318588349559]
dps=31: f(a, b)=[-131070.827396059946821368141165095792, 131073.172603940053178631858834907439]
dps=32: f(a, b)=[-8190.827396059946821368141165095483, 1.172603940053178631858834904520184761]
dps=33: f(a, b)=[-1022.8273960599468213681411650954798445, 1.1726039400531786318588349045201837979]
dps=34: f(a, b)=[-126.82739605994682136814116509547981678, 1.17260394005317863185883490452018372567]
dps=35: f(a, b)=[-6.827396059946821368141165095479816292382, 1.172603940053178631858834904520183709123]
dps=36: f(a, b)=[-0.82739605994682136814116509547981629200549, -0.82739605994682136814116509547981629181741]
dps=37: f(a, b)=[-0.827396059946821368141165095479816292005489, -0.827396059946821368141165095479816291981979]
dps=38: f(a, b)=[-0.8273960599468213681411650954798162919996113, -0.827396059946821368141165095479816291998142]
dps=39: f(a, b)=[-0.82739605994682136814116509547981629199906033, -0.82739605994682136814116509547981629199887666]


It is interesting to follow the evolution of the interval of possible values of the function: although it narrows gradually, it stabilizes only at a precision of 36 or more significant decimal digits.
The interval calculations make it quite clear that one should trust only the results obtained with precisions of dps = 36 or more decimal digits.
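As an independent cross-check (my sketch, not part of the original article): all the inputs here are rational, so f(a, b) can be evaluated exactly with Python's fractions module, confirming the sign and the digits that the dps >= 36 runs produce.

```python
from fractions import Fraction

a, b = 77617, 33096
# 333.75 = 33375/100 and 5.5 = 11/2 are represented exactly as rationals,
# so every operation below is exact integer/rational arithmetic.
exact = (Fraction(33375, 100) - a*a) * b**6 \
    + a*a * (11*a*a*b*b - 121*b**4 - 2) \
    + Fraction(11, 2) * b**8 + Fraction(a, 2*b)

print(exact)         # -54767/66192
print(float(exact))  # about -0.8273960599468214
```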

This example is a clear demonstration of how careful one must be with floating-point calculations: even 128-bit precision (for example, numpy.float128, speaking of Python and NumPy) may be insufficient. It also shows that the stability of a result obtained at different precisions cannot be trusted either. The apparatus of interval computation can be one remedy in such cases, since it allows one to estimate the precision required to obtain an adequate result.

Source: https://habr.com/ru/post/305276/
