An approximate comparison of numbers in Haskell

Probably everyone knows that when calculating with limited accuracy, two mathematically equivalent expressions may not be equal to each other. For example, the following obvious mathematical equality is unexpectedly false when calculated in Haskell:

ghci> 3 * sqrt(24 ^ 2 + 16 ^ 2) == sqrt(72 ^ 2 + 48 ^ 2) False

The reason for such a violation is that the expressions in this equality are calculated only approximately:
')

 ghci> 3 * sqrt(24 ^ 2 + 16 ^ 2) 86.53323061113574 ghci> sqrt(72 ^ 2 + 48 ^ 2) 86.53323061113575 ghci> sqrt(72 ^ 2 + 48 ^ 2) - 3 * sqrt(24 ^ 2 + 16 ^ 2) 1.4210854715202004e-14

The difference here is only in the last (fourteenth!) Decimal place, but this is enough for the comparison to be false.

Although this problem is well known, programmers pay little attention to it. First, it is believed that comparisons of this kind occur only in a narrow field of numerical methods, and second, that violation of equality occurs extremely rarely. As it turned out, both are not entirely true. The given case arose when I needed to implement the function of calculating the length of a vector with integer coordinates. At the same time, for the unit testing, the QuickCheck package tools are used, which rather quickly found a case of violation of the scaling invariant for the vector length. I note that this is not the only invariant whose violation was discovered during testing.

The question arises: what is the easiest way to describe the verification of the approximate equality of two numbers obtained as a result of calculations with limited accuracy? To solve this problem in Haskell, it suffices to define another comparison operator (say, ~ =), which is used in the same way as the ordinary equality operator. I propose to consider the implementation of such an operator, which can be arranged in the form of a fairly simple Circa module.

The first thing that comes to mind for an approximate comparison of two numbers is to calculate the absolute value of the difference between the numbers being compared, and to check whether it exceeds some threshold:

 ghci> abs(sqrt(72 ^ 2 + 48 ^ 2) - 3 * sqrt(24 ^ 2 + 16 ^ 2)) < 1e-12 True

Such a solution, of course, is quite workable. But he has two serious drawbacks. First of all, when reading this record, it is not at all obvious that we check the equality (even if it is approximate) of two numbers. In addition, we had to use the “magic” number 1e-12 to indicate the expected inaccuracy of the comparison. Of course, with a small number of such comparisons with these inconveniences could be put up. But when the number of invariants is measured in dozens, I would like to get a simpler and more obvious way to write them. Much better is the code in which a two-place approximate comparison operator is used in the same way as a regular equal sign, for example, like this:

 sqrt(72 ^ 2 + 48 ^ 2) ~= 3 * sqrt(24 ^ 2 + 16 ^ 2)

Unfortunately, standard Haskell syntax does not provide such an operator. However, the language itself provides a wonderful opportunity to introduce a new operator on your own, as if it were part of the standard syntax.

First, we define the function circaEq, which will reduce the approximate comparison record

 circaEq :: (Fractional t, Ord t) => t -> t -> t -> Bool circaEq txy = abs (x - y) < t

Now our example is getting a bit shorter, but not much clearer:

 ghci> circaEq 1e-12 (sqrt(72 ^ 2 + 48 ^ 2)) (3 * sqrt(24 ^ 2 + 16 ^ 2)) True

There is still a lot of unnecessary information: to separate the arguments, we had to enclose them in brackets, and most importantly, it is still necessary to specify the accuracy of the comparison. To get rid of the extra parameter, let's use the trick that was used in the Data.Fixed module to represent numbers with fixed precision. We define a series of two-place functions, each of which will compare numbers with a predetermined error. It turns out that in practice it is enough to define only seven such functions for the most typical values of the error:

 picoEq :: (Fractional t, Ord t) => t -> t -> Bool picoEq = circaEq 1e-12 infix 4 `picoEq` nanoEq :: (Fractional t, Ord t) => t -> t -> Bool nanoEq = circaEq 1e-9 infix 4 `nanoEq` microEq :: (Fractional t, Ord t) => t -> t -> Bool microEq = circaEq 1e-6 infix 4 `microEq` milliEq :: (Fractional t, Ord t) => t -> t -> Bool milliEq = circaEq 1e-3 infix 4 `milliEq` centiEq :: (Fractional t, Ord t) => t -> t -> Bool centiEq = circaEq 1e-2 infix 4 `centiEq` deciEq :: (Fractional t, Ord t) => t -> t -> Bool deciEq = circaEq 1e-1 infix 4 `deciEq` uniEq :: (Fractional t, Ord t) => t -> t -> Bool uniEq = circaEq 1 infix 4 `uniEq`

Any of these functions can be used as a two-place comparison operator, for example:

 ghci> sqrt(72 ^ 2 + 48 ^ 2) `picoEq` 3 * sqrt(24 ^ 2 + 16 ^ 2) True

It only remains to add a little sugar:

 (~=) :: (Fractional t, Ord t) => t -> t -> Bool (~=) = picoEq infix 4 ~=

and we get what we wanted:

 ghci> sqrt(72 ^ 2 + 48 ^ 2) ~= 3 * sqrt(24 ^ 2 + 16 ^ 2) True

Let us answer one more important question: why did we need several different functions for an approximate comparison? Is comparison with pico-accuracy insufficient? It turns out really not enough. A suitable counterexample is found using the same QuickCheck package:

 ghci> sqrt(5588 ^ 2 + 8184 ^ 2) ~= 44 * sqrt(127 ^ 2 + 186 ^ 2) False ghci> sqrt(5588 ^ 2 + 8184 ^ 2) `nanoEq` 44 * sqrt(127 ^ 2 + 186 ^ 2) True

It is obvious that the required level of accuracy strongly depends on the scale of the numbers with which it is necessary to work. That is why Circa module exports not only an approximate comparison operator, but also a set of functions for which it can become a synonym. If pico-precision is unacceptable for an application, it can import the necessary function and define an approximate comparison operator accordingly. You can do the same if someone likes a different designation for an approximate comparison operator.

Source: https://habr.com/ru/post/248043/

All Articles

An approximate comparison of numbers in Haskell

More articles: