📜 ⬆️ ⬇️

Write comparators correctly

In Java, to introduce order among certain objects, you can write a comparator — a class containing the compare function that compares two objects. An alternative to the comparator is the natural order of the objects: the object implements the Comparable interface, which contains a compareTo method that allows you to compare this object with another. The comparison function should return 0 if the objects are equal, a negative number (usually -1) if the first object is less than the second, and a positive number (usually 1) if the first is larger. Typically, the implementation of such a function is not difficult, but there is one case that many people forget.

Comparison is used by various algorithms from sorting and binary searching to maintaining order in sorted collections like TreeMap . These algorithms are tied to three important properties of the comparison function: reflexivity (comparing an element with itself always gives 0), antisymmetry (comparing A with B and B with A should give a different sign) and transitivity (if comparing A with B and B with C gives the same sign, then the comparison of A with C should give the same). If the comparing function does not satisfy these properties, the algorithm may produce a completely unpredictable result. And most likely you will not get any exception, just the result will be incorrect.

As it turned out, non-compliance with these properties is not such a rare situation. The problem occurs when comparing real numbers - float or double.

Suppose we have a class with a field of type double, and we want to order objects of this class depending on the value of the field. Often you can find this code:
 public class DoubleHolder implements Comparable<DoubleHolder> { double d; public DoubleHolder(double d) { this.d = d; } @Override public int compareTo(DoubleHolder o) { return d > od ? 1 : d == od ? 0 : -1; } @Override public String toString() { return String.valueOf(d); } } 

The compareTo method above has two problems. The first is insignificant - it does not distinguish between +0.0 and -0.0: new DoubleHolder(-0.0).compareTo(new DoubleHolder(+0.0)) returns 0. Sometimes it is not a bad thing, but in the case of sorting, elements with +0.0 and -0.0 will be located in an arbitrary order that will look ugly. However, all this is minor compared to NaN. The NaN number (in both double and float types) is a rather specific thing. It is not more, not less and not equal to any other number. As a result, we immediately get a violation of the properties of the comparison:
')
 DoubleHolder nan = new DoubleHolder(Double.NaN); DoubleHolder zero = new DoubleHolder(0.0); System.out.println("nan.compareTo(nan): "+nan.compareTo(nan)); System.out.println("nan.compareTo(zero): "+nan.compareTo(zero)); System.out.println("zero.compareTo(nan): "+zero.compareTo(nan)); 

 nan.compareTo(nan): -1 nan.compareTo(zero): -1 zero.compareTo(nan): -1 

Neither reflexivity nor antisymmetry is observed. You can also find such an implementation of comparison:
 public int compareTo(DoubleHolder o) { return d > od ? 1 : d < od ? -1 : 0; } 

Here, all three previous comparisons will yield zero, that is, as if the properties are observed. But, of course, it's too early to rejoice:

 DoubleHolder nan = new DoubleHolder(Double.NaN); DoubleHolder zero = new DoubleHolder(0.0); DoubleHolder one = new DoubleHolder(1.0); System.out.println("zero.compareTo(nan): "+zero.compareTo(nan)); System.out.println("nan.compareTo(one): "+nan.compareTo(one)); System.out.println("zero.compareTo(one): "+zero.compareTo(one)); 

 zero.compareTo(nan): 0 nan.compareTo(one): 0 zero.compareTo(one): -1 

Here transitivity is violated: the first object is equal to the second, the second is equal to the third, but the first to the third is not equal.

What does it threaten a simple man in the street? To understand this, create a simple list and try to mix and sort it several times:
 List<DoubleHolder> data = new ArrayList<>(); for(int i=1; i<=10; i++) { data.add(new DoubleHolder(i)); } data.add(new DoubleHolder(Double.NaN)); for(int i=0; i<10; i++) { Collections.shuffle(data); Collections.sort(data); System.out.println(data); } 

The output in each run is different and may look, for example, like this:
 [NaN, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0] [NaN, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0] [1.0, NaN, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0] [NaN, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0] [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, NaN] [2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, NaN, 1.0, 9.0, 10.0] [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 10.0, NaN, 9.0] [NaN, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0] [NaN, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0] [1.0, NaN, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0] 

Or for the second implementation of compareTo :
 [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0, 9.0, NaN, 7.0, 10.0] [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, NaN] [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, NaN] [1.0, 2.0, 9.0, 10.0, NaN, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0] [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, NaN] [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, NaN] [2.0, 6.0, NaN, 1.0, 3.0, 4.0, 5.0, 7.0, 8.0, 9.0, 10.0] [2.0, 4.0, NaN, 1.0, 3.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0] [1.0, 3.0, NaN, 2.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0] [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, NaN] 

In about half of the cases, an element with NaN is nailed to the beginning or end of the list being sorted, and in other cases it begins to wander, spoiling the order of the surrounding elements.

With collections using DoubleHolder as a key, nothing good will happen either. Take, for example, the TreeSet . With the second implementation of compareTo everything is quite simple: since an element containing NaN is equal to any other, it will not be possible to insert it into a non-empty set, because it decides that such an element already exists. But beware if you insert the NaN element first: after that, it will not be possible to add anything else to the set.

The first version of the comparison function is psychedelic. For example, let's write this test:
 Set<DoubleHolder> set = new TreeSet<>(); for(int i=0; i<50; i++) { set.add(new DoubleHolder( i%10==0 ? Double.NaN : i%10 )); } System.out.println(set); 

We inserted five elements containing NaN, and five elements containing each digit from 1 to 9. As a result, we have the following:
 [NaN, NaN, 1.0, 2.0, 3.0, NaN, NaN, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, NaN] 

It is quite expected to see NaN five times: they are not equal to each other. But due to incorrect comparisons and some other elements were inserted several times. You can see for yourself what happens to the TreeMap. Note that one accidentally hit NaN can spoil the entire collection, and it is not always easy to debug it: the collection can exist for a long time in an incorrect state and pretend that everything is normal.

Curiously, this problem should not exist at all. Back in JDK 1.4, special static methods Float.compare and Double.compare appeared that will do everything for you, correctly processing special cases. It is only necessary to write:
 public int compareTo(DoubleHolder o) { return Double.compare(d, od); } 

Apparently, the deceptive simplicity of the comparison algorithm has an effect: sometimes a programmer believes that it is easier to write oneself than to search the documentation for the standard way to do it.

Examples of incorrect double / float comparisons in various open source projects: JTS , Batik , Hadoop , Hudson , ICU4J , Lucene . It is difficult to determine in which cases this may lead to problems, but this is the case when I would definitely correct: the correct and reliable option is usually also shorter than the wrong one.

To change the situation, I wrote a small detector for FindBugs, which finds incorrectly implemented comparison functions and suggests using Float.compare/Double.compare.

In general, for all primitive types there are similar methods. If you need to compare several fields in turn, you can write this:
 public class MyObject implements Comparable<MyObject> { double d; int i; String s; char c; @Override public int compareTo(MyObject o) { int result; result = Double.compare(d, od); if(result != 0) return result; result = Integer.compare(i, oi); if(result != 0) return result; result = s.compareTo(os); if(result != 0) return result; result = Character.compare(c, oc); return result; } } 

It looks better than a bunch of branches with more and less. And if you use Guava or something similar, then this:
 @Override public int compareTo(MyObject o) { return ComparisonChain.start() .compare(d, od) .compare(i, oi) .compare(s, os) .compare(c, oc) .result(); } 

Source: https://habr.com/ru/post/247015/


All Articles