According to the UUID standard RFC 4122, UUIDs should be treated as unsigned int 128 and comparison should be done that way:
Rules for Lexical Equivalence:
Consider each field of the UUID to be an unsigned integer as shown
in the table in section Section 4.1.2. Then, to compare a pair of
UUIDs, arithmetically compare the corresponding fields from each
UUID in order of significance and according to their data type.
Two UUIDs are equal if and only if all the corresponding fields
are equal.
UUIDs, as defined in this document, can also be ordered
lexicographically. For a pair of UUIDs, the first one follows the
second if the most significant field in which the UUIDs differ is
greater for the first UUID. The second precedes the first if the
most significant field in which the UUIDs differ is greater for
the second UUID.
Meaning that println(UUID.fromString("b533260f-6479-4014-a007-818481bd98c6") < UUID.fromString("131f0ada-6b6a-4e75-a6a0-4149958664e3"))
should print false.
However, it prints true!!
Looking at the implementation of compareTo (I'm using Temurin here),
@Override
public int compareTo(UUID val) {
// The ordering is intentionally set up so that the UUIDs
// can simply be numerically compared as two numbers
int mostSigBits = Longpare(this.mostSigBits, val.mostSigBits);
return mostSigBits != 0 ? mostSigBits : Longpare(this.leastSigBits, val.leastSigBits);
}
For this particular case, the most significant bits are
-5389922481480318956
1377831944219938421
respectively. Meaning that comparison is wrong because the long overflows.
According to the UUID standard RFC 4122, UUIDs should be treated as unsigned int 128 and comparison should be done that way:
Rules for Lexical Equivalence:
Consider each field of the UUID to be an unsigned integer as shown
in the table in section Section 4.1.2. Then, to compare a pair of
UUIDs, arithmetically compare the corresponding fields from each
UUID in order of significance and according to their data type.
Two UUIDs are equal if and only if all the corresponding fields
are equal.
UUIDs, as defined in this document, can also be ordered
lexicographically. For a pair of UUIDs, the first one follows the
second if the most significant field in which the UUIDs differ is
greater for the first UUID. The second precedes the first if the
most significant field in which the UUIDs differ is greater for
the second UUID.
Meaning that println(UUID.fromString("b533260f-6479-4014-a007-818481bd98c6") < UUID.fromString("131f0ada-6b6a-4e75-a6a0-4149958664e3"))
should print false.
However, it prints true!!
Looking at the implementation of compareTo (I'm using Temurin here),
@Override
public int compareTo(UUID val) {
// The ordering is intentionally set up so that the UUIDs
// can simply be numerically compared as two numbers
int mostSigBits = Longpare(this.mostSigBits, val.mostSigBits);
return mostSigBits != 0 ? mostSigBits : Longpare(this.leastSigBits, val.leastSigBits);
}
For this particular case, the most significant bits are
-5389922481480318956
1377831944219938421
respectively. Meaning that comparison is wrong because the long overflows.
Is UUID comparison in Java against the standard?
Yes. It is reported as the bug JDK-7025832 - "java.lang.UUID compareTo()
does not do an unsigned compare".
It has been marked as "won't fix" because it is very important that the behaviour of compareTo
is consistent across Java versions.
It's simple to write a Comparator
that compares them in an unsigned way if you specifically want that behaviour. Just use LongpareUnsigned
.
Comparator<UUID> rfcComparator = (a, b) -> {
int mostSigBits = LongpareUnsigned(a.getMostSignificantBits(), b.getMostSignificantBits());
return mostSigBits != 0 ? mostSigBits : LongpareUnsigned(a.getLeastSignificantBits(), b.getLeastSignificantBits());
};
I think Java can be excused for not implementing comparison according to that document; the stated rule has a fatal flaw.
In particular "the first one follows the second" and "the second precedes the first" both mean the same thing¹; according to the rule the only possible outcome is A > B
(and the rule implicitly allows A == B
) but never is A < B
.
¹ The direct mathematical interpretations are A > B
and B < A
, respectively.
println
statement above is not valid Java, it's valid Kotlin/JVM. – k314159 Commented Mar 6 at 16:09