Does the UUID comparison in Java violate the UUID standard? - Stack Overflow

admin2025-04-18  0

According to the UUID standard RFC 4122, UUIDs should be treated as unsigned int 128 and comparison should be done that way:

Rules for Lexical Equivalence:
      Consider each field of the UUID to be an unsigned integer as shown
      in the table in section Section 4.1.2.  Then, to compare a pair of
      UUIDs, arithmetically compare the corresponding fields from each
      UUID in order of significance and according to their data type.
      Two UUIDs are equal if and only if all the corresponding fields
      are equal.
 UUIDs, as defined in this document, can also be ordered
      lexicographically.  For a pair of UUIDs, the first one follows the
      second if the most significant field in which the UUIDs differ is
      greater for the first UUID.  The second precedes the first if the
      most significant field in which the UUIDs differ is greater for
      the second UUID.

Meaning that println(UUID.fromString("b533260f-6479-4014-a007-818481bd98c6") < UUID.fromString("131f0ada-6b6a-4e75-a6a0-4149958664e3")) should print false.

However, it prints true!!

Looking at the implementation of compareTo (I'm using Temurin here),

@Override
    public int compareTo(UUID val) {
        // The ordering is intentionally set up so that the UUIDs
        // can simply be numerically compared as two numbers
        int mostSigBits = Longpare(this.mostSigBits, val.mostSigBits);
        return mostSigBits != 0 ? mostSigBits : Longpare(this.leastSigBits, val.leastSigBits);
    }

For this particular case, the most significant bits are

-5389922481480318956
1377831944219938421

respectively. Meaning that comparison is wrong because the long overflows.

According to the UUID standard RFC 4122, UUIDs should be treated as unsigned int 128 and comparison should be done that way:

Rules for Lexical Equivalence:
      Consider each field of the UUID to be an unsigned integer as shown
      in the table in section Section 4.1.2.  Then, to compare a pair of
      UUIDs, arithmetically compare the corresponding fields from each
      UUID in order of significance and according to their data type.
      Two UUIDs are equal if and only if all the corresponding fields
      are equal.
 UUIDs, as defined in this document, can also be ordered
      lexicographically.  For a pair of UUIDs, the first one follows the
      second if the most significant field in which the UUIDs differ is
      greater for the first UUID.  The second precedes the first if the
      most significant field in which the UUIDs differ is greater for
      the second UUID.

Meaning that println(UUID.fromString("b533260f-6479-4014-a007-818481bd98c6") < UUID.fromString("131f0ada-6b6a-4e75-a6a0-4149958664e3")) should print false.

However, it prints true!!

Looking at the implementation of compareTo (I'm using Temurin here),

@Override
    public int compareTo(UUID val) {
        // The ordering is intentionally set up so that the UUIDs
        // can simply be numerically compared as two numbers
        int mostSigBits = Longpare(this.mostSigBits, val.mostSigBits);
        return mostSigBits != 0 ? mostSigBits : Longpare(this.leastSigBits, val.leastSigBits);
    }

For this particular case, the most significant bits are

-5389922481480318956
1377831944219938421

respectively. Meaning that comparison is wrong because the long overflows.

Share Improve this question edited Mar 8 at 22:49 Peter Mortensen 31.6k22 gold badges110 silver badges133 bronze badges asked Mar 6 at 15:43 Gabriel FurstenheimGabriel Furstenheim 3,68837 silver badges39 bronze badges 6
  • 1 btw, before anyone points out that the println statement above is not valid Java, it's valid Kotlin/JVM. – k314159 Commented Mar 6 at 16:09
  • 4 Could someone clarify why the principle demonstration for this Java issue (as mentioned in title and tag) is not valid Java - if its Kotlin then identify it as such or use a valid Java example. (Operator "<" is not defined ...) – Computable Commented Mar 6 at 16:18
  • Note that, in 2024, RFC 9562 invented Version 6 and Version 7 of UUIDs. These versions define an arrangement of bits intended to be sortable in a practical manner for optimizing index lookups when used as a key in databases etc. See Wikipedia. Semantically a UUID should still not be considered “greater or less than”, but practically v6 & v7 are designed to be comparable. – Basil Bourque Commented Mar 6 at 16:19
  • 5 To be extremely pedantic, the Java implementation is not completely wrong: the current version of the UUID class only recognizes variants 1, 2, 3 and 4. The UUID standard you mention above, RFC 4122, is obsolete, replaced by RFC 9562. The latest standard defines a lexicographic order only for variants 6 and 7. – k314159 Commented Mar 6 at 16:41
  • 3 This question is similar to: Why is UUID#compareTo incompatible with RFC 4122?. If you believe it’s different, please edit the question, make it clear how it’s different and/or how the answers on that question are not helpful for your problem. – user71659 Commented Mar 7 at 20:27
 |  Show 1 more comment

2 Answers 2

Reset to default 35

Is UUID comparison in Java against the standard?

Yes. It is reported as the bug JDK-7025832 - "java.lang.UUID compareTo() does not do an unsigned compare".

It has been marked as "won't fix" because it is very important that the behaviour of compareTo is consistent across Java versions.


It's simple to write a Comparator that compares them in an unsigned way if you specifically want that behaviour. Just use LongpareUnsigned.

Comparator<UUID> rfcComparator = (a, b) -> {
    int mostSigBits = LongpareUnsigned(a.getMostSignificantBits(), b.getMostSignificantBits());
    return mostSigBits != 0 ? mostSigBits : LongpareUnsigned(a.getLeastSignificantBits(), b.getLeastSignificantBits());
};

I think Java can be excused for not implementing comparison according to that document; the stated rule has a fatal flaw.

In particular "the first one follows the second" and "the second precedes the first" both mean the same thing¹; according to the rule the only possible outcome is A > B (and the rule implicitly allows A == B) but never is A < B.


¹ The direct mathematical interpretations are A > B and B < A, respectively.

转载请注明原文地址:http://conceptsofalgorithm.com/Algorithm/1744965205a277155.html

最新回复(0)