Version sort ignores locale (GNU Coreutils 9.0)
30.2.6 Version sort uses ASCII order, ignores locale, unicode characters
In version sort, Unicode characters are compared byte by byte according to their binary representation, ignoring their Unicode code point and the current locale.
Most commonly, Unicode characters (e.g. Greek Small Letter Alpha U+03B1 ‘α’) are encoded as UTF-8 bytes (e.g. ‘α’ is encoded as UTF-8 sequence 0xCE 0xB1). The encoding will be compared byte-by-byte, e.g. first 0xCE (decimal value 206) then 0xB1 (decimal value 177).
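The byte sequence can be inspected from the shell. The following is a minimal sketch, assuming the script or terminal input is itself UTF-8 encoded; the helper name utf8_bytes is hypothetical and only used for illustration.

```shell
# Hypothetical helper: print the bytes of a string as space-separated hex.
# Assumes the argument reaches the shell as UTF-8.
utf8_bytes() {
  printf '%s' "$1" | od -An -tx1 | tr -s ' \n' ' ' | sed 's/^ //; s/ $//'
}

utf8_bytes 'α'   # prints: ce b1
```

The two bytes printed are the values that version sort compares, one after the other.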
$ touch aa az "a%" "aα"
$ ls -1 -v
aa
az
a%
aα
Ignoring the first letter (a), which is identical in all strings, the remaining characters are compared as follows: ‘a’ and ‘z’ are letters, and sort earlier than all other non-digit characters. Then, the percent sign ‘%’ (ASCII value 37) is compared to the first byte of the UTF-8 sequence of ‘α’ (which is 0xCE, or 206). The value 37 is smaller, hence ‘a%’ is listed before ‘aα’.
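The numeric comparison between ‘%’ and the first byte of ‘α’ can be reproduced by hand. This sketch only illustrates the byte values involved, not how ls or sort implement the comparison; it assumes UTF-8 input, and first_byte is a hypothetical helper name.

```shell
# Hypothetical helper: decimal value of the first byte of a string.
# Assumes the argument reaches the shell as UTF-8.
first_byte() {
  printf '%s' "$1" | od -An -tu1 | awk 'NR == 1 { print $1 }'
}

pct=$(first_byte '%')     # 37
alpha=$(first_byte 'α')   # 206, i.e. 0xCE, the first byte of the UTF-8 sequence

# The smaller byte value sorts first among non-letter, non-digit characters.
if [ "$pct" -lt "$alpha" ]; then
  echo "a% sorts before aα ($pct < $alpha)"
fi
```

Run as written, this prints "a% sorts before aα (37 < 206)", matching the order shown in the ls output.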