GNU coreutils’ version sort algorithm implements specialized handling of file extensions (or strings that look like file names with extensions).
This nuanced implementation enables slightly more natural ordering of files.
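The additional rules are:
1. A suffix (i.e., a file extension) is defined as a dot, followed by a letter or tilde, followed by zero or more letters, digits, or tildes; this pattern may repeat, and the suffix extends to the end of the string (technically, it matches the regular expression (\.[A-Za-z~][A-Za-z0-9~]*)*).
2. If the strings contain suffixes, the suffixes are temporarily removed, and the strings are compared without them.
3. If the strings are identical after the suffixes are removed, the suffixes are restored and the entire strings are compared.
4. If the strings differ after the suffixes are removed, that result is used and the suffixes are effectively ignored.
Examples for rule 1: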
hello-8.txt: the suffix is .txt
hello-8.2.txt: the suffix is .txt (‘.2’ is not included because the dot is not followed by a letter)
hello-8.0.12.tar.gz: the suffix is .tar.gz (‘.0.12’ is not included)
hello-8.2: no suffix (the suffix is an empty string)
hello.foobar65: the suffix is .foobar65
gcc-c++-10.8.12-0.7rc2.fc9.tar.bz2: the suffix is .fc9.tar.bz2 (‘.7rc2’ is not included because it begins with a digit)
Examples for rule 2:
When comparing hello-8.txt to hello-8.2.12.txt, the .txt suffix is temporarily removed from both strings.
When comparing foo-10.3.tar.gz to foo-10.tar.xz, the suffixes .tar.gz and .tar.xz are temporarily removed from the strings.
Example for rule 3:
When comparing hello.foobar65 to hello.foobar4, the suffixes (.foobar65 and .foobar4) are temporarily removed. The remaining strings are identical (hello). The suffixes are then restored, and the entire strings are compared (hello.foobar4 comes first).
Example for rule 4:
When comparing hello-8.2.txt and hello-8.10.txt, the suffixes (.txt) are temporarily removed. The remaining strings (hello-8.2 and hello-8.10) are compared as previously described (hello-8.2 comes first). (In this case the suffix-removal algorithm does not have a noticeable effect on the resulting order.)
How does the suffix-removal algorithm affect ordering results?
Consider the comparison of hello-8.txt and hello-8.2.txt.
Without the suffix-removal algorithm, the strings will be broken down to the following parts:
hello-  vs  hello-  (rule 2, all non-digit characters)
8       vs  8       (rule 3, all digit characters)
.txt    vs  .       (rule 2)
empty   vs  2
empty   vs  .txt
The comparison of the third parts (‘.txt’ vs ‘.’) determines that the shorter string comes first, so hello-8.2.txt appears first.
Indeed, this is the order in which Debian’s dpkg compares the strings.
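The breakdown above can be reproduced with a few lines of Python. This is only an illustrative sketch, not the actual C implementation; the helper name split_parts is hypothetical.

```python
import re
from itertools import zip_longest

def split_parts(name):
    """Split a string into alternating non-digit and digit runs."""
    return re.findall(r"[0-9]+|[^0-9]+", name)

left = split_parts("hello-8.txt")
right = split_parts("hello-8.2.txt")

# Print the parts side by side; a missing part is shown as "empty".
for a, b in zip_longest(left, right, fillvalue="empty"):
    print(f"{a:<8}vs  {b}")
```

The third row pairs ‘.txt’ with ‘.’, which is the comparison that decides the order when suffixes are not removed.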
A more natural result is for hello-8.txt to come before hello-8.2.txt, and this is where suffix removal comes into play: the suffixes (.txt) are removed, and the remaining strings are broken down into the following parts:
hello-  vs  hello-  (rule 2, all non-digit characters)
8       vs  8       (rule 3, all digit characters)
empty   vs  .       (rule 2)
empty   vs  2
Because empty strings sort before non-empty strings, hello-8 sorts first, and so hello-8.txt comes before hello-8.2.txt.
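The suffix-removal step can be sketched in Python using the regular expression from the suffix definition, anchored at the end of the string. The names SUFFIX_RE and split_suffix are hypothetical, and the sketch ignores any further special cases the real implementation may handle.

```python
import re

# The suffix pattern, anchored at the end of the string. Because
# re.search returns the leftmost match, the longest suffix is found.
SUFFIX_RE = re.compile(r"(?:\.[A-Za-z~][A-Za-z0-9~]*)*\Z")

def split_suffix(name):
    """Return (stem, suffix); the suffix may be an empty string."""
    start = SUFFIX_RE.search(name).start()
    return name[:start], name[start:]

for name in ("hello-8.txt", "hello-8.2.txt", "hello-8.0.12.tar.gz",
             "hello-8.2", "gcc-c++-10.8.12-0.7rc2.fc9.tar.bz2"):
    print(name, "->", split_suffix(name))
```

With the suffixes split off, hello-8.txt and hello-8.2.txt reduce to hello-8 and hello-8.2, which are the strings broken down in the table above.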
A real-world example would be listing files such as gcc_10.fc9.tar.gz and gcc_10.8.12.7rc2.fc9.tar.bz2: Debian’s algorithm would list gcc_10.8.12.7rc2.fc9.tar.bz2 first, while ‘ls -v’ will list gcc_10.fc9.tar.gz first.
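The difference between the two orderings can be illustrated with a simplified Python model. It assumes the part-by-part comparison described earlier (tilde first, then end of string, then letters, then other characters; digit runs compared numerically) and applies suffix removal on top of it. All function names are hypothetical, and the model omits other special cases, so it is a sketch rather than a faithful port of the shared ls and sort code.

```python
import re
from functools import cmp_to_key

SUFFIX_RE = re.compile(r"(?:\.[A-Za-z~][A-Za-z0-9~]*)*\Z")

def char_order(c):
    # '~' sorts first, ASCII letters by code point, everything else
    # after letters. End of string is treated as 0 by the caller.
    if c == "~":
        return -1
    if c.isascii() and c.isalpha():
        return ord(c)
    return ord(c) + 256

def split_parts(s):
    # Pairs of (non-digit run, digit run); either half may be empty.
    return [p for p in re.findall(r"([^0-9]*)([0-9]*)", s) if p != ("", "")]

def plain_cmp(a, b):
    """Part-by-part comparison without suffix removal."""
    pa, pb = split_parts(a), split_parts(b)
    for i in range(max(len(pa), len(pb))):
        text_a, num_a = pa[i] if i < len(pa) else ("", "")
        text_b, num_b = pb[i] if i < len(pb) else ("", "")
        for j in range(max(len(text_a), len(text_b))):
            oa = char_order(text_a[j]) if j < len(text_a) else 0
            ob = char_order(text_b[j]) if j < len(text_b) else 0
            if oa != ob:
                return -1 if oa < ob else 1
        na, nb = int(num_a or "0"), int(num_b or "0")
        if na != nb:
            return -1 if na < nb else 1
    return 0

def strip_suffix(s):
    return s[:SUFFIX_RE.search(s).start()]

def suffix_aware_cmp(a, b):
    """Compare without suffixes; on a tie, compare the entire strings."""
    result = plain_cmp(strip_suffix(a), strip_suffix(b))
    return result if result != 0 else plain_cmp(a, b)

names = ["gcc_10.fc9.tar.gz", "gcc_10.8.12.7rc2.fc9.tar.bz2"]
print(sorted(names, key=cmp_to_key(plain_cmp)))         # 10.8.12 listed first
print(sorted(names, key=cmp_to_key(suffix_aware_cmp)))  # gcc_10.fc9 listed first
```

In this model the only difference between the two comparison functions is the suffix-removal step, which is enough to flip the order of the two file names.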
These priorities make sense for ‘ls -v’: versioned files will be listed in a more natural order.
For ‘sort -V’ these priorities might seem arbitrary. However, because the sorting code is shared between the ls and sort programs, the ordering rules are the same.