How to get the `sort` shell command to compare raw bytes?

It seems like the POSIX sort command-line utility does some fancy locale-based shenanigans to compare the given strings.

I scanned the man page but could not find a way to make it use the raw byte values instead.
Is there a way to get sort (I have the GNU coreutils version) to behave like
qsort(array_of_my_strings, N, strcmp) would in C? Solutions using a tool other than sort would be fine too.

For demonstration, I currently get:

printf "\xC3\xBC\n\x76\n" | sort
ü
v

because the German umlaut ü seems to be compared as u, which comes before v, despite \xC3 being larger than \x76.

What I want is:

printf "\xC3\xBC\n\x76\n" | sort --raw-bytes-please
v
ü

Solution:

Collation order and (multi-byte) character type are influenced by your locale. The locale name for disabling multibyte and locale-aware behaviors is C.

Thus:

LC_COLLATE=C LC_CTYPE=C sort

…will set only the character type and the collation order. This assumes LC_ALL isn't set; if it is, it takes precedence and both variables are ignored.
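
As a minimal sketch of the LC_ALL caveat, the subshell below unsets LC_ALL before setting the two narrower variables, so they are certain to take effect. The input is the sample from the question, written with octal escapes because not every printf understands \x escapes. The variable name sorted is only for illustration.

```shell
# Unset LC_ALL inside a subshell so it cannot override the two variables below;
# the caller's own environment is left untouched.
sorted=$(printf '\303\274\nv\n' | (unset LC_ALL; LC_COLLATE=C LC_CTYPE=C sort))

# Prints "v" first, then "ü": byte 0x76 is smaller than byte 0xC3.
printf '%s\n' "$sorted"
```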


As a big hammer, you can also use:

LC_ALL=C sort

albeit with side effects, such as changing the language used for error messages etc. to the strings originally written by sort's developers, with no translation tables in effect.
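
For illustration, here is the question's example run through the big-hammer variant. The wrapper name bytesort is hypothetical, not an existing command. Because LC_ALL=C is written as a prefix to the sort invocation, it applies to that one command only and does not change the locale of the surrounding shell.

```shell
# Hypothetical helper: sort stdin (or the given files) by raw byte value.
# Any extra arguments are passed through to sort unchanged.
bytesort() {
    LC_ALL=C sort "$@"
}

# Same two lines as in the question: UTF-8 "ü" (bytes 0xC3 0xBC) and "v" (0x76).
sorted=$(printf '\303\274\nv\n' | bytesort)

# Prints "v" first, then "ü", matching strcmp-style byte ordering.
printf '%s\n' "$sorted"
```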
