I’m trying to sort a data frame in R and discovered that the sort order does not match the expected ASCII order. I need R to sort the data frame the same way Python sorts the data.
df = df[do.call(order, df), ] # sort by all columns
As shown here, Python correctly sorts uppercase letters before lowercase letters:
$ python
Python 3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:08:53)
>>> "A" < "a"
True
But R sorts uppercase letters after lowercase letters:
$ R
R version 3.2.0 (2015-04-16) -- "Full of Ingredients"
Copyright (C) 2015 The R Foundation for Statistical Computing
Platform: x86_64-unknown-linux-gnu (64-bit)
Natural language support but running in an English locale
> "A" < "a"
[1] FALSE
> "A" > "a"
[1] TRUE
How can I change R's sort behavior to match standard ASCII ordering? Is there a parameter to the order function, or a configuration setting, that changes the sort order?
Note: this is not a question of case-sensitive versus case-insensitive sorting. It is worse than that: the case-sensitive sort itself uses a non-standard order.
Different locales use different sort orders, including case rules: you probably want to use
Sys.setlocale(locale = "C"). (There is more information about locale definitions and case sorting order here.)
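If changing every locale category is more than you want, a narrower variant is to switch only the collation category and restore it afterwards. The following is a minimal sketch, not a definitive recipe: the data frame and its column names are made up for illustration, and it assumes the character columns are plain strings rather than factors (hence stringsAsFactors = FALSE, since factor levels are ordered when the factor is created).

```r
# Hypothetical example data; the column names are made up for illustration.
df <- data.frame(key = c("b", "A", "a", "B"), id = 1:4,
                 stringsAsFactors = FALSE)

# Remember the current collation setting so it can be restored afterwards.
old_collate <- Sys.getlocale("LC_COLLATE")

# Switch only the collation category to "C" (ASCII-style comparison).
Sys.setlocale("LC_COLLATE", "C")

# Sort by all columns, as in the question.
sorted <- df[do.call(order, df), ]

# Restore the previous collation setting.
Sys.setlocale("LC_COLLATE", old_collate)

print(sorted$key)
```

Restoring the old value keeps the change from leaking into later string comparisons in the same session.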
?Comparison says a little bit about locale-specific sorting …
The collating sequence of locales such as ‘en_US’ is
normally different from ‘C’ (which should use ASCII) and can be
… but as far as I can see does not say anything explicit about case order (searching for "case" in the page didn’t get any hits).
> Sys.getlocale()
[1] "en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8"
> "A" < "a"
[1] FALSE
> Sys.setlocale(locale = "C")
[1] "C/C/C/C/C/en_CA.UTF-8"
> "A" < "a"
[1] TRUE
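As for a parameter to order itself: in newer versions of R, order accepts method = "radix", which is documented in ?sort as always collating character vectors as in the "C" locale. The sketch below assumes R 3.3.0 or later (newer than the 3.2.0 shown in the question), and the data frame and column names are again hypothetical.

```r
# Hypothetical example data; the column names are made up for illustration.
df <- data.frame(key = c("b", "A", "a", "B"), id = 1:4,
                 stringsAsFactors = FALSE)

# Pass every column to order() plus method = "radix", which collates
# character vectors as in the "C" locale whatever LC_COLLATE is set to.
idx <- do.call(order, c(as.list(df), list(method = "radix")))
sorted_radix <- df[idx, ]

print(sorted_radix$key)
```

This leaves the session locale untouched, so other string comparisons such as "A" < "a" still follow the current locale.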