Pandas: Sort dataframe correctly with German umlauts, upper/lowercase and numbers

I have this dataframe (all strings):

      to_sort data
0     Belgien   a2
1      Zürich   b2
2    dänemark   c2
3          20   d2
4         100   e2
5  Österreich   f2

I want to sort it so that German umlauts are ordered correctly, case is ignored, and numbers are ordered numerically:

      to_sort data
3          20   d2
4         100   e2
0     Belgien   a2
2    dänemark   c2
5  Österreich   f2
1      Zürich   b2

Here is my code to generate the dataframe and result:

import io, pandas as pd

t = io.StringIO("""
to_sort|data
Belgien|a2
Zürich|b2
dänemark|c2
20|d2
100|e2
Österreich|f2""")
df = pd.read_csv(t, sep='|')

df = df.sort_values(by='to_sort', key=lambda col: col.str.lower().str.normalize('NFD'))

The result is almost correct, but the numbers are sorted in the wrong order; 20 should come before 100:

      to_sort data
4         100   e2
3          20   d2
0     Belgien   a2
2    dänemark   c2
5  Österreich   f2
1      Zürich   b2
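The misordering above comes from comparing the values as text: strings are compared character by character, so "100" sorts before "20" because "1" comes before "2". A minimal sketch using plain Python lists (the variable names are illustrative only, not part of the original code):

```python
# Strings compare character by character, so "100" < "20" because "1" < "2".
values = ["20", "100", "Belgien"]
as_text = sorted(values)

# Converting to int first gives the numeric order, but this only works
# when every value is a number, which is not the case in the dataframe.
numeric_only = sorted(["20", "100"], key=int)

print(as_text)       # ['100', '20', 'Belgien']
print(numeric_only)  # ['20', '100']
```

Digits sort before letters, which is why the numbers still land at the top of the result; only their relative order is off.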

How can I fix the number sorting, while maintaining all the other characteristics?

Solution:

Use the approach from the last example in the DataFrame.sort_values documentation, which combines natsort with np.argsort:

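import numpy as np
from natsort import index_natsorted

# Rank each value in natural order (digit runs compared as numbers),
# after lowercasing and NFD-normalizing so umlauts sort with their base letters.
f = lambda col: np.argsort(index_natsorted(col.str.lower().str.normalize('NFD')))
df = df.sort_values(by='to_sort', key=f)
print(df)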
      to_sort data
3          20   d2
4         100   e2
0     Belgien   a2
2    dänemark   c2
5  Österreich   f2
1      Zürich   b2
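To see what the key function hands back to sort_values, here is a dependency-free sketch of the same idea. It is not natsort's implementation: natural_key is a hypothetical stand-in that splits each value into text and digit chunks, and the two sorted calls play the roles of index_natsorted and np.argsort respectively.

```python
import re
import unicodedata

def natural_key(value):
    """Hypothetical stand-in for a natural-sort key: digit runs compare as integers."""
    text = unicodedata.normalize("NFD", str(value).lower())
    key = []
    # re.split with one capturing group alternates text and digit chunks,
    # so odd positions always hold the digit runs.
    for i, part in enumerate(re.split(r"([0-9]+)", text)):
        if part:
            # Tag chunks so an int is never compared with a str.
            key.append((0, int(part)) if i % 2 else (1, part))
    return key

names = ["Belgien", "Zürich", "dänemark", "20", "100", "Österreich"]

# Positions of the rows in sorted order (the role index_natsorted plays).
order = sorted(range(len(names)), key=lambda i: natural_key(names[i]))

# Argsort of that permutation: the rank each original row ends up at.
ranks = sorted(range(len(order)), key=order.__getitem__)

print(order)                      # [3, 4, 0, 2, 5, 1]
print(ranks)                      # [2, 5, 3, 0, 1, 4]
print([names[i] for i in order])  # ['20', '100', 'Belgien', 'dänemark', 'Österreich', 'Zürich']
```

The ranks list has one integer per row, which is the shape sort_values expects from a key: it sorts the rows by those integers, so the row with rank 0 ("20") comes first. The order list matches the index column of the sorted dataframe above.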