Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Sort column strings without numbers (and keep order when doing graphs)

I have this df code

df = pd.DataFrame({'A': ['0-5', '18-23', '12-17', '6-11'], 'qty':[7,15,8,34]})

yielding

    A       qty
0   0-5     7
1   18-23   15
2   12-17   8
3   6-11    34

I would like to order the df by col ‘A’ without having to number the A column, so that later when I do graphs I don’t have the numbers.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

This is the desired output after sorting the df by column A:

     A        qty
0   0-5        7
3   6-11       34
2   12-17      8
1   18-23      15

To achieve a similar result I would:

# add a category code
df['A'] = df['A'].astype('category').cat.codes + 1
# convert format
df['A'] = df['A'].astype('string')
# use a dictionary to rename (based on former output)
dic = {
'1':'1_0-5',
'3':'3_18-23',        
'2':'2_12-17', 
'4':'4_6-11',    
}
df['A'] = df['A'].replace(dic, regex=True)
## use a dictionary to rename again
dic = {
'1_0-5':'1_0-5',
'3_18-23':'4_18-23',        
'2_12-17':'3_12-17', 
'4_6-11':'2_6-11',    
}
df['A'] = df['A'].replace(dic, regex=True)

by doing this, I can achieve this:

           A         qty
    0   1_0-5        7
    1   2_6-11       15
    2   3_12-17      8
    3   4_18-23      34

Groupby does not work for me, while it would order column A as desired, when I would do graphs, order would not be kept.

>Solution :

If possible sort by first integer value is possible use key parameter in DataFrame.sort_values:

out = df.sort_values('A', 
                    key=lambda x: x.str.extract('(\d+)', expand=False).astype(int),
                    ignore_index=True)
print (out)
       A  qty
0    0-5    7
1   6-11   15
2  12-17    8
3  18-23   34

Or use natural sorting:

from natsort import natsorted

out = df.sort_values("A",key=natsorted, ignore_index=True)
print (out)
       A  qty
0    0-5    7
1   6-11    8
2  12-17   34
3  18-23   15

EDIT: If need sorting by strings use ordered Categoricals:

df = pd.DataFrame({'A': ['mike','alice', 'john','brian'], 'qty':[7,15,8,34]})



df['A'] = (pd.Categorical(df['A'], 
                         categories=['john','alice', 'mike','brian'], 
                         ordered=True))

out = df.sort_values('A', ignore_index=True)
print (out)
       A  qty
0   john    8
1  alice   15
2   mike    7
3  brian   34
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading