I have this df code
df = pd.DataFrame({'A': ['0-5', '18-23', '12-17', '6-11'], 'qty':[7,15,8,34]})
yielding
A qty
0 0-5 7
1 18-23 15
2 12-17 8
3 6-11 34
I would like to order the df by col ‘A’ without having to number the A column, so that later when I do graphs I don’t have the numbers.
This is the desired output after sorting the df by column A:
A qty
0 0-5 7
3 6-11 34
2 12-17 8
1 18-23 15
To achieve a similar result I would:
# add a category code
df['A'] = df['A'].astype('category').cat.codes + 1
# convert format
df['A'] = df['A'].astype('string')
# use a dictionary to rename (based on former output)
dic = {
'1':'1_0-5',
'3':'3_18-23',
'2':'2_12-17',
'4':'4_6-11',
}
df['A'] = df['A'].replace(dic, regex=True)
## use a dictionary to rename again
dic = {
'1_0-5':'1_0-5',
'3_18-23':'4_18-23',
'2_12-17':'3_12-17',
'4_6-11':'2_6-11',
}
df['A'] = df['A'].replace(dic, regex=True)
by doing this, I can achieve this:
A qty
0 1_0-5 7
1 2_6-11 15
2 3_12-17 8
3 4_18-23 34
Groupby does not work for me, while it would order column A as desired, when I would do graphs, order would not be kept.
>Solution :
If possible sort by first integer value is possible use key parameter in DataFrame.sort_values:
out = df.sort_values('A',
key=lambda x: x.str.extract('(\d+)', expand=False).astype(int),
ignore_index=True)
print (out)
A qty
0 0-5 7
1 6-11 15
2 12-17 8
3 18-23 34
Or use natural sorting:
from natsort import natsorted
out = df.sort_values("A",key=natsorted, ignore_index=True)
print (out)
A qty
0 0-5 7
1 6-11 8
2 12-17 34
3 18-23 15
EDIT: If need sorting by strings use ordered Categoricals:
df = pd.DataFrame({'A': ['mike','alice', 'john','brian'], 'qty':[7,15,8,34]})
df['A'] = (pd.Categorical(df['A'],
categories=['john','alice', 'mike','brian'],
ordered=True))
out = df.sort_values('A', ignore_index=True)
print (out)
A qty
0 john 8
1 alice 15
2 mike 7
3 brian 34