Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Sorting a data frame with conditions

Having this data frame

df = pd.DataFrame({'id':[3333311,3455572,6464544,2323322,2222111,4333311,5454566,3321767],'A':['Na','Na','Na','Na','Na','Na','56 days','54 days'],'B':['6 days','31 days','33 days','46 days','44 days','16 days','41 days','42 days'],'Percentage':[0.41,0.36,0.36,0.42,0.25,0.56,0.25,0.42]})

print(df)

id        A        B  Percentage
0  3333311            6 days        0.41
1  3455572           31 days        0.36
2  6464544           33 days        0.36
3  2323322           46 days        0.42
4  2222111           44 days        0.25
5  4333311           16 days        0.56
6  5454566  56 days  41 days        0.25
7  3321767  54 days  42 days        0.42

I would like to sort values based on percentage but I would like also to put rows with values in A sorted first and then the rows sorted without data in A. Like this.

id        A        B  Percentage
7  3321767  54 days  42 days        0.42
6  5454566  56 days  41 days        0.25
5  4333311           16 days        0.56
3  2323322           46 days        0.42
0  3333311            6 days        0.41
1  3455572           31 days        0.36
2  6464544           33 days        0.36
4  2222111           44 days        0.25

EDIT: I have also found that the column percentages have some Na values. I would like columns with Na at the end of the data frame in both subgroups. I know how to do this in a simple sorting task. I mention it here in case, it is needed

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

By default is na_position='last' in DataFrame.sort_values, so after converting to timedeltas sorting by both columns, first ascending, second descending:

df['A'] = pd.to_timedelta(df['A'], errors='coerce')

df = df.sort_values(['A','Percentage'], ascending=[True, False])
print (df)
        id       A       B  Percentage
7  3321767 54 days 42 days        0.42
6  5454566 56 days 41 days        0.25
5  4333311     NaT 16 days        0.56
3  2323322     NaT 46 days        0.42
0  3333311     NaT  6 days        0.41
1  3455572     NaT 31 days        0.36
2  6464544     NaT 33 days        0.36
4  2222111     NaT 44 days        0.25
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading