Move pandas rows to the end based on a condition

I have a pandas dataframe with several columns. The dataframe is sorted by the values in one of the columns. However, some rows need to go to the bottom based on a different condition. The column to which this second condition applies looks something like this:

Superheros
Spiderman (Nr 1)
Batman (Nr 4)
Joker
Iron Man (Nr 2)
Hulk
Captain America
Wonderwoman (Nr 3)

I need all rows containing (Nr #) to move to the bottom, sorted by that number, while the order of the other rows stays untouched, so that the result is this:

Superheros
Joker
Hulk
Captain America
Spiderman (Nr 1)
Iron Man (Nr 2)
Wonderwoman (Nr 3)
Batman (Nr 4)

I found this question about moving pandas rows; however, I don't think I can use it, or at least I don't know how to adapt it so that it does what I want.

I was thinking of creating a new column that holds the number in the brackets (and 0 for every hero that doesn't have one), sorting by it, then splitting the dataframe and concatenating the parts again, but this seems like a detour.

I was also thinking of finding the Nr 1,2,3,… one by one and moving them to the end, like this:

index = df["Superheros"].where(df["Superheros"].str.contains("Nr 1"))
df["Superheros"][index].move_to[-1]

but I can't seem to find this "move_to" command, so I suspect it does not exist. I am sure that I am missing something very obvious…

Solution:

You can extract the number, sort the resulting Series with sort_values(na_position='first') so that the rows without a number come first, and use the sorted index to reorder the original DataFrame:

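import pandas as pd

# Extract the number from "(Nr ...)"; rows without one become NaN
s = pd.to_numeric(df['Superheros'].str.extract(r'\(Nr (\d+)\)', expand=False))

# Sort with NaN first, then reorder the original DataFrame by that index
idx = s.sort_values(na_position='first').index
out = df.loc[idx]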
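
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample column from the question and applies the same extract, sort, and reindex steps. The DataFrame construction is an assumption (the question only shows the column, not how it was created); everything else follows the answer.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed construction).
df = pd.DataFrame({
    "Superheros": [
        "Spiderman (Nr 1)", "Batman (Nr 4)", "Joker", "Iron Man (Nr 2)",
        "Hulk", "Captain America", "Wonderwoman (Nr 3)",
    ]
})

# Pull out the digits inside "(Nr ...)"; rows without a match become NaN.
s = pd.to_numeric(df["Superheros"].str.extract(r"\(Nr (\d+)\)", expand=False))

# NaN rows are placed first, numbered rows follow in ascending order.
idx = s.sort_values(na_position="first").index

# Reorder the original rows by that index; original labels are kept.
out = df.loc[idx]

print(out)
```

The raw-string prefix (r"...") keeps Python from treating \( and \d as string escape sequences, which newer Python versions warn about.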

Or, with the same logic, using the key parameter of sort_values:

# Key function: maps the column to its extracted numbers (NaN where absent)
sorter = lambda s: pd.to_numeric(s.str.extract(r'\(Nr (\d+)\)', expand=False))

out = df.sort_values(by='Superheros', key=sorter, na_position='first')

Output:

           Superheros
2               Joker
4                Hulk
5     Captain America
0    Spiderman (Nr 1)
3     Iron Man (Nr 2)
6  Wonderwoman (Nr 3)
1       Batman (Nr 4)
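
One thing to watch for: if two rows share the same number, the default sort algorithm (quicksort) does not promise to keep them in their original relative order. The sketch below assumes hypothetical data with a duplicated number (Thor is not part of the question) and passes kind="stable" to make the tie order explicit, plus ignore_index=True in case a fresh 0..n-1 index is wanted. The helper name nr_key is also just illustrative.

```python
import pandas as pd

# Hypothetical data: two rows share "Nr 2" to make tie handling visible.
df = pd.DataFrame({
    "Superheros": ["Thor (Nr 2)", "Joker", "Iron Man (Nr 2)", "Hulk", "Spiderman (Nr 1)"]
})

def nr_key(col: pd.Series) -> pd.Series:
    """Map each value to the number inside "(Nr ...)", or NaN if absent."""
    return pd.to_numeric(col.str.extract(r"\(Nr (\d+)\)", expand=False))

out = df.sort_values(
    by="Superheros",
    key=nr_key,
    na_position="first",  # unnumbered rows stay on top
    kind="stable",        # equal numbers keep their original relative order
    ignore_index=True,    # relabel the result 0..n-1
)

print(out)
```

Whether to reset the index depends on what happens next; keeping the original labels, as the answer does, makes it easier to trace rows back to the unsorted frame.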

Intermediate result of the key function:

sorter(df['Superheros'])

0    1.0
1    4.0
2    NaN
3    2.0
4    NaN
5    NaN
6    3.0
Name: Superheros, dtype: float64
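
The split-and-recombine idea from the question also works without adding a helper column. This is only a sketch of that alternative, not part of the answer above: it assumes the same sample data and uses a boolean mask plus pd.concat, with the variable names (nr, has_nr, unnumbered, numbered) chosen for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question (assumed construction).
df = pd.DataFrame({
    "Superheros": [
        "Spiderman (Nr 1)", "Batman (Nr 4)", "Joker", "Iron Man (Nr 2)",
        "Hulk", "Captain America", "Wonderwoman (Nr 3)",
    ]
})

# Extracted numbers as a standalone Series; NaN where there is no "(Nr ...)".
nr = pd.to_numeric(df["Superheros"].str.extract(r"\(Nr (\d+)\)", expand=False))
has_nr = nr.notna()

# Rows without a number keep their current order.
unnumbered = df[~has_nr]

# Rows with a number are reordered by that number.
numbered = df.loc[nr[has_nr].sort_values(kind="stable").index]

# Stack the two parts: unnumbered rows first, numbered rows after.
out = pd.concat([unnumbered, numbered])

print(out)
```

This gives the same row order as the sort_values approach; it is more verbose, but it makes the two groups explicit, which can help if they need different treatment later.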