Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

pandas dataframe custom sort function

I am sorry if it has been explained before, however, I still can’t find a proper solution.
I have this short code with pandas.DataFrame:

import pandas as pd


table = {
    "key4": ["key3", "command4"],
    "key2": ["key1", "command2"],
    "key3": ["cron3", "command3"],
    "key5": ["cron5", "command5"],
    "key1": ["cron1", "command1"]   
}

columns = ["trigger", "command"]
df = pd.DataFrame.from_dict(table, orient='index', columns=columns)

I would like to order each row under its parent, in other words if key1 is found in the trigger column it should be ordered right after the row with the name key1. (I only expect a single occurrance of key1 both as a name and as a value for a trigger).
Or is it an overkill and should I try using a different format?
So the output of print(df) should look like this:

     trigger    command

key3   cron3    command3
key4    key3    command4
key5   cron5    command5
key1   cron1    command1
key2    key1    command2

Can I somehow import a function into df.sort_values() somehow so that this sorting were more customized? Thank you in advance.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

You can use masks and numpy.lexsort:

import numpy as np

m = df['trigger'].isin(df.index)
key = df['trigger'].where(m, df.index)

out = df.iloc[np.lexsort([m, key])]

Or in pure pandas (less elegant IMO, but more flexible if you need a custom order; see below):

out = (df
   .assign(m=df['trigger'].isin(df.index),
           key=lambda d: d['trigger'].where(m, d.index))
   .sort_values(by=['key', 'm'])
   .drop(columns=['m', 'key'])
)

Output:

     trigger   command
key1   cron1  command1
key2    key1  command2
key3   cron3  command3
key4    key3  command4
key5   cron5  command5

Intermediates:

# before sorting
     trigger   command      m   key
key4    key3  command4   True  key3
key2    key1  command2   True  key1
key3   cron3  command3  False  key3
key5   cron5  command5  False  key5
key1   cron1  command1  False  key1

# after sorting
     trigger   command      m   key
key1   cron1  command1  False  key1
key2    key1  command2   True  key1
key3   cron3  command3  False  key3
key4    key3  command4   True  key3
key5   cron5  command5  False  key5

If you want to keep the original order of the groups (key3 -> key5 -> key1):

out = (df
   .assign(m=df['trigger'].isin(df.index),
           key=lambda d: pd.Categorical(d['trigger'].where(m, d.index),
                                       categories=d.index[~d['m']].unique(),
                                        ordered=True)
          )
   .sort_values(by=['key', 'm'])
   .drop(columns=['m', 'key'])
)

Variant:

m = df['trigger'].isin(df.index)
key = df['trigger'].where(m, df.index)
order = df.index[~m].unique()

tmp = df.assign(key=pd.Categorical(key, categories=order, ordered=True))

out = (pd.concat([tmp[~m], tmp[m]]).sort_values(by='key', kind='stable')
         .drop(columns='key')
      )

Output:

     trigger   command
key3   cron3  command3
key4    key3  command4
key5   cron5  command5
key1   cron1  command1
key2    key1  command2
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading