I am sorry if it has been explained before, however, I still can’t find a proper solution.
I have this short code with pandas.DataFrame:
import pandas as pd
table = {
"key4": ["key3", "command4"],
"key2": ["key1", "command2"],
"key3": ["cron3", "command3"],
"key5": ["cron5", "command5"],
"key1": ["cron1", "command1"]
}
columns = ["trigger", "command"]
df = pd.DataFrame.from_dict(table, orient='index', columns=columns)
I would like to order each row under its parent, in other words if key1 is found in the trigger column it should be ordered right after the row with the name key1. (I only expect a single occurrance of key1 both as a name and as a value for a trigger).
Or is it an overkill and should I try using a different format?
So the output of print(df) should look like this:
trigger command
key3 cron3 command3
key4 key3 command4
key5 cron5 command5
key1 cron1 command1
key2 key1 command2
Can I somehow import a function into df.sort_values() somehow so that this sorting were more customized? Thank you in advance.
>Solution :
You can use masks and numpy.lexsort:
import numpy as np
m = df['trigger'].isin(df.index)
key = df['trigger'].where(m, df.index)
out = df.iloc[np.lexsort([m, key])]
Or in pure pandas (less elegant IMO, but more flexible if you need a custom order; see below):
out = (df
.assign(m=df['trigger'].isin(df.index),
key=lambda d: d['trigger'].where(m, d.index))
.sort_values(by=['key', 'm'])
.drop(columns=['m', 'key'])
)
Output:
trigger command
key1 cron1 command1
key2 key1 command2
key3 cron3 command3
key4 key3 command4
key5 cron5 command5
Intermediates:
# before sorting
trigger command m key
key4 key3 command4 True key3
key2 key1 command2 True key1
key3 cron3 command3 False key3
key5 cron5 command5 False key5
key1 cron1 command1 False key1
# after sorting
trigger command m key
key1 cron1 command1 False key1
key2 key1 command2 True key1
key3 cron3 command3 False key3
key4 key3 command4 True key3
key5 cron5 command5 False key5
If you want to keep the original order of the groups (key3 -> key5 -> key1):
out = (df
.assign(m=df['trigger'].isin(df.index),
key=lambda d: pd.Categorical(d['trigger'].where(m, d.index),
categories=d.index[~d['m']].unique(),
ordered=True)
)
.sort_values(by=['key', 'm'])
.drop(columns=['m', 'key'])
)
Variant:
m = df['trigger'].isin(df.index)
key = df['trigger'].where(m, df.index)
order = df.index[~m].unique()
tmp = df.assign(key=pd.Categorical(key, categories=order, ordered=True))
out = (pd.concat([tmp[~m], tmp[m]]).sort_values(by='key', kind='stable')
.drop(columns='key')
)
Output:
trigger command
key3 cron3 command3
key4 key3 command4
key5 cron5 command5
key1 cron1 command1
key2 key1 command2