Sort dataframe based on minimum value of two columns

Let’s assume I have the following dataframe:

import pandas as pd
d = {'col1': [1, 2, 3, 4], 'col2': [4, 2, 1, 3], 'col3': [1, 0, 1, 1], 'outcome': [1, 0, 1, 0]}
df = pd.DataFrame(data=d)

I want to sort this dataframe by the smaller of the two values in col1 and col2 for each row. The resulting index order should be 2, 0, 1, 3.

I tried df.sort_values(by=['col2', 'col1']), but then it sorts by col2 first and only uses col1 to break ties. Is there any way to order by the minimum of the two columns?

Solution:

Using numpy.lexsort:

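import numpy as np
# np.lexsort uses the last key as primary, so reverse each sorted row to put its minimum last
order = np.lexsort(np.sort(df[['col1', 'col2']])[:, ::-1].T)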

out = df.iloc[order]

Output:

   col1  col2  col3  outcome
2     3     1     1        1
0     1     4     1        1
1     2     2     0        0
3     4     3     1        0
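
To see why the one-liner works, it may help to split it into steps. The following is only an illustrative breakdown of the same approach; the intermediate names row_sorted and keys are made up for this sketch and are not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [4, 2, 1, 3],
                   'col3': [1, 0, 1, 1], 'outcome': [1, 0, 1, 0]})

# Step 1: sort each row ascending, so column 0 holds the row minimum
# and column 1 holds the row maximum.
row_sorted = np.sort(df[['col1', 'col2']].to_numpy(), axis=1)

# Step 2: reverse the columns and transpose. Each row of `keys` is now one
# sort key: keys[0] is the row maximum, keys[-1] is the row minimum.
keys = row_sorted[:, ::-1].T

# Step 3: np.lexsort treats the LAST key as the primary one, so rows are
# ordered by their minimum, with ties broken by their maximum.
order = np.lexsort(keys)

out = df.iloc[order]
print(out.index.tolist())
```

Rows 0 and 2 both have a minimum of 1; row 2 comes first because its larger value (3) is smaller than that of row 0 (4).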

Note that you can easily handle any number of columns:

df.iloc[np.lexsort(np.sort(df[['col1', 'col2', 'col3']])[:, ::-1].T)]

   col1  col2  col3  outcome
1     2     2     0        0
2     3     1     1        1
0     1     4     1        1
3     4     3     1        0
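
If you would rather stay within pandas for the two-column case, one possible alternative (not the method used above) is to compute the row-wise minimum and maximum as temporary columns and sort on those. The helper column names _min and _max are hypothetical and chosen only for this sketch.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3, 4], 'col2': [4, 2, 1, 3],
                   'col3': [1, 0, 1, 1], 'outcome': [1, 0, 1, 0]})

cols = ['col1', 'col2']

out = (
    # Temporary helper columns holding the row-wise min and max of `cols`
    df.assign(_min=df[cols].min(axis=1), _max=df[cols].max(axis=1))
      # Sort by the minimum first, then use the maximum to break ties
      .sort_values(['_min', '_max'])
      # Drop the helpers so the result has the original columns only
      .drop(columns=['_min', '_max'])
)
print(out.index.tolist())
```

For two columns this gives the same order as the lexsort version on this data. With more than two columns it only compares the smallest and largest values per row, so the lexsort approach is the more general one.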
