Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

I'd like to group by two columns and then print out rows corresponding to minimum value on the third column

Here is an example of two dataframes in pandas:

 df = pd.DataFrame({0: [1.5, 2, 2.3, 4], 1: [3, 6, 1, 0]})
 df2 = pd.DataFrame({0: [1.7, 4.05, 2.1, 2.99], 1: [1, 3, 1, 7]})

df.columns = ["x1", "y1"]
df2.columns = ["x2", "y2"]

Then I merge them and make another dataframe which consists only of x1 and x2 columns that are closest to each other.

merged1 = df.merge(df2, how='cross')
merged1['diff'] = (merged1['x1'].sub(merged1['x2'])).abs()

out1 = (merged1.loc[merged1.groupby(df.columns.tolist())['diff'].idxmin().to_numpy()])

And lastly, I’d like to group by column y2 and print out the row of each group that has the smallest value of the column diff

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

out1.groupby(["y2"])['diff'].min()

But it prints out all the rows:

    x1  y1    x2  y2  diff
0   1.5   3  1.70   1  0.20
6   2.0   6  2.10   1  0.10
10  2.3   1  2.10   1  0.20
13  4.0   0  4.05   3  0.05

What’s the problem?

>Solution :

You need to use:

out1.loc[out1.groupby(["y2"])['diff'].idxmin()]

output:

     x1  y1    x2  y2  diff
6   2.0   6  2.10   1  0.10
13  4.0   0  4.05   3  0.05

Note that you could do the whole thing with just 2 commands (using pandas.merge_asof):

(pd
 .merge_asof(df, df2.sort_values(by='x2'),
             left_on='x1', right_on='x2', direction='nearest')
 .loc[lambda d: d['x1'].sub(d['x2']).abs().groupby(d['y2']).idxmin()]
)

or, to also have the "diff" column:

(pd
 .merge_asof(df, df2.sort_values(by='x2'),
             left_on='x1', right_on='x2', direction='nearest')
 .assign(diff=lambda d: d['x1'].sub(d['x2']).abs())
 .loc[lambda d: d.groupby('y2')['diff'].idxmin()]
)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading