Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Remove related row from pandas dataframe

I have the following dataframe:

id relatedId coordinate
123 125 55
125 123 45
128 130 60
132 135 50
130 128 40
135 132 50

So I have 6 rows in this dataframe, but I would like to get rid of the related rows resulting in 3 rows. The coordinate column equals 100 between the two related rows, and I would like to keep the one with the lowest value (so the one less than 50. If both are 50, simply one of them). The resulting dataframe would thus be:

id relatedId coordinate
125 123 45
132 135 50
130 128 40

Hopefully someone has a good solution for this problem.
Thanks

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

You can sort the values and get the first value per group using a frozenset of the 2 ids as grouper:

(df
 .sort_values(by='coordinate')
 .groupby(df[['id', 'relatedId']].agg(frozenset, axis=1), as_index=False)
 .first()
)

output:

    id  relatedId  coordinate
0  130        128          40
1  125        123          45
2  132        135          50

Alternatively, to keep the original order, and original indices, use idxmin per group:

group = df[['id', 'relatedId']].agg(frozenset, axis=1)
idx = df['coordinate'].groupby(group).idxmin()
df.loc[sorted(idx)]

output:

    id  relatedId  coordinate
1  125        123          45
3  132        135          50
4  130        128          40
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading