Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas merge multiple rows based on two columns

I have a dataframe shown below

     pid          date      label   age
0   0001    2001-09-24      False   34
1   0001    2006-03-02      True    39
2   0001    2006-03-02      True    39
3   0002    2003-02-07      True    23
4   0002    2004-08-02      True    24
5   0002    2004-08-02      False   24
6   0003    2001-05-25      False   25
7   0003    2001-05-25      False   25

Which I want to aggregate based on both pid and date, where if multiple rows have the same pid and data then age stays the same and label = True if any or all of these rows have True label:

     pid          date      label   age
0   0001    2001-09-24      False   34
1   0001    2006-03-02      True    39
3   0002    2003-02-07      True    23
4   0002    2004-08-02      True    24
5   0003    2001-05-25      False   25

I have tried groupby but all I could do is with one column – pid and no success with two columns.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

In your case sort_values + drop_duplicates

out = df.sort_values('label').drop_duplicates(['pid','date','age'],keep='last').sort_index()
Out[240]: 
   pid        date  label  age
0    1  2001-09-24  False   34
2    1  2006-03-02   True   39
3    2  2003-02-07   True   23
4    2  2004-08-02   True   24
7    3  2001-05-25  False   25
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading