Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Remove rows from grouped data frames based on column values

I would like to remove from each subgroup in a data frame, the rows which satisfy certain conditions. The subgroups are grouped based on the two columns Days & ID, here’s my data frame:

df = pd.DataFrame({'Days':[5,5,5,5,6,6],
                   'ID':['A11','A11','A11','A11','B12','B12'],
                    'Country':['DE','DE','FR','US','US','US'],
                    'Car1':['BMW','Volvo','Audi','BMW','Mercedes','BMW'],
                     'Car2':['Volvo','Mercedes','BMW','Volvo','Volvo','Volvo'],
                    'Car3':['Mercedes',nan,'Volvo',nan,nan,nan]},
                       )

    Days  ID   Country    Car1      Car2      Car3
0     5  A11      DE       BMW     Volvo  Mercedes
1     5  A11      DE     Volvo  Mercedes       nan
2     5  A11      FR      Audi       BMW     Volvo
3     5  A11      US       BMW     Volvo       nan
4     6  B12      US  Mercedes     Volvo       nan
5     6  B12      US       BMW     Volvo       nan

I would like to remove the rows from each group satisfying the following three conditions:

1. Car3=nan
2. Car1=Car2 (from another row within the group)
3. Car2=Car3 (from another row within the group)

The expected data frame I would like to have:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

   Days   ID Country      Car1   Car2      Car3
0     5  A11      DE       BMW  Volvo  Mercedes
1     5  A11      FR      Audi    BMW     Volvo
2     6  B12      US  Mercedes  Volvo       nan
3     6  B12      US       BMW  Volvo       nan

>Solution :

You can check membership per groups in lambda function with Series.isin and filter invert mask by ~ in boolean indexing:

m = (df.groupby(['Days','ID'], group_keys=False)
      .apply(lambda x: x['Car1'].isin(x['Car2']) & x['Car2'].isin(x['Car3'])) & 
       df['Car3'].isna())

df = df[~m].reset_index(drop=True)
print (df)
   Days   ID Country      Car1   Car2      Car3
0     5  A11      DE       BMW  Volvo  Mercedes
1     5  A11      FR      Audi    BMW     Volvo
2     6  B12      US  Mercedes  Volvo       NaN
3     6  B12      US       BMW  Volvo       NaN
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading