Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Filter groupby object: select all rows associated with customers satisfying certain criteria

In the fictitious data here, I want to identify any customers who did not purchase alcoholic beverages (specifically "Beer" or "Wine" in example) across all visits combined.

df = pd.DataFrame({"customer": ['A55', 'A55', 'A55', 
                       'B080', 'B080', 
                       'D900', 'D900', 
                       'Z900', 'Z900', 'Z900', 'Z900'],
               "date":['01/11/2016', '01/21/2016', '02/11/2016', 
                       '08/17/2016', '6/17/2016',
                       '03/01/2016','04/30/2016', 
                       '05/16/2016','09/27/2016', '10/05/2016', '11/11/2016'],
               "item": ['Beer', 'Beer', 'Wine', 
                        'Coffee', 'Gatorade', 
                        'Wine', 'Coffee', 
                        'Beer', 'Wine', 'Beer', 'Coffee']
})

    ##Tag visit number for each customer
df["customer_visit_number"] = df.groupby("customer").cumcount().add(1)

In the dataframe example below, only customer B080 never bought any Beer or Wine. By contrast the others bought Beer or Wine on occasion or all visits. To tag customer visits I added a cumcount() operation but not all customers visit the store same number of times.

Is there a simple way in pandas to filter all rows associated with this customer only after doing the groupby operation at customer level?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

See dataframe belowenter image description here

>Solution :

Identify customers that purchased alcoholic beverages:

drop = (df['item'].isin(['Beer', 'Wine'])
        .groupby(df['customer']).any()
       )

Filter with boolean indexing:

out = df[~df['customer'].isin(drop[drop].index)]

Or with groupby.transform to directly obtain a Series of booleans for boolean indexing:

m = (df['item'].isin(['Beer', 'Wine'])
     .groupby(df['customer'])
     .transform('any')
    )
out = df[~m]

Output:

  customer        date      item
3     B080  08/17/2016    Coffee
4     B080   6/17/2016  Gatorade

Intermediates:

# drop
customer
A55      True
B080    False
D900     True
Z900     True
Name: item, dtype: bool

# m
0      True
1      True
2      True
3     False
4     False
5      True
6      True
7      True
8      True
9      True
10     True
Name: item, dtype: bool
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading