Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Filter Groupby if contains specific values

Say I have the following dataframe covering hundreds of different names:

df = pd.DataFrame({"NAME":["A", "A", "A" ,"A", 
                       "B", "B", "B", 
                       "C", "C", "C", "C",
                       "D", "D",],
               "CLASS":["LI","BO","EQ","AI",
                           "LI", "EQ", "AI", 
                           "LI","BO","EQ","AI",
                           "EQ", "AI", 
                          ]})

Using the groupby-function of Pandas, how am I supposed to filter out only those names that contain both EQ AND AI but NOT BO? Ideally, I should obtain the following:

NAME CLASS
B    LI
B    EQ
B    AI
D    EQ
D    AI

Thanks a lot for any suggestions in advance!

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

Use groupby.transform with set operations:

out = df[df.groupby('NAME')['CLASS']
           .transform(lambda x: ((S:=set(x))>={'EQ', 'AI'}) and ('BO' not in S))]
# or
# out = df[df.groupby('NAME')['CLASS']
#            .transform(lambda x: ((S:=set(x))>={'EQ', 'AI'})
#                       and not S.intersection({'BO'}))]

Variant with groupby.agg and isin:

g = df.groupby('NAME')['CLASS'].agg(set)
out = df[df['NAME'].isin(g.index[(g >= {'EQ', 'AI'}) & ~(g >= {'BO'})])]

Output:

   NAME CLASS
4     B    LI
5     B    EQ
6     B    AI
11    D    EQ
12    D    AI
Intermediates

with an extra group for completeness

# transform approach
   NAME CLASS  g >= {'EQ', 'AI'}  ~(g >= {'BO'})      &
0     A    LI               True           False  False
1     A    BO               True           False  False
2     A    EQ               True           False  False
3     A    AI               True           False  False
4     B    LI               True            True   True
5     B    EQ               True            True   True
6     B    AI               True            True   True
7     C    LI               True           False  False
8     C    BO               True           False  False
9     C    EQ               True           False  False
10    C    AI               True           False  False
11    D    EQ               True            True   True
12    D    AI               True            True   True
13    E    AI              False            True  False

# agg approach
                     g  (g >= {'EQ', 'AI'})  ~(g >= {'BO'})      &
NAME                                                              
A     {LI, EQ, BO, AI}                 True           False  False
B         {LI, EQ, AI}                 True            True   True
C     {LI, EQ, BO, AI}                 True           False  False
D             {EQ, AI}                 True            True   True
E                 {AI}                False            True  False
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading