Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Add yes if within a group, a column contains all element from a list in python

I have a dataframe such as :

The_list=["A","B"]
Groups COL2
G1     A
G1     B
G1     C
G2     A
G2     D
G3     A
G3     B
G4     A
G5     B

and I would like to add another COL3 column where I put yes for each Groups where all element from The_list are present within the COL2, and no if not.

I should then get :

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Groups COL2 COL3
G1     A    yes
G1     B    yes
G1     C    yes
G2     A    no
G2     D    no
G3     A    yes
G3     B    yes
G4     A    no
G5     B    no

>Solution :

Use Groupby.apply with set intersection and np.where:

# Group on column 'Groups' and apply set intersection and find length of set
In [1178]: out = df.groupby('Groups')['COL2'].apply(lambda x: len(set(x) & set(The_list))).reset_index()

# If length of set is equal to the length of the list, then 'yes' else 'no'
In [1179]: import numpy as np
In [1180]: out['COL3'] = np.where(out.COL2.eq(len(The_list), 'yes', 'no')

# Merge original df with above and populate COL3
In [1185]: df = df.merge(out[['Groups','COL3']])

In [1186]: df
Out[1186]: 
  Groups COL2 COL3
0     G1    A  yes
1     G1    B  yes
2     G1    C  yes
3     G2    A   no
4     G2    D   no
5     G3    A  yes
6     G3    B  yes
7     G4    A   no
8     G5    B   no
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading