I have a dataframe such as :
The_list=["A","B"]
Groups COL2
G1 A
G1 B
G1 C
G2 A
G2 D
G3 A
G3 B
G4 A
G5 B
and I would like to add another COL3 column where I put yes for each Groups where all element from The_list are present within the COL2, and no if not.
I should then get :
Groups COL2 COL3
G1 A yes
G1 B yes
G1 C yes
G2 A no
G2 D no
G3 A yes
G3 B yes
G4 A no
G5 B no
>Solution :
Use Groupby.apply with set intersection and np.where:
# Group on column 'Groups' and apply set intersection and find length of set
In [1178]: out = df.groupby('Groups')['COL2'].apply(lambda x: len(set(x) & set(The_list))).reset_index()
# If length of set is equal to the length of the list, then 'yes' else 'no'
In [1179]: import numpy as np
In [1180]: out['COL3'] = np.where(out.COL2.eq(len(The_list), 'yes', 'no')
# Merge original df with above and populate COL3
In [1185]: df = df.merge(out[['Groups','COL3']])
In [1186]: df
Out[1186]:
Groups COL2 COL3
0 G1 A yes
1 G1 B yes
2 G1 C yes
3 G2 A no
4 G2 D no
5 G3 A yes
6 G3 B yes
7 G4 A no
8 G5 B no