I have a list and a dataframe such as:
The_list=["A","B","D"]
Groups Values
G1 A
G1 B
G1 C
G1 D
G2 A
G2 B
G2 A
G2 D
G3 A
G3 D
G4 Z
G4 D
G4 E
G4 C
G5 A
G5 B
G5 D
I would like to keep only the Groups whose Values contain every element of The_list and no element that is absent from The_list. In other words, the set of Values in the group must be exactly the set of The_list (duplicates are allowed).
Here the expected result is:
Groups Values
G2 A
G2 B
G2 A
G2 D
G5 A
G5 B
G5 D
So far I tried :
df.loc[df.Values.str.contains["A" & "B" & "D"].groupby(df.Groups)]
>Solution :
You can group by the Groups column with pandas groupby, compare the set of each group's Values with the set of The_list inside transform, and then keep the rows of the groups for which the comparison is True:
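The_list = ["A", "B", "D"]
# transform broadcasts the per-group True/False to every row of that group
mask_rows = df.groupby('Groups')['Values'].transform(
    lambda x: set(x) == set(The_list)
)
# keep only the rows whose group matched The_list exactly
print(df[mask_rows])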
Output:
Groups Values
G2 A
G2 B
G2 A
G2 D
G5 A
G5 B
G5 D
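For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds the dataframe from the question and applies the transform approach above. The second variant, based on groupby(...).filter, is not part of the original answer and is shown only as an assumed alternative; the names wanted, result_mask and result_filter are illustrative.

```python
import pandas as pd

The_list = ["A", "B", "D"]

# Rebuild the example dataframe from the question
df = pd.DataFrame({
    "Groups": ["G1"] * 4 + ["G2"] * 4 + ["G3"] * 2 + ["G4"] * 4 + ["G5"] * 3,
    "Values": ["A", "B", "C", "D",
               "A", "B", "A", "D",
               "A", "D",
               "Z", "D", "E", "C",
               "A", "B", "D"],
})

wanted = set(The_list)  # build the set once instead of once per group

# Variant 1: per-row boolean mask via transform (the approach from the answer)
mask_rows = (
    df.groupby("Groups")["Values"]
      .transform(lambda x: set(x) == wanted)
      .astype(bool)  # defensive cast in case the result comes back as object dtype
)
result_mask = df[mask_rows]

# Variant 2 (assumed alternative): drop whole groups with filter
result_filter = df.groupby("Groups").filter(lambda g: set(g["Values"]) == wanted)

print(result_mask)
```

G3 is dropped because it lacks "B", and G1 is dropped because it contains the extra value "C". Both follow from the set equality check. To require only that no outside value appears, without requiring every element of The_list, the comparison set(x) <= wanted could be used instead.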