I need to find the ids for which every row has status "red", so that I can distinguish id 1 (all red) from id 2 (not all red):
import pandas as pd
data = {"id": [1, 1, 1, 2, 2, 2],
        "status": ["red", "red", "red", "green", "red", "red"]}
df = pd.DataFrame(data)
# id status
# 1 red
# 1 red
# 1 red
# 2 green
# 2 red
# 2 red
df[df.loc[df["status"]=="red"].groupby("id").size() / df.groupby("id").status.size()==1]
#IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
#It is enough if the output is a series/list/array of the ids where all rows are red.
>Solution :
To keep only the rows whose id has nothing but "red" statuses, you can use groupby.transform('all'):
out = df[df['status'].eq('red').groupby(df['id']).transform('all')]
Variant, using groupby.all and isin:
keep = df['status'].eq('red').groupby(df['id']).all()
out = df[df['id'].isin(keep[keep].index)]
Output:
   id status
0   1    red
1   1    red
2   1    red
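Since the question says a plain list of ids is enough, here is a minimal sketch that stops at the per-id flag instead of filtering rows. The names `all_red` and `red_ids` are only illustrative and are not part of the code above:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "status": ["red", "red", "red", "green", "red", "red"],
})

# One boolean per id: True only if every row of that id is "red".
all_red = df["status"].eq("red").groupby(df["id"]).all()

# Keep the index labels (the ids) whose flag is True.
red_ids = all_red[all_red].index.tolist()
print(red_ids)
```

Drop the `.tolist()` if you would rather keep a pandas Index than a Python list.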
Intermediates:
df['is_red'] = df['status'].eq('red')
df['all_red'] = df.groupby('id')['is_red'].transform('all')
   id status  is_red  all_red
0   1    red    True     True
1   1    red    True     True
2   1    red    True     True
3   2  green   False    False
4   2    red    True    False
5   2    red    True    False
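As for the IndexingError in the question: the ratio comparison produces one boolean per id (indexed by id), while df[...] expects one boolean per row (indexed like df). If you want to keep the ratio approach, one possible fix is to map the per-id result back onto the rows. This is only a sketch, and the names `red_count`, `total`, `ratio_is_one` and `row_mask` are illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 1, 2, 2, 2],
    "status": ["red", "red", "red", "green", "red", "red"],
})

# Per-id counts: number of "red" rows and total number of rows.
red_count = df.loc[df["status"] == "red"].groupby("id").size()
total = df.groupby("id").size()

# Indexed by id, not by row, so it cannot be used directly in df[...].
ratio_is_one = (red_count / total) == 1

# Look up each row's id in the per-id result to get a row-aligned mask.
row_mask = df["id"].map(ratio_is_one)
out = df[row_mask]
print(out)
```

The transform('all') version above does the same alignment in one step, which is why it is usually the simpler choice.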