I am trying to figure out how to filter rows in pandas based on a criterion whose cutoff position differs for each group. See the dummy data and the desired result below.
Dummy data
import pandas as pd

data = {'ID':['1111', '1111', '1111', '1111','1112','1112','1112','1112','1112'],
'Category':[1,2,2,2,1,3,2,2,1]}
df = pd.DataFrame(data)
df
ID Category
1111 1
1111 2
1111 2
1111 2
1112 1
1112 3
1112 2
1112 2
1112 1
Now I am trying to filter the dataframe so that, for each ID, it keeps all the rows before the first occurrence of Category 2. This is what the desired result looks like:
ID Category
1111 1
1112 1
1112 3
>Solution :
Try this:
df[(df['Category'] != 2).groupby(df['ID']).cumprod()]
Output:
ID Category
0 1111 1
4 1112 1
5 1112 3
Details:
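Create a boolean Series that is True where Category is not equal to 2, then take the cumulative product within each ID group. Because True counts as 1 and False as 0, the product drops to 0 at the first Category 2 in a group and stays 0 for every later row of that group, so only the rows before that first occurrence are kept.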
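To make the steps visible, here is a minimal sketch that splits the one-liner into named intermediates. The variable names (`not_two`, `keep`, `result`) are only illustrative. The explicit `astype(int)` and `astype(bool)` casts are defensive additions, not part of the answer above; they are there on the assumption that the dtype returned by a grouped `cumprod` on booleans may differ between pandas versions.

```python
import pandas as pd

data = {'ID': ['1111', '1111', '1111', '1111', '1112', '1112', '1112', '1112', '1112'],
        'Category': [1, 2, 2, 2, 1, 3, 2, 2, 1]}
df = pd.DataFrame(data)

# Step 1: True where the row is NOT Category 2.
not_two = df['Category'] != 2

# Step 2: cumulative product per ID. Once a 0 (a Category 2 row) appears,
# every later row in that group is 0 as well.
# The casts are defensive (an assumption, see the note above).
keep = not_two.astype(int).groupby(df['ID']).cumprod().astype(bool)

# Step 3: use the boolean mask to select the rows before the first Category 2.
result = df[keep]
print(result)
```

If an ID has no Category 2 at all, the mask stays True for the whole group, so all of its rows are kept.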