I have a dataframe:
ID EM_results pa_id_1850
0 0 1
1 1 1
2 0 1
3 0 2
4 0 2
5 1 3
6 1 3
7 0 3
I want to remove every row belonging to a pa_id_1850 group in which EM_results equals 1 more than once. In this case, the expected result is:
ID EM_results pa_id_1850
0 0 1
1 1 1
2 0 1
3 0 2
4 0 2
I have tried something like:
grouped = df.groupby('pa_id_1850')
grouped.filter(lambda x: (x['EM_results'] == 1) <= 1)
But I can't quite get it to work.
>Solution :
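Your attempt is close, but the function passed to filter must return a single boolean for each group, not a Series. The expression (x['EM_results'] == 1) <= 1 produces one value per row, so filter cannot use it to decide whether to keep the group. One way to fix this is to filter on the sum of the EM_results values in each group, assuming these values are only ever 0 or 1: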
grouped = df.groupby('pa_id_1850')
# filter returns a new DataFrame, so keep the result
result = grouped.filter(lambda x: x['EM_results'].sum() <= 1)
print(result)
# Output:
   ID  EM_results  pa_id_1850
0   0           0           1
1   1           1           1
2   2           0           1
3   3           0           2
4   4           0           2
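If EM_results could ever hold values other than 0 and 1, the sum would no longer be a count of ones. A minimal sketch that counts the rows equal to 1 explicitly is below. The DataFrame is rebuilt from the sample in the question, and the names ones_per_group, kept_by_filter and kept_by_mask are only illustrative. The second variant uses transform to build a boolean mask instead of a Python lambda, which may be faster on large frames, though that is not measured here.

```python
import pandas as pd

# Sample data reconstructed from the question.
df = pd.DataFrame({
    "ID": [0, 1, 2, 3, 4, 5, 6, 7],
    "EM_results": [0, 1, 0, 0, 0, 1, 1, 0],
    "pa_id_1850": [1, 1, 1, 2, 2, 3, 3, 3],
})

# Variant 1: filter, counting only the rows where EM_results == 1.
# The lambda returns one boolean per group, which is what filter expects.
kept_by_filter = df.groupby("pa_id_1850").filter(
    lambda g: g["EM_results"].eq(1).sum() <= 1
)

# Variant 2: count the ones per group and broadcast the count to every row,
# then keep the rows whose group has at most one such row.
ones_per_group = (
    df["EM_results"].eq(1).groupby(df["pa_id_1850"]).transform("sum")
)
kept_by_mask = df[ones_per_group <= 1]

print(kept_by_filter)
print(kept_by_mask)
```

Both variants keep the rows for pa_id_1850 values 1 and 2 and drop group 3, which contains two rows with EM_results equal to 1.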