I need to filter a data frame with different groups. The data frame looks as follows:
df = pd.DataFrame({"group":[1,1,1,
2,2,2,2,
3,3,3,
4,4],
"percentage":[70,70,70,
45,80,60,70,
71,85,90,
np.nan, np.nan]})
My goal is to return a data frame containing only groups that satisfy one of the two following conditions:
- All observations of the group have percentage > 70
- All observations of the group are np.nan
I know that I have to group the data frame first and then apply the conditions. This might be easily done using a for loop for groups. However, using such a solution might be very slow. Any help would be appreciated.
>Solution :
You can try with filter
df = df.groupby('group').filter(lambda x : x['percentage'].gt(70).all() | x['percentage'].isna().all() )
Out[25]:
group percentage
7 3 71.0
8 3 85.0
9 3 90.0
10 4 NaN
11 4 NaN