Using pandas, I want to filter out all groups that contain only zero values.
In pseudo-code, something like this:
df.groupby('my_group')['values'].filter(all(iszero))
An example input dataframe could look like this:
import random
import pandas as pd
df = pd.DataFrame({'my_group': ['A', 'B', 'C', 'D']*3, 'values': [0 if (x % 4 == 0 or x == 11) else random.random() for x in range(12)]})
my_group values
0 A 0.000000
1 B 0.286104
2 C 0.359804
3 D 0.596152
4 A 0.000000
5 B 0.560742
6 C 0.534575
7 D 0.251302
8 A 0.000000
9 B 0.445010
10 C 0.750434
11 D 0.000000
Here, every value in group A is zero, so it should be filtered out. Group D also has a zero value in row 11, but its other rows are non-zero, so it shouldn't be filtered out.
>Solution :
Here are possible solutions, ordered from best to worst performance:
# select the groups that have at least one non-zero value, then keep the rows of the original frame that belong to those groups
df1 = df[df['my_group'].isin(df.loc[df['values'].ne(0), 'my_group'])]
# build a boolean mask with groupby.transform('any')
df1 = df[df['values'].ne(0).groupby(df['my_group']).transform('any')]
# filter groups with a lambda function (slow on large data)
df1 = df.groupby('my_group').filter(lambda x: x['values'].ne(0).any())
print(df1)
my_group values
1 B 0.286104
2 C 0.359804
3 D 0.596152
5 B 0.560742
6 C 0.534575
7 D 0.251302
9 B 0.445010
10 C 0.750434
11 D 0.000000
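As a quick sanity check, the three approaches can be compared on fixed data. This is only a sketch: the values below are made up for reproducibility (the question uses random numbers), and the names nonzero, by_isin, by_transform and by_filter are hypothetical.

```python
import pandas as pd

# Made-up deterministic data with the same shape as the question's example:
# group A is all zeros, group D has a single zero in row 11.
df = pd.DataFrame({
    'my_group': ['A', 'B', 'C', 'D'] * 3,
    'values': [0.0, 0.2, 0.3, 0.5, 0.0, 0.5, 0.5, 0.2, 0.0, 0.4, 0.7, 0.0],
})

# True for every row whose value is non-zero
nonzero = df['values'].ne(0)

# 1) keep rows whose group occurs among the non-zero rows
by_isin = df[df['my_group'].isin(df.loc[nonzero, 'my_group'])]

# 2) broadcast "any non-zero in this group" back to every row
by_transform = df[nonzero.groupby(df['my_group']).transform('any')]

# 3) evaluate a Python function once per group
by_filter = df.groupby('my_group').filter(lambda g: g['values'].ne(0).any())

# All three should select the same rows, in the original order
print(by_isin.equals(by_transform) and by_isin.equals(by_filter))
print(sorted(by_isin['my_group'].unique()))
```

If the three results agree, the choice between them comes down to speed and readability. Actual timings depend on the data size and the number of groups, so it is worth measuring on your own data.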