I would like to count the number of groups which have any True value across there rows in a column. Here I have some reproducible data:
import pandas as pd
df = {'group': [1, 1, 1, 2, 2, 2, 3, 3], 'condition': ["True", "False", "False", "True", "True", "False", "False", "False"]}
pd.DataFrame(data=df, index=[0, 1, 2, 3, 4, 5, 6, 7])
Output:
group condition
0 1 True
1 1 False
2 1 False
3 2 True
4 2 True
5 2 False
6 3 False
7 3 False
As we can see in group 1 and group 2 there are rows with a True value which means the result should be that there are 2 groups. So the expected outcome should look like this:
groups total_any_true
0 3 2
So I was wondering if anyone knows how to count the total number of groups that have any certain value across their rows?
>Solution :
Given the peculiar format, the best is likely to craft the DataFrame manually:
out = pd.DataFrame([[df['group'].nunique(),
df['condition'].eq('True').groupby(df['group']).any().sum(),
]], columns=['groups', 'total_any_true'])
Output:
groups total_any_true
0 3 2
Alternatively:
out = (df
.assign(condition=df['condition'].eq('True'))
.groupby('group', as_index=False).any()
.agg({'group': 'nunique', 'condition': 'sum'})
.to_frame().T
)
Output:
group condition
0 3 2