I have the following example dataframe:
data = {'ref':['1', '2', '3', '4', '5'],
'checked':['True', 'True', 'True', 'False', 'True'],
'rag':['r', 'r', 'g', 'a', 'r'],
'group':['high', 'low', 'high', 'medium', 'high']}
dataframe = pd.DataFrame(data)
I want to group on group and do some conditional counts where certain conditions are met so I get the following:
I can group by group and do n by the following:
df = dataframe.groupby(['group']).agg(
n=('ref', 'count')
).reset_index()
But I am struggling to also count the number of times for each group that:
- checked = True
- rag = g
- rag = a
- rag = r
Any help would be much appreciated!
>Solution :
You have a few challenges.
For instance, your True/False are strings, so you should either initialize them as booleans and use sum or convert to boolean during aggregation.
To count the rag, it’s easier to use pandas.crosstab and join it while you still have the groups as index.
df = (dataframe
.groupby(['group'])
.agg(**{'n': ('ref', 'count'),
'checked=True': ('checked', lambda s: s.eq('True').sum()),
})
.join(pd.crosstab(dataframe['group'], dataframe['rag'])
.add_prefix('rag=')
)
.reset_index()
)
output:
group n checked=True rag=a rag=g rag=r
0 high 3 3 0 1 2
1 low 1 1 0 0 1
2 medium 1 0 1 0 0

