Succinct / pythonic method of groupby percentage calculation

I want to groupby and calculate the percentage within each respective group over a certain value. I have achieved this by two groupbys into temporary dataframes (group count over a value, and group count), then merging the dataframes before calculating the percentage. I assume there is a more succinct method that doesn’t require creating temporary dataframes.

e.g.

``````df = pd.DataFrame({'n': ['a', 'b', 'c', 'a', 'c', 'a', 'a', 'c', 'a'],
'm': ['j', 'j', 'j', 'j', 'j', 'j', 'f', 'f', 'f'],
'v': [1, 1, 1, 2, 1, 2, 3, 6, 5]})
``````
1. Count groupby on ‘n’ and ‘m’ and count
``````dfcount = df.groupby(['n', 'm'])['v'].agg('count').reset_index().rename(columns={'v': 'count'})
``````
1. Filter values over 1, then groupby ‘n’ and ‘m’ and count
``````dfover1 = df[df['v'] > 1].groupby(['n','m']).count().reset_index().rename(columns={'v': 'over1count'})
``````
1. Merge two new dataframes on the groupby columns ‘n’ and ‘m’, then calculate the percentage over 1 for those groupbys.
``````dfnew = dfcount.merge(dfover1, on=['n', 'm'])
dfnew['late%'] = (100 * dfnew['over1count'] / dfnew['count']).round(1)
``````
n m count latecount late%
0 a f 2 2 100.0
1 a j 3 2 66.7
2 c f 1 1 100.0

>Solution :

You can use method chaining (pipeline) to avoid using named intermediates:

``````out = (df
.groupby(['n', 'm'], as_index=False)
# compute count and latecount
.agg(**{'count': ('v', 'count'),
'latecount': ('v', lambda v: len(v[v>1]))
})
# filter rows with latecount > 0
.query('latecount > 0')
# compute late%
.assign(**{'late%': lambda d: d['latecount'].div(d['count']).mul(100).round(2)})
)
``````

output:

``````   n  m  count  latecount   late%
0  a  f      2          2  100.00
1  a  j      3          2   66.67
3  c  f      1          1  100.00
``````