If I have this dataframe :
RoomCode Notes Qty sum cumsum
4302 AGA 2A2K 14323 30613 14323
4301 AGA 2A1K 4810 30613 19133
4303 AGA 2A4K 4180 30613 23313
4306 AGA 2A5K 3759 30613 27072
4307 AGA 2A7K 1472 30613 28544
4304 AGA 2A8K 783 30613 29327
4311 AGA 2A9K 571 30613 29898
4310 AGA 2A10K 243 30613 30141
4312 AGA 2A11K 139 30613 30280
Based on cumsum, I want to make a class which work like this.
if cumsum covers 95% of sum then it is class 1. so, sum is 30613. then 95% of 30613 is 29082.35. So, index 4302 until 4307 is class 1. next 2,5% is between 29082.5 until 29847.675 (range 95% until 97,5% of sum) so,4304 until 4311 is class 2. the next list is class3 .
>Solution :
Lets use pandas cut to categorize the cumsum column based on bins
s = df['sum'].iat[0]
df['class'] = pd.cut(
df['cumsum'], bins=[0, s * .95, s * .975, s],
labels=[1, 2, 3], include_lowest=True
)
if you need to group on Code then apply the categorize func per unique Code group
def categorize(g):
s = g['sum'].iat[0]
return pd.cut(
g['cumsum'],
bins=[0, s * .95, s * .975, s],
labels=[1, 2, 3], include_lowest=True
)
df['class'] = df.groupby('Code', group_keys=False).apply(categorize)
Result
Code Notes Qty sum cumsum class
4302 AGA 2A2K 14323 30613 14323 1
4301 AGA 2A1K 4810 30613 19133 1
4303 AGA 2A4K 4180 30613 23313 1
4306 AGA 2A5K 3759 30613 27072 1
4307 AGA 2A7K 1472 30613 28544 1
4304 AGA 2A8K 783 30613 29327 2
4311 AGA 2A9K 571 30613 29898 3
4310 AGA 2A10K 243 30613 30141 3
4312 AGA 2A11K 139 30613 30280 3