Get the criteria based on cumsum value on pandas dataframe

February 9, 2023

If I have this dataframe :

    RoomCode    Notes   Qty     sum    cumsum
4302    AGA      2A2K   14323   30613   14323
4301    AGA      2A1K   4810    30613   19133
4303    AGA      2A4K   4180    30613   23313
4306    AGA      2A5K   3759    30613   27072
4307    AGA      2A7K   1472    30613   28544
4304    AGA      2A8K    783    30613   29327
4311    AGA      2A9K    571    30613   29898
4310    AGA      2A10K   243    30613   30141
4312    AGA      2A11K   139    30613   30280

Based on cumsum, I want to make a class which work like this.
if cumsum covers 95% of sum then it is class 1. so, sum is 30613. then 95% of 30613 is 29082.35. So, index 4302 until 4307 is class 1. next 2,5% is between 29082.5 until 29847.675 (range 95% until 97,5% of sum) so,4304 until 4311 is class 2. the next list is class3 .

>Solution :

Lets use pandas cut to categorize the cumsum column based on bins

s  = df['sum'].iat[0] 

df['class'] = pd.cut(
    df['cumsum'], bins=[0, s * .95, s * .975, s],
    labels=[1, 2, 3], include_lowest=True
)

if you need to group on Code then apply the categorize func per unique Code group

def categorize(g):
   s  = g['sum'].iat[0] 
   return pd.cut(
        g['cumsum'], 
        bins=[0, s * .95, s * .975, s],
        labels=[1, 2, 3], include_lowest=True
    ) 


df['class'] =  df.groupby('Code', group_keys=False).apply(categorize)

Result

          Code  Notes    Qty    sum  cumsum class
4302      AGA   2A2K  14323  30613   14323     1
4301      AGA   2A1K   4810  30613   19133     1
4303      AGA   2A4K   4180  30613   23313     1
4306      AGA   2A5K   3759  30613   27072     1
4307      AGA   2A7K   1472  30613   28544     1
4304      AGA   2A8K    783  30613   29327     2
4311      AGA   2A9K    571  30613   29898     3
4310      AGA  2A10K    243  30613   30141     3
4312      AGA  2A11K    139  30613   30280     3