Spread percentage summary in dataframe pandas

November 9, 2022

For example, suppose I have a pandas DataFrame with a single column.

I want to summarize how the data is spread, so that the labels making up the largest 75%, the next 15%, and the last 10% are:

75%      15%      10%
A        F        H
B        G        I
C
D
E

Is there a pandas function that can produce this summary quickly?
Do I need to turn the index into a column? I ask because I got the values from df.value_counts() on the df DataFrame.

>Solution :

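The exact input and expected output are not fully clear, but assume a DataFrame whose index holds the labels and whose column 'col' holds each label's percentage of the total, sorted in descending order.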
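As a hypothetical stand-in for that input, here is a minimal sketch of how such a DataFrame could be built from value_counts(). The labels A to I come from the question, but the counts, the `raw` Series and the column name 'col' are assumptions made for illustration. The labels stay in the index, so there is no need to turn the index into a column.

```python
import pandas as pd

# Made-up occurrence counts per label (assumption, not from the question).
counts_by_label = {"A": 25, "B": 19, "C": 12, "D": 10, "E": 9,
                   "F": 8, "G": 7, "H": 6, "I": 4}

# Hypothetical raw data: each label repeated as often as it occurs.
raw = pd.Series([label for label, n in counts_by_label.items() for _ in range(n)])

# value_counts() sorts by descending count and puts the labels in the index.
counts = raw.value_counts()

# Convert counts to percentages of the total and store them in a column named 'col'.
df = (counts * 100 / counts.sum()).to_frame("col")
print(df)
```

With real data the percentages are floats, so the running total may land very slightly above 100. Rounding the column before binning is one way to guard against that.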

You can get a dictionary of the indices using:

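import numpy as np
import pandas as pd

# Percentage size of each group, largest first
target = [75, 15, 10]

# Bin the running total of 'col' at 75, 90 and 100, labelling each bin with its target size
group = pd.cut(df['col'].cumsum(), bins=np.r_[0, np.cumsum(target)], labels=target)

# Map each bin label to the index values that fall into it
df.index.groupby(group)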

output: {75: ['A', 'B', 'C', 'D', 'E'], 15: ['F', 'G'], 10: ['H', 'I']}
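For reference, here is a self-contained sketch of the same approach that can be run as is. The percentages in 'col' are made up (the original input is not shown) and were chosen so that the running total hits the 75 and 90 boundaries exactly. The conversion to plain lists at the end is an addition for readability, not part of the approach itself.

```python
import numpy as np
import pandas as pd

# Made-up percentages (assumption), already sorted in descending order.
df = pd.DataFrame(
    {"col": [25, 19, 12, 10, 9, 8, 7, 6, 4]},
    index=list("ABCDEFGHI"),
)

target = [75, 15, 10]

# Bin edges 0, 75, 90, 100; pd.cut intervals include their right edge by default,
# so a running total of exactly 75 still belongs to the first group.
bins = np.r_[0, np.cumsum(target)]
group = pd.cut(df["col"].cumsum(), bins=bins, labels=target)

# Index.groupby returns a dict-like of Index objects; convert them to plain lists.
result = {int(k): list(v) for k, v in df.index.groupby(group).items()}
print(result)
```

Because the bins are applied to the cumulative sum, this only gives a meaningful split when the rows are already sorted from largest to smallest share, which is the order value_counts() produces.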