Spread percentage summary in a pandas DataFrame

Suppose, for example, I have a one-column pandas DataFrame:

A 20
B 20
C 15
D 10
E 10
F  8 
G  7
H  5
I  5

I want to split the rows by their share of the total: the largest values that make up the first 75%, then the next 15%, and finally the last 10%, like this:

75%      15%      10%
A        F        H
B        G        I
C
D
E

Is there a pandas function that can produce this summary more quickly?
Also, do I need to turn the index into a column? I ask because I got these values from df.value_counts() on my df DataFrame.

Solution:

The exact input and expected output are not fully clear, but assuming this DataFrame as input:

   col
A   20
B   20
C   15
D   10
E   10
F    8
G    7
H    5
I    5
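The grouping below works on the running total of col. As a quick check, assuming the values are already percentages that add up to 100 (here 20 + 20 + 15 + 10 + 10 + 8 + 7 + 5 + 5 = 100), this minimal sketch rebuilds the frame and prints that running total:

```python
import pandas as pd

# Rebuild the example frame: one column, letters as the index
df = pd.DataFrame(
    {"col": [20, 20, 15, 10, 10, 8, 7, 5, 5]},
    index=list("ABCDEFGHI"),
)

# Running total of the (already descending) percentages
running = df["col"].cumsum()
print(running.tolist())  # [20, 40, 55, 65, 75, 83, 90, 95, 100]
```

Rows whose running total is at most 75 land in the first group, those up to 90 (75 + 15) in the second, and those up to 100 in the last, which is what the bin edges 0, 75, 90, 100 encode.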

You can get a dictionary mapping each share to its index labels using:

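import numpy as np
import pandas as pd

# shares to fill, from the largest values down
target = [75, 15, 10]

# cut the running total at the cumulative edges 0, 75, 90, 100
group = pd.cut(df['col'].cumsum(), bins=np.r_[0, np.cumsum(target)], labels=target)

# map each share to the index labels that fall into it
df.index.groupby(group)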

Output (each value is a pandas Index, shown here as a list):
{75: ['A', 'B', 'C', 'D', 'E'], 15: ['F', 'G'], 10: ['H', 'I']}
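On the follow-up question: the index does not have to become a column, since Series.value_counts() already returns a Series indexed by the distinct values, and Index.groupby can work on that index directly. If the counts are raw counts rather than percentages, the running total can be converted to a percentage first. The following is a sketch under those assumptions; the raw frame and its item column are hypothetical stand-ins for the original df, and the 75%/15%/10% column headers are only illustrative. It also lays the groups out side by side as in the question:

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: each letter repeated as often as in the question
raw = pd.DataFrame({"item": list("A" * 20 + "B" * 20 + "C" * 15 + "D" * 10
                                 + "E" * 10 + "F" * 8 + "G" * 7 + "H" * 5 + "I" * 5)})

# value_counts() returns a Series indexed by the distinct items, largest first
counts = raw["item"].value_counts()

# Running share in percent; multiplying before dividing keeps the last value exactly 100
running_pct = counts.cumsum() * 100 / counts.sum()

target = [75, 15, 10]
edges = np.r_[0, np.cumsum(target)]  # 0, 75, 90, 100
group = pd.cut(running_pct, bins=edges, labels=target)

# Group the index labels directly; no reset_index() is needed
groups = counts.index.groupby(group)

# Side-by-side layout as in the question; shorter columns are padded with ""
layout = pd.DataFrame(
    {f"{k}%": pd.Series(list(labels)) for k, labels in groups.items()}
).fillna("")
print(layout)
```

Ties (A and B, D and E, H and I) may come out of value_counts() in either order, but with these numbers that does not change which group each label lands in.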
