Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Dynamic top 3 and percentage total using pandas groupby

I have a dataframe like as shown below

id,Name,country,amount,qty
1,ABC,USA,123,4500
1,ABC,USA,156,3210
1,BCE,USA,687,2137
1,DEF,UK,456,1236
1,ABC,nan,216,324
1,DEF,nan,12678,11241
1,nan,nan,637,213
1,BCE,nan,213,543
1,XYZ,KOREA,432,321
1,XYZ,AUS,231,321

sf = pd.read_clipboard(sep=',')

I would like to do the below

a) Get top 3 based on amount for each id and other selected columns such as Name and country. Meaning, we get top 3 based id and Name first and later, we again get top 3 based on id and country

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

b) Find out how much does each of the top 3 item contribute to total amount for each unique id.

So, I tried the below

sf_name = sf.groupby(['id','Name'],dropna=False)['amount'].sum().nlargest(3).reset_index().rename(columns={'amount':'Name_amount'})
sf_country = sf.groupby(['id','country'],dropna=False)['amount'].sum().nlargest(3).reset_index().rename(columns={'amount':'country_amount'})
sf_name['total'] = sf.groupby('id')['amount'].sum()
sf_country['total'] = sf.groupby('id')['amount'].sum()
sf_name['name_pct_total'] = (sf_name['Name_amount']/sf_name['total'])*100
sf_country['country_pct_total'] = (sf_country['country_amount']/sf_country['total'])*100

As you can see, I am repeating the same operation for each column.

But in my real dataframe, I have to do this groupby id and find Top3 and compute pct_total % for another 8 columns (along with Name and country)

Is there any efficient, elegant and scalable solution that you can share?

I expect my output to be like as below

enter image description here

>Solution :

Use:

dfs = []
cols = ['Name','country']

for c in cols:
    df = sf.groupby(['id',c],dropna=False)['amount'].sum().nlargest(3).reset_index(level=1, name=f'{c}_amount')
    df[f'{c}_pct_total'] = df[f'{c}_amount'].div(df.groupby('id')[f'{c}_amount'].transform('sum'))*100
    dfs.append(df)
    
df = pd.concat(dfs, axis=1)
print (df)
   Name  Name_amount  Name_pct_total country  country_amount  \
id                                                             
1   DEF        13134       89.365177     NaN           13744   
1   BCE          900        6.123699     USA             966   
1   XYZ          663        4.511125      UK             456   

    country_pct_total  
id                     
1           90.623764  
1            6.369511  
1            3.006726  
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading