Dynamic top 3 and percentage total using pandas groupby

May 26, 2022

I have a dataframe like as shown below

id,Name,country,amount,qty
1,ABC,USA,123,4500
1,ABC,USA,156,3210
1,BCE,USA,687,2137
1,DEF,UK,456,1236
1,ABC,nan,216,324
1,DEF,nan,12678,11241
1,nan,nan,637,213
1,BCE,nan,213,543
1,XYZ,KOREA,432,321
1,XYZ,AUS,231,321

sf = pd.read_clipboard(sep=',')

I would like to do the below

a) Get top 3 based on amount for each id and other selected columns such as Name and country. Meaning, we get top 3 based id and Name first and later, we again get top 3 based on id and country

b) Find out how much does each of the top 3 item contribute to total amount for each unique id.

So, I tried the below

sf_name = sf.groupby(['id','Name'],dropna=False)['amount'].sum().nlargest(3).reset_index().rename(columns={'amount':'Name_amount'})
sf_country = sf.groupby(['id','country'],dropna=False)['amount'].sum().nlargest(3).reset_index().rename(columns={'amount':'country_amount'})
sf_name['total'] = sf.groupby('id')['amount'].sum()
sf_country['total'] = sf.groupby('id')['amount'].sum()
sf_name['name_pct_total'] = (sf_name['Name_amount']/sf_name['total'])*100
sf_country['country_pct_total'] = (sf_country['country_amount']/sf_country['total'])*100

As you can see, I am repeating the same operation for each column.

But in my real dataframe, I have to do this groupby id and find Top3 and compute pct_total % for another 8 columns (along with Name and country)

Is there any efficient, elegant and scalable solution that you can share?

I expect my output to be like as below

>Solution :

Use:

dfs = []
cols = ['Name','country']

for c in cols:
    df = sf.groupby(['id',c],dropna=False)['amount'].sum().nlargest(3).reset_index(level=1, name=f'{c}_amount')
    df[f'{c}_pct_total'] = df[f'{c}_amount'].div(df.groupby('id')[f'{c}_amount'].transform('sum'))*100
    dfs.append(df)
    
df = pd.concat(dfs, axis=1)
print (df)
   Name  Name_amount  Name_pct_total country  country_amount  \
id                                                             
1   DEF        13134       89.365177     NaN           13744   
1   BCE          900        6.123699     USA             966   
1   XYZ          663        4.511125      UK             456   

    country_pct_total  
id                     
1           90.623764  
1            6.369511  
1            3.006726