I have two dataframes like as below
proj_df = pd.DataFrame({'reg_id':[1,2,3,4,5,6,7],
'partner': ['ABC_123','ABC_123','ABC_123','ABC_123','ABC_123','ABC_123','ABC_123'],
'part_no':['P123','P123','P123','P123','P123','P123','P123'],
'cust_info':['Apple','Apple','Apple','Apple','Apple','Apple','Tesla'],
'qty_1st_year':[100,100,600,150,50,0,10]})
order_df = pd.DataFrame({'partner': ['ABC_123','ABC_123','JKL_123','MNO_123'],
'part_no':['P123','P123','Q123','P567'],
'cust_info':['Apple','Hyundai','REON','Renault'],
'order_qty':[1000,600,50,0]})
I would like to do the below
a) Merge two dataframes based on partner,part_no,cust_info
b) split the order_qty column from order_df and assign the appropriate portion to a new column called assigned_qty
c) appropriate portion is determined by the percentage distribution of qty_1st_year. Meaning, you divide individual qty_1st_year value by the total sum of Qty_1st_year for each group of partner,part_no and cust_info.
So, I tried the below
sum_df = proj_df.groupby(['partner','part_no','cust_info'])['qty_1st_year'].sum().reset_index()
sum_df.columns = ['partner','part_no','cust_info','total_qty_all_project']
t1=proj_df.merge(order_df,on=['partner','part_no','cust_info'],how='left')
t2 = t1.merge(sum_df,on=['partner','part_no','cust_info'],how='left')
t2['pct_value'] = (t2['qty_1st_year']/t2['total_qty_all_project'])*100
proj_df['assigned_value'] = (t2['order_qty']*t2['pct_value'])/100
While this seems to work fine, I would like to know is there any other better and elegant way to do this task.
I expect my output to be like as below
>Solution :
Use GroupBy.transform with sum for new column for avoid double merge, last if need remove columns after multiplication and division add DataFrame.pop, last if necessary divide by 10 for expected ouput:
proj_df['total_qty'] = (proj_df.groupby(['partner','part_no','cust_info'])['qty_1st_year']
.transform('sum'))
df=proj_df.merge(order_df,on=['partner','part_no','cust_info'],how='left')
df['assigned_value'] = (df.pop('order_qty')*
(df['qty_1st_year']/
df.pop('total_qty'))).div(10)
print (df)
reg_id partner part_no cust_info qty_1st_year assigned_value
0 1 ABC_123 P123 Apple 100 10.0
1 2 ABC_123 P123 Apple 100 10.0
2 3 ABC_123 P123 Apple 600 60.0
3 4 ABC_123 P123 Apple 150 15.0
4 5 ABC_123 P123 Apple 50 5.0
5 6 ABC_123 P123 Apple 0 0.0
6 7 ABC_123 P123 Tesla 10 NaN
