Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas – split one row value and merge with multiple rows

I have two dataframes like as below

proj_df = pd.DataFrame({'reg_id':[1,2,3,4,5,6,7],
                              'partner': ['ABC_123','ABC_123','ABC_123','ABC_123','ABC_123','ABC_123','ABC_123'],
                              'part_no':['P123','P123','P123','P123','P123','P123','P123'],
                              'cust_info':['Apple','Apple','Apple','Apple','Apple','Apple','Tesla'],
                              'qty_1st_year':[100,100,600,150,50,0,10]})

order_df = pd.DataFrame({'partner': ['ABC_123','ABC_123','JKL_123','MNO_123'],
                         'part_no':['P123','P123','Q123','P567'],
                         'cust_info':['Apple','Hyundai','REON','Renault'],
                         'order_qty':[1000,600,50,0]})

I would like to do the below

a) Merge two dataframes based on partner,part_no,cust_info

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

b) split the order_qty column from order_df and assign the appropriate portion to a new column called assigned_qty

c) appropriate portion is determined by the percentage distribution of qty_1st_year. Meaning, you divide individual qty_1st_year value by the total sum of Qty_1st_year for each group of partner,part_no and cust_info.

So, I tried the below

sum_df = proj_df.groupby(['partner','part_no','cust_info'])['qty_1st_year'].sum().reset_index()
sum_df.columns = ['partner','part_no','cust_info','total_qty_all_project']

t1=proj_df.merge(order_df,on=['partner','part_no','cust_info'],how='left')
t2 = t1.merge(sum_df,on=['partner','part_no','cust_info'],how='left')
t2['pct_value'] = (t2['qty_1st_year']/t2['total_qty_all_project'])*100
proj_df['assigned_value'] = (t2['order_qty']*t2['pct_value'])/100

While this seems to work fine, I would like to know is there any other better and elegant way to do this task.

I expect my output to be like as below

enter image description here

>Solution :

Use GroupBy.transform with sum for new column for avoid double merge, last if need remove columns after multiplication and division add DataFrame.pop, last if necessary divide by 10 for expected ouput:

proj_df['total_qty'] = (proj_df.groupby(['partner','part_no','cust_info'])['qty_1st_year']
                               .transform('sum'))

df=proj_df.merge(order_df,on=['partner','part_no','cust_info'],how='left')

df['assigned_value'] = (df.pop('order_qty')*
                       (df['qty_1st_year']/
                       df.pop('total_qty'))).div(10)
print (df)
   reg_id  partner part_no cust_info  qty_1st_year  assigned_value
0       1  ABC_123    P123     Apple           100            10.0
1       2  ABC_123    P123     Apple           100            10.0
2       3  ABC_123    P123     Apple           600            60.0
3       4  ABC_123    P123     Apple           150            15.0
4       5  ABC_123    P123     Apple            50             5.0
5       6  ABC_123    P123     Apple             0             0.0
6       7  ABC_123    P123     Tesla            10             NaN
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading