Sum two columns in a grouped data frame using shift()

I have a data frame df in which I would like to create a new column ID that is a diagonal combination of two other columns, ID1 and ID2.

This is the data frame:

import pandas as pd

df = pd.DataFrame({'Employee': [5, 5, 5, 20, 20],
                   'Department': [4, 4, 4, 6, 6],
                   'ID1': ['AB', 'CD', 'EF', 'XY', 'AA'],
                   'ID2': ['CD', 'EF', 'GH', 'AA', 'ZW']})

This is what the initial data frame looks like:

   Employee  Department ID1 ID2
0         5           4  AB  CD
1         5           4  CD  EF
2         5           4  EF  GH
3        20           6  XY  AA
4        20           6  AA  ZW

If I group df by Employee and Department:

df2 = df.groupby(["Employee", "Department"])

I get only two kinds of groups: groups with two rows and groups with three rows.

The column ID should be the concatenation of ID1 from the current row and ID2 from the next row in the same group. For the last row of a group, ID should take the ID value of the previous row.

Expected output:

   Employee  Department ID1 ID2    ID
0         5           4  AB  CD  ABEF
1         5           4  CD  EF  CDGH
2         5           4  EF  GH  CDGH
3        20           6  XY  AA  XYZW
4        20           6  AA  ZW  XYZW

I thought about using shift():

df2["ID"]=df["ID1"]+df["ID2"].shift(-1)

But I could not quite figure it out. Any ideas?

Solution:

(df["ID1"] + df.groupby(["Employee", "Department"])["ID2"].shift(-1)).ffill()

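This is almost your code, but we group first and then shift ID2 up by one row within each group. The last row of each group has no next row, so it comes out as NaN; a forward fill then copies the previous row's value into it.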
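To see why both steps are needed, the one-liner can be split into its parts. This is only a sketch on the question's data, assuming the first ID column is named ID1 as in the printed frame; the intermediate names `shifted` and `combined` are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Employee': [5, 5, 5, 20, 20],
                   'Department': [4, 4, 4, 6, 6],
                   'ID1': ['AB', 'CD', 'EF', 'XY', 'AA'],
                   'ID2': ['CD', 'EF', 'GH', 'AA', 'ZW']})

# Shift ID2 up by one row inside each (Employee, Department) group.
# The last row of every group has no "next row", so it becomes NaN.
shifted = df.groupby(["Employee", "Department"])["ID2"].shift(-1)

# Adding a string to NaN gives NaN, so those last rows are still missing here.
combined = df["ID1"] + shifted

# Forward fill copies the previous row's value into the gaps.
df["ID"] = combined.ffill()
```

A plain df["ID2"].shift(-1) without the groupby would instead pull 'AA' from the next employee into row 2, which is why the grouping has to come before the shift.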

In [24]: df
Out[24]:
   Employee  Department ID1 ID2
0         5           4  AB  CD
1         5           4  CD  EF
2         5           4  EF  GH
3        20           6  XY  AA
4        20           6  AA  ZW

In [25]: df["ID"] = (df["ID1"] + df.groupby(["Employee", "Department"])["ID2"].shift(-1)).ffill()

In [26]: df
Out[26]:
   Employee  Department ID1 ID2    ID
0         5           4  AB  CD  ABEF
1         5           4  CD  EF  CDGH
2         5           4  EF  GH  CDGH
3        20           6  XY  AA  XYZW
4        20           6  AA  ZW  XYZW
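
One caveat that goes beyond the question: the plain .ffill() fills from whatever row sits above, regardless of group. That is fine here, because every group has two or three adjacent rows. If a one-row group could ever occur, filling within each group avoids borrowing a value from a different group. A minimal sketch, where employee 30 is a hypothetical extra row added only to show the difference:

```python
import pandas as pd

# Same data as in the question, plus a hypothetical one-row group (Employee 30).
df = pd.DataFrame({'Employee': [5, 5, 5, 20, 20, 30],
                   'Department': [4, 4, 4, 6, 6, 7],
                   'ID1': ['AB', 'CD', 'EF', 'XY', 'AA', 'QQ'],
                   'ID2': ['CD', 'EF', 'GH', 'AA', 'ZW', 'RR']})

keys = ["Employee", "Department"]
combined = df["ID1"] + df.groupby(keys)["ID2"].shift(-1)

# Forward fill inside each group only, so the one-row group stays NaN
# instead of inheriting 'XYZW' from the group above it.
df["ID"] = combined.groupby([df[k] for k in keys]).ffill()
```

For the original five rows the result is the same as with the plain .ffill(); only the extra row differs, staying NaN rather than becoming 'XYZW'.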