Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Join two dataframes on multiple conditions in python

I have the following problem: i am trying to join df1 = ['ID, 'Earnings', 'WC, 'Year'] and df2 = ['ID', 'F1_Earnings', 'df2_year']. So for example: the ‘F1_Earnings’ of a particular company, e.g. with ID = 1 and year = 1996, in df2 (aka. the Forward Earnings) should get joined on df1 in a way that they show up in df1 under ID = 1 and year = 1995.
I have no clue how to specify a join on two conditions, of course they need to join on "ID", but how do I add a second condition which specifies that they also join on "df1_year = df2_year – 1"?

d1 = {'ID': [1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4], 'Earnings': [100, 200, 400, 250, 300, 350, 400, 550, 700, 259, 300, 350], 'WC': [20, 40, 35, 55, 60, 65, 30, 28, 32, 45, 60, 52], 'Year': [1995, 1996, 1997, 1996, 1997, 1998, 1995, 1997, 1998, 1996, 1997, 1998]}

df1 = pd.DataFrame(data=d1)

d2 = {'ID': [1, 2, 3, 4], 'F1_Earnings': [120, 220, 420, 280], 'WC': [23, 37, 40, 52], 'Year': [1996, 1997, 1998, 1999]}

df2 = pd.DataFrame(data=d2)

I did the following, but I guess there miust be a smarter way? I am afraid it wont work for larger datasets…:

    df3 = pd.merge(df1, df2, how='left', on = 'ID')
    df3.loc[df3['Year_x'] == df3['Year_y'] - 1]

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

You can use a Series as key in merge:

df1.merge(df2, how='left',
          left_on=['ID', 'Year'],
          right_on=['ID', df2['Year'].sub(1)])

output:

    ID  Year  Earnings  WC_x  Year_x  F1_Earnings  WC_y  Year_y
0    1  1995       100    20    1995        120.0  23.0  1996.0
1    1  1996       200    40    1996          NaN   NaN     NaN
2    1  1997       400    35    1997          NaN   NaN     NaN
3    2  1996       250    55    1996        220.0  37.0  1997.0
4    2  1997       300    60    1997          NaN   NaN     NaN
5    2  1998       350    65    1998          NaN   NaN     NaN
6    3  1995       400    30    1995          NaN   NaN     NaN
7    3  1997       550    28    1997        420.0  40.0  1998.0
8    3  1998       700    32    1998          NaN   NaN     NaN
9    4  1996       259    45    1996          NaN   NaN     NaN
10   4  1997       300    60    1997          NaN   NaN     NaN
11   4  1998       350    52    1998        280.0  52.0  1999.0

Or change the Year to Year-1, before the merge:

df1.merge(df2.assign(Year=df2['Year'].sub(1)),
          how='left', on=['ID', 'Year'])

output:

    ID  Earnings  WC_x  Year  F1_Earnings  WC_y
0    1       100    20  1995        120.0  23.0
1    1       200    40  1996          NaN   NaN
2    1       400    35  1997          NaN   NaN
3    2       250    55  1996        220.0  37.0
4    2       300    60  1997          NaN   NaN
5    2       350    65  1998          NaN   NaN
6    3       400    30  1995          NaN   NaN
7    3       550    28  1997        420.0  40.0
8    3       700    32  1998          NaN   NaN
9    4       259    45  1996          NaN   NaN
10   4       300    60  1997          NaN   NaN
11   4       350    52  1998        280.0  52.0
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading