Pandas new column based in sum a column from another pandas

I have two dataframes, being the first one:

   unit    year
0     1    2020
1     2    2021
2     3    2022

and the second:

   unit    observations
0     1               0
1     2               1
2     2               2
3     2               3
4     2               4
5     3               5

I need to add a column at the first dataframe as a sum of observations for the unit at the second dataframe, I have something like this at the end

   unit    year   observations
0     1    2020              0
1     2    2021             10
2     3    2022              5

I tried to df_1.iterrows and using a query based in the unit from the first df to sum, and it worked, but I’m talking about a df with about to 4 million rows, this solution will take days. Someone have a quicker solution?

>Solution :

Use Series.map with aggregate sum in second DataFrame:

df1['observations'] = df1['unit'].map(df2.groupby('unit')['observations'].sum())
print (df1)
   unit  year  observations
0     1  2020             0
1     2  2021            10
2     3  2022             5

Leave a Reply