I have two dataframes, being the first one:
unit year
0 1 2020
1 2 2021
2 3 2022
and the second:
unit observations
0 1 0
1 2 1
2 2 2
3 2 3
4 2 4
5 3 5
I need to add a column at the first dataframe as a sum of observations for the unit at the second dataframe, I have something like this at the end
unit year observations
0 1 2020 0
1 2 2021 10
2 3 2022 5
I tried to df_1.iterrows and using a query based in the unit from the first df to sum, and it worked, but I’m talking about a df with about to 4 million rows, this solution will take days. Someone have a quicker solution?
>Solution :
Use Series.map
with aggregate sum
in second DataFrame:
df1['observations'] = df1['unit'].map(df2.groupby('unit')['observations'].sum())
print (df1)
unit year observations
0 1 2020 0
1 2 2021 10
2 3 2022 5