Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas new column based in sum a column from another pandas

I have two dataframes, being the first one:

   unit    year
0     1    2020
1     2    2021
2     3    2022

and the second:

   unit    observations
0     1               0
1     2               1
2     2               2
3     2               3
4     2               4
5     3               5

I need to add a column at the first dataframe as a sum of observations for the unit at the second dataframe, I have something like this at the end

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

   unit    year   observations
0     1    2020              0
1     2    2021             10
2     3    2022              5

I tried to df_1.iterrows and using a query based in the unit from the first df to sum, and it worked, but I’m talking about a df with about to 4 million rows, this solution will take days. Someone have a quicker solution?

>Solution :

Use Series.map with aggregate sum in second DataFrame:

df1['observations'] = df1['unit'].map(df2.groupby('unit')['observations'].sum())
print (df1)
   unit  year  observations
0     1  2020             0
1     2  2021            10
2     3  2022             5
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading