How to average across two dataframes

I have two dataframes:

{'id': {4: 1548638, 6: 1953603, 7: 1956216, 8: 1962245, 9: 1981386, 10: 1981773, 11: 2004787, 13: 2017418, 14: 2020989, 15: 2045043}, 'total': {4: 17, 6: 38, 7: 59, 8: 40, 9: 40, 10: 40, 11: 80, 13: 44, 14: 51, 15: 46}}
{'id': {4: 1548638, 6: 1953603, 7: 1956216, 8: 1962245, 9: 1981386, 10: 1981773, 11: 2004787, 13: 2017418, 14: 2020989, 15: 2045043}, 'total': {4: 17, 6: 38, 7: 59, 8: 40, 9: 40, 10: 40, 11: 80, 13: 44, 14: 51, 15: 46}}

For every ‘id’ that exists in both dataframes I would like to compute the average of their values in ‘total’ and have that in a new dataframe.

I tried:

 pd.merge(df1, df2, on="id")

with the hope that I could then do:

merged_df[['total']].mean(axis=1)

but it doesn’t work at all.

How can you do this?

>Solution :

You could use:

df1.merge(df2, on='id').set_index('id').mean(axis=1).reset_index(name='total')

Or, if you have many columns, a more generic approach:

(df1.merge(df2, on='id', suffixes=(None, '_other')).set_index('id')
    .rename(columns=lambda x: x.removesuffix('_other')) # requires python 3.9+
    .groupby(axis=1, level=0)
    .mean().reset_index()
)

Output:

        id  total
0  1548638   17.0
1  1953603   38.0
2  1956216   59.0
3  1962245   40.0
4  1981386   40.0
5  1981773   40.0
6  2004787   80.0
7  2017418   44.0
8  2020989   51.0
9  2045043   46.0

Leave a Reply