Let’s say we have the following "df1" dataframe with cities as column names:
NY LA Rome London Milan
date
2023-01-01 1 81 26 55 95
2023-01-02 92 42 96 98 7
2023-01-03 14 4 60 88 73
In another "df2" dataframe I have cities and their countries:
City Country
0 NY US
1 LA US
2 London UK
3 Rome Italy
4 Milan Italy
In a third "df3" dataframe I have some values for each country and each date:
US UK Italy
date
2023-01-01 70 41 32
2023-01-02 98 46 45
2023-01-03 83 50 17
My output dataframe has the same strutcture as the first dataframe. This is the expected output:
NY LA Rome London Milan
date
2023-01-01 -69 11 -6 14 63
2023-01-02 -6 -56 51 52 -38
2023-01-03 -69 -79 43 38 56
For example, the 51 value for "Rome" on 2023-01-02 is the difference between the value of the same cell from df1 (96) and the value of the country where Rome is located (Italy) on 2023-01-02 (45).
Any help? Thanks.
>Solution :
Try stack df3, merge it with df2 to get the correct data for the cities, then you can sub:
df1.sub(df3.stack().reset_index(name='value')
.merge(df2, left_on='level_1', right_on='Country')
.pivot(index='date', columns='City', values='value')
)
Output:
LA London Milan NY Rome
date
2023-01-01 11 14 63 -69 -6
2023-01-02 -56 52 -38 -6 51
2023-01-03 -79 38 56 -69 43