Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Replace values based on column names and other dataframe

Let’s say we have the following "cities" dataframe with cities as column names:

            NY  LA  Rome  London  Milan
date                                   
2023-01-01   1  81    26      55     95
2023-01-02  92  42    96      98      7
2023-01-03  14   4    60      88     73

In another "countries" dataframe I have cities and their countries:

     City Country
0      NY      US
1      LA      US
2  London      UK
3    Rome   Italy
4   Milan   Italy

I want to change values in the "cities" dataframe, replacing the value of each city with the median value of all cities in the same country on the date. Here’s the expected output. For example on 2023-01-01 the value for NY (41) is the median of 1 and 81.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

            NY  LA  Rome  London  Milan
date                                   
2023-01-01  41  41  60.5      55   60.5
2023-01-02  67  67  51.5      98   51.5
2023-01-03   9   9  66.5      88   66.5

I think I need to use groupby but couldn’t make it work. Any help? Thanks.

>Solution :

Lets reconstruct your data

import pandas as pd
cities = pd.DataFrame({'NY': {'2023-01-01': 1, '2023-01-02': 92, '2023-01-03': 14},
 'LA': {'2023-01-01': 81, '2023-01-02': 42, '2023-01-03': 4},
 'Rome': {'2023-01-01': 26, '2023-01-02': 96, '2023-01-03': 60},
 'London': {'2023-01-01': 55, '2023-01-02': 98, '2023-01-03': 88},
 'Milan': {'2023-01-01': 95, '2023-01-02': 7, '2023-01-03': 73}})

cities = cities.rename_axis('date')

countries = pd.DataFrame({'City': ['NY', 'LA', 'London', 'Rome', 'Milan'],
 'Country': ['US', 'US', 'UK', 'Italy', 'Italy']})

Melt the cities df and merge w/countries. Then take the mean and pivot.

df = cities.reset_index().melt(id_vars='date', var_name='City').merge(countries, on='City')
df['mean'] = df.groupby(['date','Country'])['value'].transform('mean')

df = df.pivot_table(index='date', columns='City', values='mean')

print(df)

Output

City          LA  London  Milan    NY  Rome
date                                       
2023-01-01  41.0    55.0   60.5  41.0  60.5
2023-01-02  67.0    98.0   51.5  67.0  51.5
2023-01-03   9.0    88.0   66.5   9.0  66.5
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading