I have a dataframe with three median rent variables. The dataframe looks like this:
| region_id | year | 1bed_med_rent | 2bed_med_rent | 3bed_med_rent |
|---|---|---|---|---|
| 1 | 2010 | 800 | 1000 | 1200 |
| 1 | 2011 | 850 | 1050 | 1250 |
| 2 | 2010 | 900 | 1000 | 1100 |
| 2 | 2011 | 950 | 1050 | 1150 |
I would like to combine the three rent columns into a single column holding the median of the three values for each region and year, like so:
| region_id | year | med_rent |
|---|---|---|
| 1 | 2010 | 1000 |
| 1 | 2011 | 1050 |
| 2 | 2010 | 1000 |
| 2 | 2011 | 1050 |
Using the agg() function in pandas, I have been able to apply functions to multiple columns, but I have not been able to combine the columns into one and insert the result into the dataframe. I have also tried the assign() function in combination with the code below, without success.
# Create the list of common ID columns to group by
group_list = ['region_id', 'year']
# Group by the common IDs and take the median of each rent column within each group
new_df = df.groupby(group_list).agg({'1bed_med_rent': ['median'],
                                     '2bed_med_rent': ['median'],
                                     '3bed_med_rent': ['median']}).reset_index()
What other method might there be for this?
>Solution :
Setting region_id and year as the index with set_index, then using apply to take the median across the remaining columns of each row, ought to do it:
(df.set_index(['region_id', 'year'])
   .apply(lambda r: r.median(), axis=1)
   .reset_index()
   .rename(columns={0: 'med_rent'})
)
This produces:
region_id year med_rent
0 1 2010 1000.0
1 1 2011 1050.0
2 2 2010 1000.0
3 2 2011 1050.0
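As a possible alternative, the row-wise median can also be taken with DataFrame.median(axis=1) directly, which avoids the Python-level lambda. The sketch below rebuilds the sample data from the question; the names rent_cols, result and with_median are chosen here for illustration and are not part of the original post. It also shows one way to keep the original columns by using assign(), on the assumption that this is what "insert into the dataframe" was meant to achieve.

```python
import pandas as pd

# Sample data rebuilt from the table in the question
df = pd.DataFrame({
    'region_id': [1, 1, 2, 2],
    'year': [2010, 2011, 2010, 2011],
    '1bed_med_rent': [800, 850, 900, 950],
    '2bed_med_rent': [1000, 1050, 1000, 1050],
    '3bed_med_rent': [1200, 1250, 1100, 1150],
})

# Illustrative name: the columns to combine
rent_cols = ['1bed_med_rent', '2bed_med_rent', '3bed_med_rent']

# Row-wise median across the rent columns, keyed by region and year
result = (df.set_index(['region_id', 'year'])[rent_cols]
            .median(axis=1)
            .reset_index(name='med_rent'))

# Variant that keeps the original rent columns and adds med_rent alongside them
with_median = df.assign(med_rent=df[rent_cols].median(axis=1))

print(result)
```

Selecting rent_cols explicitly means any other numeric columns that may exist in the real dataframe are not pulled into the median.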