I’ll preface this by saying I’d consider myself inexperienced in both python and coding in general.
I have a dataframe that I want groupby the site and then move the data into a new dataframe that uses the year as the column headings, I can’t think what this would be called, its not really a transpose as there can be years missing (unless you can somehow transpose and match column names to df[‘year’])
df =
| site | year | data |
|---|---|---|
| a | 2010 | 10 |
| a | 2011 | 12 |
| a | 2012 | 7 |
| a | 2013 | 45 |
| b | 2010 | 21 |
| b | 2012 | 45 |
| b | 2013 | 54 |
| c | 2010 | 17 |
| c | 2011 | 32 |
| c | 2012 | 24 |
What I want to end up with is something like this
| site | 2010 | 2011 | 2012 | 2013 |
|---|---|---|---|---|
| a | 10 | 12 | 7 | 45 |
| b | 21 | 45 | 54 | |
| c | 17 | 32 | 24 |
This gets me some of the way there, but will not work where there is missing data – I’m not sure how to proceeed and incorporate the taking the years and creating the new columns.
df.groupby(‘site’)[‘data’].apply(lambda df: df.reset_index(drop=True)).unstack()
I feel it shouldn’t be difficult, and that I just can’t come up with the right keywords to search for the answer.
>Solution :
import pandas as pd
df = pd.DataFrame({
"site": ["a", "a", "a", "a", "b", "b", "b", "c", "c", "c"],
"year": [2010, 2011, 2012, 2013, 2010, 2012, 2013, 2010, 2011, 2012],
"data": [10, 12, 7, 45, 21, 45, 54, 17, 32, 24]})
df = df.set_index(["site", "year"])
new_df = df.pivot_table(values="data", index="site", columns="year",
fill_value=0)
new_df = new_df.sort_index(axis=1)
print(new_df)
This should help you.