I have a dataset where I would like for some values to become column headers, as well as a crosstab format.
data
year qtr ID type growth re nondd_re se_re or
2024 2024Q1 NY aa 3.18 1.14 0 0 0
2024 2024Q2 NY aa 2.1 1.14 0 0 0
2024 2024Q1 NY dd 6.26 3.07 3.07 0 0
2024 2024Q2 NY dd 4.13 3.07 3.07 0 0
2024 2024Q1 CA aa 0 0 0 0 0
2024 2024Q2 CA aa 0.03 0 0 0 0
2024 2024Q1 CA dd 0 0 0 0 0
2024 2024Q2 CA dd 0.06 0 0 0 0
desired
ID type type 2024Q1 2024Q2
NY growth dd 6.26 4.13
NY nond_ re dd 3.07 3.07
NY se_re dd 0 0
NY or dd 0 0
NY re dd 3.07 3.07
NY growth aa 3.18 2.1
NY nond_ re aa 0 0
NY se_re aa 0 0
NY or aa 0 0
NY re aa 1.14 1.14
CA growth dd 0 0.6
CA nond_ re dd 0 0
CA se_re dd 0 0
CA or dd 0 0
CA re dd 0 0
CA growth aa 0 0.3
CA nond_ re aa 0 0
CA se_re aa 0 0
CA or aa 0 0
CA re aa 0 0
doing
Melt the dataframe to transform metrics columns into rows
melted_df = df.melt(id_vars=["year", "qtr", "ID", "type"], var_name="type", value_name="value")
Pivot the melted dataframe
pivot_df = melted_df.pivot_table(index=["ID","type"], columns="qtr", values="value", fill_value=0)
Reset index to turn multi-index into columns
pivot_df = pivot_df.reset_index()
The issue is that all of the values aren’t getting. The above code produces output with missing values
Any suggestion is appreciated
>Solution :
You can try to set_index() + stack()/unstack():
out = (
df.set_index(["year", "qtr", "ID", "type"])
.stack()
.unstack("qtr")
.reset_index()
.rename(columns={"level_3": "type2"})
.rename_axis(columns=None, index=None)
)
print(out)
Prints:
year ID type type2 2024Q1 2024Q2
0 2024 CA aa growth 0.00 0.03
1 2024 CA aa re 0.00 0.00
2 2024 CA aa nondd_re 0.00 0.00
3 2024 CA aa se_re 0.00 0.00
4 2024 CA aa or 0.00 0.00
5 2024 CA dd growth 0.00 0.06
6 2024 CA dd re 0.00 0.00
7 2024 CA dd nondd_re 0.00 0.00
8 2024 CA dd se_re 0.00 0.00
9 2024 CA dd or 0.00 0.00
10 2024 NY aa growth 3.18 2.10
11 2024 NY aa re 1.14 1.14
12 2024 NY aa nondd_re 0.00 0.00
13 2024 NY aa se_re 0.00 0.00
14 2024 NY aa or 0.00 0.00
15 2024 NY dd growth 6.26 4.13
16 2024 NY dd re 3.07 3.07
17 2024 NY dd nondd_re 3.07 3.07
18 2024 NY dd se_re 0.00 0.00
19 2024 NY dd or 0.00 0.00