unique melt and pivot crosstab format with a groupby using pandas

February 1, 2024

I have a dataset where I would like for some values to become column headers, as well as a crosstab format.

data

year    qtr     ID  type    growth  re     nondd_re   se_re or
2024    2024Q1  NY  aa      3.18    1.14    0         0     0
2024    2024Q2  NY  aa      2.1     1.14    0         0     0
2024    2024Q1  NY  dd      6.26    3.07    3.07      0     0
2024    2024Q2  NY  dd      4.13    3.07    3.07      0     0
2024    2024Q1  CA  aa      0       0       0         0     0
2024    2024Q2  CA  aa      0.03    0       0         0     0
2024    2024Q1  CA  dd      0       0       0         0     0
2024    2024Q2  CA  dd      0.06    0       0         0     0

desired

ID  type       type 2024Q1  2024Q2
NY  growth     dd   6.26    4.13
NY  nond_ re   dd   3.07    3.07
NY  se_re      dd   0       0
NY  or         dd   0       0
NY  re         dd   3.07    3.07
NY  growth     aa   3.18    2.1
NY  nond_ re   aa   0       0
NY  se_re      aa   0       0
NY  or         aa   0       0
NY  re         aa   1.14    1.14
CA  growth     dd   0       0.6
CA  nond_ re   dd   0       0
CA  se_re      dd   0       0
CA  or         dd   0       0
CA  re         dd   0       0
CA  growth     aa   0       0.3
CA  nond_ re   aa   0       0
CA  se_re      aa   0       0
CA  or         aa   0       0
CA  re         aa   0       0

doing

Melt the dataframe to transform metrics columns into rows

melted_df = df.melt(id_vars=["year", "qtr", "ID", "type"], var_name="type", value_name="value")

Pivot the melted dataframe

pivot_df = melted_df.pivot_table(index=["ID","type"], columns="qtr", values="value", fill_value=0)

Reset index to turn multi-index into columns

pivot_df = pivot_df.reset_index()

The issue is that all of the values aren’t getting. The above code produces output with missing values
Any suggestion is appreciated

>Solution :

You can try to set_index() + stack()/unstack():

out = (
    df.set_index(["year", "qtr", "ID", "type"])
    .stack()
    .unstack("qtr")
    .reset_index()
    .rename(columns={"level_3": "type2"})
    .rename_axis(columns=None, index=None)
)

print(out)

Prints:

    year  ID type     type2  2024Q1  2024Q2
0   2024  CA   aa    growth    0.00    0.03
1   2024  CA   aa        re    0.00    0.00
2   2024  CA   aa  nondd_re    0.00    0.00
3   2024  CA   aa     se_re    0.00    0.00
4   2024  CA   aa        or    0.00    0.00
5   2024  CA   dd    growth    0.00    0.06
6   2024  CA   dd        re    0.00    0.00
7   2024  CA   dd  nondd_re    0.00    0.00
8   2024  CA   dd     se_re    0.00    0.00
9   2024  CA   dd        or    0.00    0.00
10  2024  NY   aa    growth    3.18    2.10
11  2024  NY   aa        re    1.14    1.14
12  2024  NY   aa  nondd_re    0.00    0.00
13  2024  NY   aa     se_re    0.00    0.00
14  2024  NY   aa        or    0.00    0.00
15  2024  NY   dd    growth    6.26    4.13
16  2024  NY   dd        re    3.07    3.07
17  2024  NY   dd  nondd_re    3.07    3.07
18  2024  NY   dd     se_re    0.00    0.00
19  2024  NY   dd        or    0.00    0.00