Tricky concatenation of header names with column values and pivot in Pandas

November 18, 2021

I have a dataset, df, where I would like to combine the column name with each column value and display the label count.

For example, for id 'aa' in 2022 Q1, there is 1 'hi'  
             for id 'aa' in 2022 Q2, there are 2 'hi' 's

Data

id  type    date    Q1  Q2
aa  hi      2022    1   2
aa  hi      2023    1   1
aa  ok      2022    1   0
bb  hi      2024    3   0

Desired

id  type    date        count
aa  hi      Q1 2022     hi01
aa  ok      Q1 2022     ok01
aa  hi      Q2 2022     hi01
aa  hi      Q2 2022     hi02
aa  hi      Q1 2023     hi01
aa  hi      Q2 2023     hi01
bb  hi      Q1 2024     hi01
bb  hi      Q1 2024     hi02
bb  hi      Q1 2024     hi03

Doing

My approach is to break this apart in steps.
I believe I have to perform a pivot, join and a cumcount:

#create a pivot

df.set_index(['id', 'type']).stack().reset_index()

#set the count

 df['count'] = df['type'] + df.groupby([*df]).cumcount().add(1).astype(str).str.zfill(2)

Any suggestion is appreciated

>Solution :

You could use the following:

(df.melt(id_vars=['id', 'type', 'date'], value_name='count') # reshape data
   .sort_values(by=['date', 'variable'])
   # duplicate the rows according to counts
   .loc[lambda d: d.index.repeat(d['count'])]
   .reset_index(drop=True)
           # merge the quarters and years
   .assign(date=lambda d: d['variable']+' '+d['date'].astype(str),
           # increment the type per group
           count=lambda d: d['type']+d.groupby(['id', 'date', 'type']).cumcount().add(1).astype(str).str.zfill(2)
          )
   # drop now unused column
   .drop(columns='variable')
)

output:

   id type     date count
0  aa   hi  Q1 2022  hi01
1  aa   ok  Q1 2022  ok01
2  aa   hi  Q2 2022  hi01
3  aa   hi  Q2 2022  hi02
4  aa   hi  Q1 2023  hi01
5  aa   hi  Q2 2023  hi01
6  bb   hi  Q1 2024  hi01
7  bb   hi  Q1 2024  hi02
8  bb   hi  Q1 2024  hi03