Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Tricky concatenation of header names with column values and pivot in Pandas

I have a dataset, df, where I would like to combine the column name with each column value and display the label count.

For example, for id 'aa' in 2022 Q1, there is 1 'hi'  
             for id 'aa' in 2022 Q2, there are 2 'hi' 's

Data

id  type    date    Q1  Q2
aa  hi      2022    1   2
aa  hi      2023    1   1
aa  ok      2022    1   0
bb  hi      2024    3   0

Desired

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

id  type    date        count
aa  hi      Q1 2022     hi01
aa  ok      Q1 2022     ok01
aa  hi      Q2 2022     hi01
aa  hi      Q2 2022     hi02
aa  hi      Q1 2023     hi01
aa  hi      Q2 2023     hi01
bb  hi      Q1 2024     hi01
bb  hi      Q1 2024     hi02
bb  hi      Q1 2024     hi03

            

Doing

My approach is to break this apart in steps.
I believe I have to perform a pivot, join and a cumcount:

#create a pivot

df.set_index(['id', 'type']).stack().reset_index()

#set the count

 df['count'] = df['type'] + df.groupby([*df]).cumcount().add(1).astype(str).str.zfill(2)

Any suggestion is appreciated

>Solution :

You could use the following:

(df.melt(id_vars=['id', 'type', 'date'], value_name='count') # reshape data
   .sort_values(by=['date', 'variable'])
   # duplicate the rows according to counts
   .loc[lambda d: d.index.repeat(d['count'])]
   .reset_index(drop=True)
           # merge the quarters and years
   .assign(date=lambda d: d['variable']+' '+d['date'].astype(str),
           # increment the type per group
           count=lambda d: d['type']+d.groupby(['id', 'date', 'type']).cumcount().add(1).astype(str).str.zfill(2)
          )
   # drop now unused column
   .drop(columns='variable')
)

output:

   id type     date count
0  aa   hi  Q1 2022  hi01
1  aa   ok  Q1 2022  ok01
2  aa   hi  Q2 2022  hi01
3  aa   hi  Q2 2022  hi02
4  aa   hi  Q1 2023  hi01
5  aa   hi  Q2 2023  hi01
6  bb   hi  Q1 2024  hi01
7  bb   hi  Q1 2024  hi02
8  bb   hi  Q1 2024  hi03
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading