Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Conditional wide to long format in Pandas

I have the following example dataset in wide format without any name for the date and value column

RIC Company 31/12/2021  31/12/2020  31/12/2019
ABC ABC ltd     0.77      0.75       0.66
XYZ XYZ ltd 7846.43       na         6607.29
CDF CDF ltd 191.14        127.74     63.92

I need to convert it to long format as the following

RIC Company     Date    Value
ABC ABC ltd 31/12/2021  0.77
ABC ABC ltd 31/12/2020  0.75
ABC ABC ltd 31/12/2019  0.66
XYZ XYZ ltd 31/12/2021  7846.43
XYZ XYZ ltd 31/12/2020  na
XYZ XYZ ltd 31/12/2019  6607.29
CDF CDF ltd 31/12/2021  191.14
CDF CDF ltd 31/12/2020  127.74
CDF CDF ltd 31/12/2019  63.92

I can’t seem to get around the use of pd.melt /pd.wide_to_long/pyjanitor for the date section.
Some sample code that I tried (using pyjanitor) but I get errors. Any help is appreciated. Thanks.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

(df
.pivot_longer(
    index = ['RIC','Company'], 
    names_to = ('.value', 'Date'), 
    sort_by_appearance=True)
)

>Solution :

You can just melt the dataframe providing the indices, and the variable columns along with the names for variable and values columns:

df.melt(['RIC', 'Company'],
        ['31/12/2021', '31/12/2020', '31/12/2019'],
        'Date',
        'Value')

OUTPUT:


   RIC  Company        Date    Value
0  ABC  ABC ltd  31/12/2021     0.77
1  XYZ  XYZ ltd  31/12/2021  7846.43
2  CDF  CDF ltd  31/12/2021   191.14
3  ABC  ABC ltd  31/12/2020     0.75
4  XYZ  XYZ ltd  31/12/2020       na
5  CDF  CDF ltd  31/12/2020   127.74
6  ABC  ABC ltd  31/12/2019     0.66
7  XYZ  XYZ ltd  31/12/2019  6607.29
8  CDF  CDF ltd  31/12/2019    63.92

If you don’t want to manually type all the columns, you can just exclude the columns you don’t want:

df.melt(['RIC', 'Company'],
        [col for col in df if col not in {'RIC', 'Comapny'}],
        'Date',
        'Value')
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading