Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pivot a column so repeated values/records are placed in 1 cell

I have the following

Input:
samples = [('001', 'RENAL', 'CHROMOPHOBE', 'KICH'),
         ('002', 'OVARIAN', 'HIGH_GRADE_SEROUS_CARCINOMA', 'LGSOC'),
         ('003', 'OVARIAN', 'OTHER', 'NaN'),
         ('001', 'COLORECTAL', 'ADENOCARCINOMA', 'KICH')]
labels = ['id', 'disease_type', 'disease_sub_type', 'study_abbreviation']
df = pd.DataFrame.from_records(samples, columns=labels)
df

    id  disease_type  disease_sub_type              study_abbreviation
0   001 RENAL         CHROMOPHOBE                   KICH
1   002 OVARIAN       HIGH_GRADE_SEROUS_CARCINOMA   LGSOC
2   003 OVARIAN       OTHER                         NaN
3   001 COLORECTAL    ADENOCARCINOMA                KICH

I want to be able to compress the repeated id, say 001 in this case so that I can have the disease_type and disease_sub_type, study_abbreviation merged into 1 cell each (nested).

Output

    id  disease_type         disease_sub_type               study_abbreviation
0   001 RENAL,COLORECTAL     CHROMOPHOBE,ADENOCARCINOMA     KICH, KICH
1   002 OVARIAN              HIGH_GRADE_SEROUS_CARCINOMA    LGSOC
2   003 OVARIAN              OTHER                          NaN

This is not for anything but admin work hence the stupid ask but would help greatly when I need to merge on other datasets, thanks again.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

You could group by your ‘id’ column and use list as an aggregation:

df.groupby('id',as_index=False).agg(','.join)
    id      disease_type             disease_sub_type study_abbreviation
0  001  RENAL,COLORECTAL   CHROMOPHOBE,ADENOCARCINOMA          KICH,KICH
1  002           OVARIAN  HIGH_GRADE_SEROUS_CARCINOMA              LGSOC
2  003           OVARIAN                        OTHER                NaN
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading