
Sort Words in Pandas Column list

Below is the DataFrame:

import numpy as np
import pandas as pd

df = pd.DataFrame({'cd1' : ['PFE1', 'PFE25', np.nan, np.nan],
                   'cd2' : [np.nan, 'PFE28', 'PFE23', 'PFE14'],
                   'cd3' : ['PFE15', 'PFE2', 'PFE83', np.nan],
                   'cd4' : ['PFE25', np.nan, 'PFE39', 'PFE47'],
                   'cd5' : [np.nan, 'PFE21', 'PFE53', 'PFE15']})
df


cd1   cd2    cd3    cd4     cd5
PFE1  NaN    PFE15  PFE25   NaN
PFE25 PFE28  PFE2   NaN     PFE21
NaN   PFE23  PFE83  PFE39   PFE53
NaN   PFE14  NaN    PFE47   PFE15

There are multiple tasks that I'm trying to do (I got some help from previous Stack questions, thanks for that!)

Combine multiple columns and remove duplicate values (there are none in this example)


df['combined'] = df.agg(lambda x: list(x.dropna()), axis=1)
df['Codes'] = list(map(set, df['combined']))

cd1   cd2   cd3   cd4   cd5     combined                       Codes
PFE1  NaN   PFE15 PFE25 NaN     [PFE1, PFE15, PFE25]           {PFE25, PFE1, PFE15}
PFE25 PFE28 PFE2  NaN   PFE21   [PFE25, PFE28, PFE2, PFE21]    {PFE28, PFE21, PFE25, PFE2}
NaN   PFE23 PFE83 PFE39 PFE53   [PFE23, PFE83, PFE39, PFE53]   {PFE83, PFE23, PFE39, PFE53}
NaN   PFE14 NaN   PFE47 PFE15   [PFE14, PFE47, PFE15]          {PFE14, PFE47, PFE15}  

The aim is to sort the words within each list.
Below is the expected output:

Output_col
PFE1,  PFE15, PFE25
PFE2,  PFE21, PFE25, PFE28
PFE23, PFE39, PFE53, PFE83
PFE14, PFE15, PFE47

I tried sorting after agg, but it does not work:

df['combined'] = df.agg(lambda x: list(x.dropna()), axis=1).sort_values()

I also tried sorting the column directly, but that does not work either:

df['combined'] = df['combined'].sort_values()

If anyone has any clues, thanks for your help!

>Solution :

I think this does what you want.

You need to add the sort inside the lambda function, so that each list itself is sorted rather than the column as a whole.
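To illustrate why sorting the column has no visible effect, here is a minimal sketch. It assumes a hypothetical two-row miniature of the combined column rather than the full DataFrame from the question. sort_values() reorders rows by comparing whole lists, and assigning the result back aligns on the index, so each row ends up with its original, unsorted list.

```python
import pandas as pd

# Hypothetical miniature of the 'combined' column: each cell holds a list.
df = pd.DataFrame({"combined": [["PFE25", "PFE1"], ["PFE2", "PFE28", "PFE21"]]})

# sort_values() changes the order of the rows (lists are compared as whole values);
# it never touches the order of the items inside each list.
reordered = df["combined"].sort_values()

# Assignment aligns on the index labels, so row 0 gets row 0's list back
# and row 1 gets row 1's list back: the column is unchanged.
df["combined"] = reordered

print(df["combined"].tolist())
```

The inner lists come out in the same order they went in, which matches the behaviour described in the question.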

I'm not sure whether there's a neater way that avoids defining a function, but list.sort() doesn't return a new list; it modifies the existing one in place, so it can't be called directly inside the lambda.

def sort_list(my_list: list) -> list:
    # list.sort() sorts in place and returns None, so sort a copy and return it
    temp_list = my_list.copy()
    temp_list.sort()
    return temp_list

df.agg(lambda x: sort_list(list(x.dropna())), axis=1)

This gives the following output:

0            [PFE1, PFE15, PFE25]
1     [PFE2, PFE21, PFE25, PFE28]
2    [PFE23, PFE39, PFE53, PFE83]
3           [PFE14, PFE15, PFE47]
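As a possible shorter alternative, the built-in sorted() returns a new list, so it can be used directly inside the lambda without a helper function. The sketch below assumes it is run on the original cd1 to cd5 columns only (before combined and Codes are added). Wrapping the values in set() is one way to fold the duplicate-removal step into the same call. The name Output_col is taken from the expected output in the question, and joining the list into a string is an optional extra step.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'cd1': ['PFE1', 'PFE25', np.nan, np.nan],
                   'cd2': [np.nan, 'PFE28', 'PFE23', 'PFE14'],
                   'cd3': ['PFE15', 'PFE2', 'PFE83', np.nan],
                   'cd4': ['PFE25', np.nan, 'PFE39', 'PFE47'],
                   'cd5': [np.nan, 'PFE21', 'PFE53', 'PFE15']})

# Per row: drop NaN, remove duplicates with set(), then sorted() builds a new sorted list.
codes = df.agg(lambda row: sorted(set(row.dropna())), axis=1)

# Optional: turn each list into one comma-separated string.
df['Output_col'] = codes.map(', '.join)

print(df['Output_col'].tolist())
```

Note that the values are strings, so the order is lexicographic: PFE2 sorts before PFE21, but PFE14 would also sort before PFE2. That matches the expected output here; sorting by the numeric suffix would need a custom key.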