How can I remove duplicate parts of a string within a column and sort the values?

I have a DataFrame like this (a single column):

column
[A,C,B,A]
[HELLO,HELLO,ha]
[test/1, test/1, test2]

The dtype of the column above is:
dtype('O')
(object dtype, so each cell can hold an arbitrary Python object such as a list or a string.)

I would like to remove the duplicates here, resulting in:

column
[A,C,B]           # removed the second A
[HELLO, ha]       # removed one HELLO
[test/1, test2]   # removed one test/1

Then I would like to sort the values within each cell:

column
[A,B,C]
[ha, HELLO]
[test2, test/1]   # assuming digits sort before '/'

I am struggling to do this properly. Does anyone have a good approach? (Would it make sense to convert each cell to a small list?)
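
Because the column has object dtype, the cells could be strings such as "[A,C,B,A]" rather than real lists, and in that case converting each cell to a small list is indeed the natural first step. A minimal sketch, assuming a hypothetical helper named parse_cell and a simple format: items separated by commas inside square brackets, with no commas or quotes inside the items themselves.

```python
def parse_cell(cell):
    """Turn a string like "[test/1, test/1, test2]" into a list of items.

    Hypothetical helper: assumes comma-separated items wrapped in square
    brackets, with no commas or quotes inside the items. Cells that are
    already lists are returned unchanged.
    """
    if isinstance(cell, list):
        return cell
    inner = cell.strip().strip("[]")  # drop surrounding whitespace and brackets
    if not inner:
        return []
    return [item.strip() for item in inner.split(",")]  # trim spaces after commas


cells = ["[A,C,B,A]", "[HELLO,HELLO,ha]", "[test/1, test/1, test2]"]
print([parse_cell(c) for c in cells])
# [['A', 'C', 'B', 'A'], ['HELLO', 'HELLO', 'ha'], ['test/1', 'test/1', 'test2']]
```

Applied to the DataFrame this would be df['column'] = [parse_cell(c) for c in df['column']]. If the strings are quoted Python list literals such as "['A', 'C']", ast.literal_eval from the standard library is a safer parser than splitting on commas.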

Solution:

Assuming that you have lists in the column, use a list comprehension.

If you want to keep the items in the order they first appear:

df['column_keep_order'] = [list(dict.fromkeys(x)) for x in df['column']]
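
The dict.fromkeys trick works because a dict keeps one entry per key and, from Python 3.7 on, preserves insertion order, so the first occurrence of each item wins. An equivalent explicit loop makes the mechanism visible; the function name dedupe_keep_order is hypothetical, and like dict.fromkeys it requires hashable items.

```python
def dedupe_keep_order(items):
    """Return items without repeats, keeping each first occurrence in place.

    Same result as list(dict.fromkeys(items)); items must be hashable.
    """
    seen = set()
    result = []
    for item in items:
        if item not in seen:  # first time this value appears
            seen.add(item)
            result.append(item)
    return result


row = ["A", "C", "B", "A"]
assert dedupe_keep_order(row) == list(dict.fromkeys(row))
print(dedupe_keep_order(row))  # ['A', 'C', 'B']
```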

If you want to sort the items:

df['column_sorted'] = [sorted(set(x)) for x in df['column']]

Output:

                    column column_keep_order    column_sorted
0             [A, C, B, A]         [A, C, B]        [A, B, C]
1       [HELLO, HELLO, ha]       [HELLO, ha]      [HELLO, ha]
2  [test/1, test/1, test2]   [test/1, test2]  [test/1, test2]
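
This sorted output differs from the order requested in the question. sorted() compares strings by Unicode code point, so uppercase letters come before lowercase ('H' < 'h') and '/' (U+002F) comes before the digit '2' (U+0032). To get [ha, HELLO] and [test2, test/1], pass a key function. The sketch below uses a hypothetical key, question_order_key, which assumes the desired rule is "case-insensitive, with letters and digits ranked before any other character".

```python
def question_order_key(s):
    """Hypothetical sort key: case-insensitive, alphanumerics before punctuation.

    Each character becomes (rank, lowercased char), with rank 0 for letters
    and digits and 1 for anything else (e.g. '/'). Tuples compare element by
    element, so a shorter common prefix sorts first. The original string is
    appended as a tie-breaker for values that differ only in case.
    """
    return (tuple((0 if c.isalnum() else 1, c.lower()) for c in s), s)


rows = [["A", "C", "B", "A"], ["HELLO", "HELLO", "ha"], ["test/1", "test/1", "test2"]]
print([sorted(set(r), key=question_order_key) for r in rows])
# [['A', 'B', 'C'], ['ha', 'HELLO'], ['test2', 'test/1']]
```

In the DataFrame this becomes df['column_sorted'] = [sorted(set(x), key=question_order_key) for x in df['column']]. The tie-breaker keeps the result independent of set iteration order when two values such as 'Ha' and 'ha' compare equal case-insensitively.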

Reproducible input:

import pandas as pd

df = pd.DataFrame({'column': [['A','C','B','A'],
                              ['HELLO','HELLO','ha'],
                              ['test/1', 'test/1', 'test2']]})