Shortest duplicate in a list of strings

Imagine I have a list of strings that contains near-duplicates of different lengths:


liste = [
    'I am googling for the solution for an hour now',
    'I am googling for the solution for an hour now --Sent via mail--',
    'I am googling for the solution for an hour now --Sent via mail-- What are you doing?',
    'Hello I am good thanks >> How are you?',
    'Hello I am good thanks',
    'Hello I am good thanks >>',
]


Wanted Output:

liste = ['I am googling for the solution for an hour now', 'Hello I am good thanks']

As you can see, the strings are close to duplicates but aren't exact duplicates: each longer string starts with the shortest one in its group. So an approach like this doesn't work:

mylist = list(dict.fromkeys(liste))

Do you have any idea how to keep only the shortest string of each group of duplicates? The duplicates are always consecutive.

Thank you!

Solution:

You can sort the list and keep a string only if it does not start with the string kept just before it:

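# Sorting places each string directly before its longer variants,
# but the result is in sorted order, not the original order.
mylist = []
for s in sorted(liste):
    # Skip s if it only extends the last string that was kept.
    if not (mylist and s.startswith(mylist[-1])):
        mylist.append(s)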
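
Note that the loop above iterates over sorted(liste), so for the example it returns 'Hello I am good thanks' first, not in the order shown in the wanted output. If the original order matters, a single pass without sorting may be enough, since the question says the duplicates are always consecutive. The following is only a sketch under that assumption; the function name shortest_per_run is made up for illustration, and it also assumes the list contains no empty strings (every string starts with '').

```python
def shortest_per_run(strings):
    """Keep the shortest string of each run of consecutive prefix-duplicates.

    Assumes duplicates are consecutive and that, within a run, every
    string starts with the shortest one. Original order is preserved.
    """
    result = []
    for s in strings:
        if result and s.startswith(result[-1]):
            # s is the kept string plus extra text (or an exact copy): drop it.
            continue
        if result and result[-1].startswith(s):
            # s is a shorter version of the kept string: keep s instead.
            result[-1] = s
        else:
            # s begins a new run.
            result.append(s)
    return result


liste = [
    'I am googling for the solution for an hour now',
    'I am googling for the solution for an hour now --Sent via mail--',
    'I am googling for the solution for an hour now --Sent via mail-- What are you doing?',
    'Hello I am good thanks >> How are you?',
    'Hello I am good thanks',
    'Hello I am good thanks >>',
]

print(shortest_per_run(liste))
# ['I am googling for the solution for an hour now', 'Hello I am good thanks']
```

Unlike the sorted version, this only compares neighbouring entries, so it would not merge duplicates that are separated by unrelated strings.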