Imagine I have a list of strings containing near-duplicates of different lengths:
liste = ['I am googling for the solution for an hour now','I am googling for the solution for an hour now --Sent via mail--', 'I am googling for the solution for an hour now --Sent via mail-- What are you doing?', 'Hello I am good thanks >> How are you?','Hello I am good thanks', 'Hello I am good thanks >>']
Wanted Output:
liste = ['I am googling for the solution for an hour now', 'Hello I am good thanks']
As you can see, the strings are close to duplicates but aren’t exact duplicates, so an approach like this doesn’t work:
mylist = list(dict.fromkeys(liste))
Do you have any idea how to keep only the shortest string of each group of duplicates? The duplicates are always consecutive.
Thank you!
>Solution :
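You can sort the list so that each shortest string comes directly before its longer variants, then keep a string only if it does not start with the last one kept. Note that the result comes out in sorted order, not the original order: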
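mylist = []
# After sorting, a string that is a prefix of others comes right before them.
for s in sorted(liste):
    # Skip s if it merely extends the last string we kept.
    if not (mylist and s.startswith(mylist[-1])):
        mylist.append(s)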
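If the original order matters, the fact that the duplicates are always consecutive can be used instead of sorting. The following is only a minimal sketch, assuming that strings in the same run are related by prefix (one starts with the other), as in the example; the function name `keep_shortest_consecutive` is made up for illustration.

```python
def keep_shortest_consecutive(strings):
    """Collapse runs of consecutive near-duplicates, keeping the shortest of each run.

    Assumption: two neighbouring strings belong to the same run when one of
    them is a prefix of the other.
    """
    result = []
    for s in strings:
        if result and (s.startswith(result[-1]) or result[-1].startswith(s)):
            # Same run as the last kept string: keep whichever is shorter.
            if len(s) < len(result[-1]):
                result[-1] = s
        else:
            # No prefix relation: s starts a new run.
            result.append(s)
    return result


liste = [
    'I am googling for the solution for an hour now',
    'I am googling for the solution for an hour now --Sent via mail--',
    'I am googling for the solution for an hour now --Sent via mail-- What are you doing?',
    'Hello I am good thanks >> How are you?',
    'Hello I am good thanks',
    'Hello I am good thanks >>',
]

mylist = keep_shortest_consecutive(liste)
print(mylist)
```

This makes a single pass over the list and keeps the first-seen order of the groups, so for the example it should give the 'I am googling ...' string first and the 'Hello ...' string second. Unlike the sorted version, it does not merge strings that share a prefix but are not next to each other.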