Completely deleting duplicate words in a text file

I have some words in a text file like:

joynal
abedin
rahim
mohammad
joynal
abedin 
mohammad
kudds

I want to delete the duplicated names entirely. Any name that appears more than once should be removed from the text file completely, not reduced to a single copy.

The output should be like:

rahim
kudds

I have tried some code, but it only collapses each duplicated name into a single copy (for example, one joynal and one abedin) instead of removing it.

Edited: This is the code I tried:

content = open('file.txt' , 'r').readlines()
content_set = set(content)
cleandata = open('data.txt' , 'w')

for line in content_set:
    cleandata.write(line)

>Solution :

Use a Counter:

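from collections import Counter

fn = 'file.txt'  # the input file from the question

with open(fn) as f:
    # Count each word, ignoring surrounding whitespace and blank lines
    cntr = Counter(w.strip() for w in f if w.strip())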

Then just print the words with a count of 1:

>>> print('\n'.join(w for w,cnt in cntr.items() if cnt==1))
rahim
kudds
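
To see why the `set` attempt from the question only collapses duplicates rather than dropping them, here is a small in-memory sketch using the sample words. The names `words`, `collapsed`, `counts` and `unique_only` are just for illustration and are not part of the question's code:

```python
from collections import Counter

# Sample words from the question, already stripped of whitespace
words = ['joynal', 'abedin', 'rahim', 'mohammad',
         'joynal', 'abedin', 'mohammad', 'kudds']

# set() keeps one copy of every word, so duplicates are collapsed, not removed
collapsed = set(words)

# Counter records how often each word occurs, so we can keep only
# the words seen exactly once (first-seen order, Python 3.7+)
counts = Counter(words)
unique_only = [w for w, cnt in counts.items() if cnt == 1]

print(sorted(collapsed))  # ['abedin', 'joynal', 'kudds', 'mohammad', 'rahim']
print(unique_only)        # ['rahim', 'kudds']
```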

Or do it the "old-fashioned way" with a dict as a counter:

cntr = {}
with open(fn) as f:
    for line in f:
        k = line.strip()
        # Start at 0 for a word not seen before, then add 1
        cntr[k] = cntr.get(k, 0) + 1

>>> print('\n'.join(w for w,cnt in cntr.items() if cnt==1))
# same

If you want to output to a new file:

new_file = 'data.txt'  # output file name used in the question

with open(new_file, 'w') as f_out:
    f_out.write('\n'.join(w for w, cnt in cntr.items() if cnt == 1))
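
Putting the pieces together, the whole task could be wrapped in one helper. This is only a sketch: the function name `write_unique_words` and its parameters `src` and `dst` are hypothetical, and it assumes one word per line and an input small enough to fit in memory:

```python
from collections import Counter

def write_unique_words(src, dst):
    """Write to dst only the words that appear exactly once in src."""
    with open(src) as f:
        # Strip surrounding whitespace and skip blank lines before counting
        words = [line.strip() for line in f if line.strip()]
    counts = Counter(words)
    # Keep the original order of the words that occur exactly once
    unique = [w for w in words if counts[w] == 1]
    with open(dst, 'w') as f_out:
        # One word per line, each terminated by a newline
        f_out.writelines(w + '\n' for w in unique)
    return unique
```

With the file names from the question, it would be called as `write_unique_words('file.txt', 'data.txt')`. Writing to a separate output file leaves the original untouched if something goes wrong.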