I am new to Python. I am trying to delete duplicates from my text file by doing the following:
line_seen = set()
f = open('a.txt', 'r')
w = open('out.txt', 'w')
for i in f:
    if i not in line_seen:
        w.write(i)
        line_seen.add(i)
f.close()
w.close()
In the initial file I had:
hello
world
python
world
hello
And in the output file I got:
hello
world
python
hello
So it did not remove the last duplicate. Can anyone help me understand why this happened and how I could fix it?
>Solution :
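The main problem is the newline character ("\n"), which appears at the end of every line except the last one. The first "hello" is therefore read as "hello\n" while the final one is read as "hello", so the two strings compare as different and both get written. You can use a combination of the set, map and join functions, as follows: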
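# "with" closes both files automatically once the block ends
with open('a.txt', 'r') as f, open('out.txt', 'w') as w:
    # strip each line, deduplicate with a set, then rejoin with newlines
    w.write("\n".join(set(map(str.strip, f))))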
out.txt then contains (the order may vary, because a set is unordered):
python
world
hello
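If the original line order matters, one possible variant is sketched below. It is not part of the approach above: it swaps the set for dict.fromkeys, relying on the fact that dictionaries keep insertion order in Python 3.7 and later. The helper name unique_lines is made up for illustration.

```python
def unique_lines(lines):
    """Return lines without duplicates, keeping first-seen order.

    unique_lines is a hypothetical helper name used only for this sketch.
    """
    # Remove the trailing newline so "hello\n" and "hello" compare equal.
    stripped = (line.rstrip("\n") for line in lines)
    # dict keys are unique and keep insertion order (Python 3.7+).
    return list(dict.fromkeys(stripped))


# Same content as the a.txt from the question; the last line has no "\n".
sample = ["hello\n", "world\n", "python\n", "world\n", "hello"]
print(unique_lines(sample))  # ['hello', 'world', 'python']
```

The result can be written out with "\n".join(...) in the same way as before.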
If you want to stick to your previous approach you can use:
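line_seen = set()
f = open('a.txt', 'r')
w = open('out.txt', 'w')
for i in f:
    i = i.strip()  # drop the trailing "\n" so every line compares the same way
    if i not in line_seen:
        w.write(i + "\n")  # strip() removed the newline, so add it back
        line_seen.add(i)
f.close()
w.close()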
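One caveat with str.strip is that it removes all leading and trailing whitespace, so lines that differ only in indentation or trailing spaces would be treated as duplicates. If that is not wanted, a minimal sketch that removes only the line ending could look like this. The function name dedupe_file is hypothetical, and the sketch assumes the whole set of distinct lines fits in memory.

```python
def dedupe_file(src_path, dst_path):
    """Copy src_path to dst_path, dropping repeated lines (first one wins).

    dedupe_file is a hypothetical helper, named here only for illustration.
    """
    seen = set()
    # "with" closes both files even if an error occurs part-way through.
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        for line in src:
            # Remove only the line ending; indentation and spaces are kept.
            text = line.rstrip("\n")
            if text not in seen:
                seen.add(text)
                dst.write(text + "\n")
```

With the file names from the question, it would be called as dedupe_file('a.txt', 'out.txt').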