How to remove repeated lines from a file?

November 9, 2021

I have a file, one.txt, with the following data:

 822.25 111.48 883.59 256.68
 822.25 111.48 883.59 256.68
 8.6 123.68 467.27 276.69
 0.0 186.77 165.62 375.0
 0.0 186.77 165.62 375.0
 724.76 177.83 923.52 316.78
 724.76 177.83 923.52 316.78
 724.76 177.83 923.52 316.78
 724.76 177.83 923.52 316.78
 724.76 177.83 923.52 316.78
 438.03 148.5 540.88 198.54
 511.99 170.97 571.74 215.81
 511.99 170.97 571.74 215.81

For lines that are repeated, I want to write only one copy. For instance:

724.76 177.83 923.52 316.78

is repeated five times. I want to write it only once, do the same for every other repeated line, and write the result to a new file.

My code:

with open('one.txt', 'r') as infile:
    with open('output.txt', 'w') as outfile:
        for line in infile:
            # How do I do this? If a line is repeated, I want to
            # replace all of its copies with a single line.
            outfile.write(line)

Solution:

You probably want itertools.groupby. Without a key function, it yields one (key, group) pair for each run of consecutive identical lines, so you can ignore the group iterator entirely and write just the key, which is one line per run.

import itertools

with open('one.txt', 'r') as infile:
    with open('output.txt', 'w') as outfile:
        # Each key is a single line standing for a run of identical consecutive lines.
        for line, _ in itertools.groupby(infile):
            outfile.write(line)
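Running groupby on a small in-memory list (standing in for the file, purely for illustration) shows why it merges only adjacent repeats:

```python
from itertools import groupby

lines = ["a\n", "a\n", "b\n", "a\n"]

# groupby with no key function starts a new group whenever the value changes,
# so only consecutive duplicates are merged into one key.
collapsed = [key for key, _ in groupby(lines)]
print(collapsed)  # ['a\n', 'b\n', 'a\n']
```

The final "a\n" survives because it is separated from the first run by "b\n".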

This only collapses repeats that are next to each other. If the same line can appear in several places in the file (e.g. a a b a would be written as a b a), keep a set of the lines you have already seen and skip any line that is already in it:

seen_lines = set()
with open('one.txt', 'r') as infile:
    with open('output.txt', 'w') as outfile:
        for line in infile:
            # Skip any line already written, wherever its earlier copy appeared.
            if line in seen_lines:
                continue
            outfile.write(line)
            seen_lines.add(line)
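If the whole file fits in memory, a shorter variant of the same seen-lines idea uses dict.fromkeys, which keeps the first occurrence of each key in insertion order (dicts guarantee this since Python 3.7). This is a swapped-in technique, not the one above. The sketch also strips the trailing newline before comparing, on the assumption that the last line of one.txt might not end in "\n"; without that, the final copy of 511.99 170.97 571.74 215.81 would compare unequal to the copy before it. The function name dedupe_file is hypothetical.

```python
def dedupe_file(src, dst):
    """Hypothetical helper: write each distinct line of src to dst once, in first-seen order."""
    with open(src, 'r') as infile:
        # Strip only the newline so a final line without '\n' still matches
        # its earlier copies; other whitespace is kept as-is.
        lines = [line.rstrip('\n') for line in infile]
    # dict.fromkeys drops repeated keys and keeps insertion order (Python 3.7+).
    unique = dict.fromkeys(lines)
    with open(dst, 'w') as outfile:
        for line in unique:
            outfile.write(line + '\n')
```

Calling dedupe_file('one.txt', 'output.txt') would write each distinct line once, always ending with a newline.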