How to remove repeated lines from a file?

I have a file, one.txt, containing this data:

 822.25 111.48 883.59 256.68
 822.25 111.48 883.59 256.68
 8.6 123.68 467.27 276.69
 0.0 186.77 165.62 375.0
 0.0 186.77 165.62 375.0
 724.76 177.83 923.52 316.78
 724.76 177.83 923.52 316.78
 724.76 177.83 923.52 316.78
 724.76 177.83 923.52 316.78
 724.76 177.83 923.52 316.78
 438.03 148.5 540.88 198.54
 511.99 170.97 571.74 215.81
 511.99 170.97 571.74 215.81

When a line is repeated, I want to keep only one copy of it. For instance:

724.76 177.83 923.52 316.78

appears 5 times, but I want to write it only once. I want to do the same for every other repeated line and write the result to a new file.

My code:

with open('one.txt', 'r') as infile:
    with open('output.txt', 'w') as outfile:
        for line in infile:
            # How do I do this? If a line is repeated,
            # it should be written to the output only once.
            outfile.write(line)

Solution:

You probably want itertools.groupby. Without a key function it groups consecutive identical lines, so you can ignore each group's contents and write only its key (the line itself) once per group.

import itertools

with open('one.txt', 'r') as infile:
    with open('output.txt', 'w') as outfile:
        # groupby yields (key, group) for each run of identical lines
        for line, _ in itertools.groupby(infile):
            outfile.write(line)
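
To see what groupby does without touching any files, here is a minimal sketch on an in-memory list. The helper name collapse_runs and the sample data are made up for illustration; only itertools.groupby comes from the answer.

```python
import itertools

def collapse_runs(lines):
    """Keep one line from each run of consecutive identical lines."""
    # Each key is the line shared by one run; the group itself is ignored.
    return [key for key, _group in itertools.groupby(lines)]

# Hypothetical sample: "a" repeats, is interrupted by "b", then appears again.
sample = ["a\n", "a\n", "b\n", "a\n"]
result = collapse_runs(sample)
print(result)  # ['a\n', 'b\n', 'a\n']
```

The trailing "a" survives because it is not adjacent to the first run of "a" lines.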

The groupby approach only collapses repeats that are adjacent. If a repeated line may appear in several places in the file (e.g. a a b a would be written as a b a), keep a set of the lines you have already seen instead:

seen_lines = set()
with open('one.txt', 'r') as infile:
    with open('output.txt', 'w') as outfile:
        for line in infile:
            if line in seen_lines:
                continue
            outfile.write(line)
            seen_lines.add(line)
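
One edge case with the set approach: if the last line of the file has no trailing newline, it will not compare equal to an earlier copy that does. The sketch below is one possible way to handle that, assuming the trailing newline is not significant. The generator name unique_lines is hypothetical, and the sample reuses a few rows from the question.

```python
def unique_lines(lines):
    """Yield each distinct line once, in first-seen order."""
    seen = set()
    for line in lines:
        # Assumption: a missing trailing newline should not make a line "different".
        text = line.rstrip("\n")
        if text not in seen:
            seen.add(text)
            yield text + "\n"

sample = [
    " 822.25 111.48 883.59 256.68\n",
    " 822.25 111.48 883.59 256.68\n",
    " 8.6 123.68 467.27 276.69\n",
    " 0.0 186.77 165.62 375.0\n",
    " 0.0 186.77 165.62 375.0",  # final line without a newline
]
deduped = list(unique_lines(sample))
print("".join(deduped), end="")
```

With real files, the same generator could be passed straight to outfile.writelines(unique_lines(infile)). The set holds every distinct line in memory, which is fine for files of this size.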