Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How do I compare every line of the 1st text file to every line of the 2nd text file in Python?

I have 2 text files named f1 & f2 with 100k lines of names each. I want to compare the first line of f1 with every line of f2, then the second line of f1 with every line of f2, and so on. I already tried using nested for loop like code below but it doesn’t work.

What am I doing wrong I can’t seem to find? Please can someone tell me?

Thanks in advance.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

old.txt

sourcreameggnest
saturnnixgreentea
saxophonedesertham
footballplumvirgo
soybeansthesting
cauliflowertornado
sourcreameggnest
saturnnixgreentea

new.txt

goldfishpebbleduck
saxophonedesertham
footballplumvirgo
abloomtheavengers
venisonflowersea
goodfellaswalker
saturnnixgreentea

Code:

 with open('old.txt', 'r') as f1, open('new.txt', 'r') as f2:
    
    for line1 in f1:
        print('Line 1:- ' + line1, end='')
        
        for line2 in f2:
            print('Line 2:- ' + line2, end='')
            
            if line1.strip() == line2:
                print("Inside comparison" + line1, end='')

Output:

Line 1:- goldfishpebbleduck
Line 2:- sourcreameggnest
Line 2:- saturnnixgreentea
Line 2:- saxophonedesertham
Line 2:- footballplumvirgo
Line 2:- soybeansthesting
Line 2:- cauliflowertornado
Line 2:- sourcreameggnest
Line 2:- saturnnixgreentea
Line 1:- saxophonedesertham
Line 1:- footballplumvirgo
Line 1:- abloomtheavengers
Line 1:- venisonflowersea
Line 1:- goodfellaswalker
Line 1:- saturnnixgreentea

>Solution :

Combining the answers of @LukasNeugebauer and @Thierry Lathuille, here’s what your code should look like:

with open('old.txt', 'r') as f1, open('new.txt', 'r') as f2:
    lines1 = f1.readlines()
    lines2 = f2.readlines()
    for line1 in lines1:
        print('Line 1:- ' + line1, end='')
        if line1 in lines2:
            print("Inside comparison" + line1, end='')

If you are wondering, whether using in check is faster then iterating through the second list and comparing each value with ==, I tested it. For both files containing 10,000 lines of random strings, it took ~2.8 seconds to process them fully with two loops and only ~0.8 using the in operator.

If your files are not bigger than a megabyte, I wouldn’t really bother optimizing this, but otherwise you should really think about what you are actually comparing and what shortcuts can you use.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading