Is there a more efficient method than nested loops to search through a list and then find corresponding matches in a text file?

I have a list of values, and I need to search a ~1 GB text file for each of them and pull a corresponding value from the same row. Is there a more efficient method than nested for loops, where the outer loop goes through my list of values and the inner loop iterates through the whole text file looking for matches?

    for name in name_list:
        with open("test.txt") as infile:
            for line in infile:
                currentline = line.split("|")
                if name == currentline[0]:
                    print(currentline[2])

>Solution :

This is exactly what a dictionary is for. Assuming parts[0] is unique:

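    # Build the lookup table in a single pass over the file.
    dic = {}
    with open("test.txt") as infile:
        for line in infile:
            # Strip the newline so the last field does not carry it along.
            parts = line.rstrip("\n").split("|")
            dic[parts[0]] = parts[2]

    # Each name is now one dictionary lookup instead of a scan of the file.
    for name in name_list:
        if name in dic:
            print(dic[name])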

If it isn’t unique, you’ll have to keep a list for each dictionary entry:

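    # Map each key to the list of all values that share it.
    dic = {}
    with open("test.txt") as infile:
        for line in infile:
            # Strip the newline so the last field does not carry it along.
            parts = line.rstrip("\n").split("|")
            if parts[0] in dic:
                dic[parts[0]].append(parts[2])
            else:
                dic[parts[0]] = [parts[2]]

    # Print every value recorded for each requested name.
    for name in name_list:
        if name in dic:
            for matching in dic[name]:
                print(matching)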
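The list-per-key version could also be written with `collections.defaultdict`, which removes the explicit `if`/`else`. This is only a sketch: the helper name `build_index`, its parameters, and the sample rows are made up for illustration, and the length check for malformed rows is an extra assumption about the data.

```python
from collections import defaultdict


def build_index(lines, key_col=0, value_col=2, sep="|"):
    """Map each key to the list of values found on rows with that key."""
    index = defaultdict(list)
    for line in lines:
        parts = line.rstrip("\n").split(sep)
        # Skip rows that are too short to contain both columns.
        if len(parts) > max(key_col, value_col):
            index[parts[key_col]].append(parts[value_col])
    return index


# Hypothetical sample rows standing in for the lines of test.txt.
sample = ["alice|x|10\n", "bob|y|20\n", "alice|z|30\n"]
index = build_index(sample)

for name in ["alice", "carol"]:
    # .get() avoids creating an empty entry for names that are absent.
    for value in index.get(name, []):
        print(value)
```

With a real file, passing the open file object (`build_index(infile)`) would work the same way, since the function only iterates over lines.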

This drastically reduces the runtime. Assuming your file has n entries and your name_list has m entries, you had a complexity of O(n * m) before. Now you have O(n + m): one pass over the file to build the dictionary, then m lookups that each take constant time on average, because a hash map access does not perform a linear search.

In practice, the speedup will be even larger, since your previous code reopened and reread the whole file from disk on every iteration of the loop over name_list.
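If keeping every row of a ~1 GB file in a dictionary is a memory concern, one alternative (a different technique from the answer above) is to invert the roles: put the names in a set and stream the file once, keeping only the rows that match. The sketch below assumes the same `|`-separated layout as the question; the function name `find_matches` and the length check are illustrative assumptions.

```python
def find_matches(path, names, sep="|"):
    """Yield (name, value) for each row whose first field is in names."""
    wanted = set(names)  # set membership tests are constant time on average
    with open(path) as infile:
        for line in infile:
            parts = line.rstrip("\n").split(sep)
            # Assumes the value of interest is the third field, as in the question.
            if len(parts) > 2 and parts[0] in wanted:
                yield parts[0], parts[2]
```

It could be used as `for name, value in find_matches("test.txt", name_list): print(value)`. The work is still O(n + m), but memory use scales with the size of name_list rather than the file. The trade-off is that results come out in file order, not in the order of name_list.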
