I have a list of values, and I need to search an ~1 GB text file for those values and then pull a corresponding value from the same row. Is there a more efficient approach than nested for loops, where I iterate over my list of values and, for each one, iterate through the entire text file looking for a match?
for name in name_list:
    with open("test.txt") as infile:
        for line in infile:
            currentline = line.split("|")
            if name == currentline[0]:
                print(currentline[2])
>Solution :
This is exactly what a dictionary is for. Assuming parts[0] is unique:
dic = {}
with open("test.txt") as infile:
    for line in infile:
        parts = line.split("|")
        dic[parts[0]] = parts[2]

for name in name_list:
    if name in dic:
        print(dic[name])
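For illustration, here is that lookup run end-to-end on a few in-memory sample rows – io.StringIO stands in for the real test.txt, and the names and fields are made up. The rstrip("\n") guards against a trailing newline sneaking into the last column:

```python
import io

# Hypothetical sample data in place of the 1 GB file.
sample = "alice|30|engineer\nbob|25|teacher\ncarol|41|doctor\n"
name_list = ["bob", "dave", "alice"]  # "dave" has no row and is simply skipped

# One pass over the "file" to build the lookup table.
dic = {}
for line in io.StringIO(sample):
    parts = line.rstrip("\n").split("|")
    dic[parts[0]] = parts[2]

# Constant-time lookups instead of rescanning the file per name.
for name in name_list:
    if name in dic:
        print(dic[name])  # prints "teacher", then "engineer"
```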
If it isn’t unique, you’ll have to keep a list of matches for each dictionary entry:
dic = {}
with open("test.txt") as infile:
    for line in infile:
        parts = line.split("|")
        if parts[0] in dic:
            dic[parts[0]].append(parts[2])
        else:
            dic[parts[0]] = [parts[2]]

for name in name_list:
    if name in dic:
        for matching in dic[name]:
            print(matching)
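As a side note, the if/else grouping above is often written with collections.defaultdict, which creates the empty list automatically on first access. A sketch of the same build step, again using in-memory sample data instead of the real file:

```python
from collections import defaultdict
import io

# Hypothetical sample rows with a duplicated key.
sample = "alice|30|engineer\nalice|31|manager\nbob|25|teacher\n"

dic = defaultdict(list)  # missing keys start out as an empty list
for line in io.StringIO(sample):
    parts = line.rstrip("\n").split("|")
    dic[parts[0]].append(parts[2])  # no membership check needed
```

The behavior is equivalent to the explicit if/else; it just removes the boilerplate around creating the first list for a key.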
This drastically reduces the runtime: assuming your file has n lines and your name_list has m entries, the nested loops were O(n * m). Now you pay O(n) once to build the dictionary, and each hash-map lookup takes expected constant time, so the total is O(n + m) – you no longer perform a linear search per name.
In practice the speedup will be even larger, since your previous version also reopened and reread the entire file for every name in the outer loop.
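One more note: if the file is genuinely "|"-delimited, the standard-library csv module can do the splitting and also handles quoted fields that contain the delimiter – something a plain split("|") gets wrong. A sketch under the same column assumptions (key in column 0, value in column 2), with a made-up in-memory sample:

```python
import csv
import io

# Hypothetical sample; the second row has a quoted field containing "|".
sample = 'alice|30|engineer\n"smith|jones"|50|lawyer\n'

reader = csv.reader(io.StringIO(sample), delimiter="|")
dic = {row[0]: row[2] for row in reader}

print(dic["smith|jones"])  # the quoted name survives as a single field
```

If your data never contains quoted or embedded delimiters, plain split("|") is fine and slightly faster.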