Let’s say I have some code set up like this:
    for line in open(all_data):
        line = line.strip()
        #BLACKLIST
        if not re.search(r"config/", line) and not re.search(r"html", line):
            line = re.split(r"\s+", line)
Here I’m excluding any line that contains config/ or html.
If I wanted to instead make a list to feed re.search, how would I go about this?
For example, I’d like to give re.search a list such as blacklist = ['config/', 'html'].
>Solution :
You could use the any function to implement this.
    import re

    forbidden_words = [r"html", r"config/"]
    for line in open(all_data):
        line = line.strip()
        # BLACKLIST: skip the line if any forbidden pattern matches
        if not any(re.search(term, line) for term in forbidden_words):
            line = re.split(r"\s+", line)
This is easier to see if you translate your conditional from:
    not re.search(r"config/", line) and not re.search(r"html", line)
To:
    not (re.search(r"config/", line) or re.search(r"html", line))
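If you want to convince yourself that the two conditionals and the any() version agree, a quick check like the one below can help. It is only a sketch: the sample lines and the function names (original_form, rewritten_form, any_form) are made up for illustration and are not part of the question's code.

```python
import re

# Hypothetical sample lines, invented for this check.
samples = [
    "GET /config/app.yml 200",
    "GET /index.html 200",
    "GET /api/users 200",
    "",
]

def original_form(line):
    # The conditional as written in the question.
    return not re.search(r"config/", line) and not re.search(r"html", line)

def rewritten_form(line):
    # The same conditional after applying De Morgan's law.
    return not (re.search(r"config/", line) or re.search(r"html", line))

def any_form(line, patterns=(r"config/", r"html")):
    # The list-driven version: any() plays the role of the chained "or".
    return not any(re.search(p, line) for p in patterns)

results = [(original_form(s), rewritten_form(s), any_form(s)) for s in samples]
for sample, row in zip(samples, results):
    print(repr(sample), row)
```

All three functions return True for a line that should be kept and False for a line that should be skipped, so each row printed should contain three identical values.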
Framing it this way matches the intent: a line should be filtered out as soon as it contains any forbidden word. Because any() short-circuits, it stops at the first pattern that matches, so a blacklisted line is rejected without testing the remaining patterns. Only lines that pass the filter are checked against every pattern.
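To see how many patterns any() actually tests per line, you can record each attempt. This is a minimal sketch, assuming the same forbidden_words list as above. The helper patterns_tried and the sample lines are hypothetical and exist only for this demonstration.

```python
import re

forbidden_words = [r"html", r"config/"]

def patterns_tried(line, terms):
    """Return (blacklisted, tried): the any() result and the patterns it tested."""
    tried = []

    def matches(term):
        tried.append(term)  # record each pattern that any() actually evaluates
        return re.search(term, line) is not None

    blacklisted = any(matches(term) for term in terms)
    return blacklisted, tried

# A blacklisted line: any() stops after the first pattern that matches.
print(patterns_tried("GET /index.html 200", forbidden_words))
# -> (True, ['html'])

# A clean line: every pattern has to be tried before it can be kept.
print(patterns_tried("GET /api/users 200", forbidden_words))
# -> (False, ['html', 'config/'])
```

If most of your lines are expected to hit one particular pattern, putting that pattern first in the list lets any() return earlier for those lines.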