Regex Blacklist

Let’s say I have some code set up like this:

for line in open(all_data):
    line = line.strip()

    #BLACKLIST
    if not re.search(r"config/", line) and not re.search(r"html", line):
        line = re.split(r"\s+", line)

Here I’m excluding any line that contains config/ or html.

If I wanted to instead make a list to feed re.search, how would I go about this?

For example, I’d like to give re.search something like blacklist = ['config/', 'html'].

Solution:

You could use the built-in any() function to implement this.

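import re

forbidden_words = [r"html", r"config/"]

# all_data is the input file path from the question
with open(all_data) as f:
    for line in f:
        line = line.strip()

        # BLACKLIST: keep only lines that match none of the forbidden patterns
        if not any(re.search(term, line) for term in forbidden_words):
            line = re.split(r"\s+", line)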
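
To try the same idea without a file on disk, here is a minimal sketch. The helper name filter_lines and the sample lines are made up for illustration; only forbidden_words and the any() test come from the answer above.

```python
import re

# Same patterns as in the answer above.
forbidden_words = [r"html", r"config/"]

def filter_lines(lines, patterns):
    """Return the whitespace-split fields of every line that matches no pattern."""
    kept = []
    for line in lines:
        line = line.strip()
        # Skip the line as soon as one forbidden pattern matches.
        if any(re.search(term, line) for term in patterns):
            continue
        kept.append(re.split(r"\s+", line))
    return kept

# Made-up sample data, standing in for the contents of all_data.
sample = [
    "GET /index.html 200",
    "GET /config/settings 403",
    "GET /api/users 200",
]

result = filter_lines(sample, forbidden_words)
print(result)  # [['GET', '/api/users', '200']]
```

Passing the pattern list as an argument means the blacklist can be changed without touching the loop.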

This is easier to see if you rewrite your conditional (by De Morgan’s law) from:

not re.search(r"config/", line) and not re.search(r"html", line)

to:

not (re.search(r"config/", line) or re.search(r"html", line))
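
As a quick sanity check, the sketch below evaluates both forms of the conditional, plus the any() version, on a few made-up lines and confirms they agree. The sample strings are assumptions chosen only to cover each case.

```python
import re

forbidden_words = [r"config/", r"html"]

# Made-up lines: one per pattern, one with neither, one with both.
samples = ["a/config/b", "page.html", "plain text", "config/ and html"]

checks = []
for line in samples:
    original = not re.search(r"config/", line) and not re.search(r"html", line)
    rewritten = not (re.search(r"config/", line) or re.search(r"html", line))
    via_any = not any(re.search(term, line) for term in forbidden_words)
    # All three expressions should give the same keep/skip decision.
    checks.append(original == rewritten == via_any)

print(all(checks))  # True
```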

Framing it this way matches the intent: a line should be filtered out as soon as it contains any forbidden word. Since any() stops at the first pattern that matches, a blacklisted line is rejected without testing the remaining patterns.
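
The early exit of any() can be observed by recording which patterns actually get tested. In this sketch the wrapper search_logged and the two extra patterns (tmp/ and \.bak$) are hypothetical, added only to make the list longer.

```python
import re

# "html" and "config/" are from the answer; the other two are made up.
forbidden_words = [r"html", r"config/", r"tmp/", r"\.bak$"]
tested = []  # records each pattern that actually gets tried

def search_logged(pattern, line):
    """Wrap re.search and note which pattern was tested."""
    tested.append(pattern)
    return re.search(pattern, line)

line = "GET /index.html 200"
blocked = any(search_logged(term, line) for term in forbidden_words)

print(blocked)  # True
print(tested)   # ['html']
```

This only holds when any() is given a generator expression, as here. A list comprehension in square brackets would run every search before any() sees the results.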
