Regex seems to be working but is not returning any matches

June 15, 2022

The complete code I have is:

import re
filepath = r'filepath\Q_P.txt'
regex = r"^([a-zA-Z]:)\\(?:.*\\)?(\d{2}-\d{2}-\d{4}[a-zA-Z]?)"
with open('Q_P.txt', 'r') as f:
    text = f.read()
    match = re.search(regex, text)
    if match:
        print(f"{match.group(1)} {match.group(2)}")

It seems to run fine, but it returns no matches even though I know the text file does in fact contain multiple strings that should match. Some examples of the strings in the text file are as follows:

Q:\Region10LOMAs\FY 98\98-10-2537A.pdf

Q:\Region10LOMAs\FY 98\98-10-3222A.pdf

P:\DBI_rescans\11-05-4377A.pdf

P:\DBI_rescans\11-05-4378A.pdf

The output I am looking for would be along the lines of:

Q:98-10-2537A

Q:98-10-3222A

P:11-05-4377A

P:11-05-4378A

Just wondering what I'm missing in order to actually get matches, rather than the code running without errors and outputting nothing.

>Solution :

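You put a ^ at the front of your regex, and without flags ^ matches only at the very start of the string, so the pattern can only ever match the first line of your file. To let it match at the start of any line in a multiline string, add the re.M/re.MULTILINE flag (note that re.search still returns only the first match; use re.finditer if you want all of them):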

match = re.search(regex, text, re.M)
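
To see the effect of the flag in isolation, here is a minimal sketch. The sample text is an assumption: it stands in for a file whose first line does not match, which is one way the original code could end up printing nothing.

```python
import re

regex = r"^([a-zA-Z]:)\\(?:.*\\)?(\d{2}-\d{2}-\d{4}[a-zA-Z]?)"

# Hypothetical file contents: a non-matching first line, then two paths.
text = "\n".join([
    "file list",
    r"Q:\Region10LOMAs\FY 98\98-10-2537A.pdf",
    r"P:\DBI_rescans\11-05-4377A.pdf",
])

# Without re.M, ^ can only match at position 0, where "file list" sits.
without_flag = re.search(regex, text)

# With re.M, ^ also matches right after each newline.
with_flag = re.search(regex, text, re.M)

print(without_flag)                            # None
print(with_flag.group(1), with_flag.group(2))  # Q: 98-10-2537A
```

Even with the flag, re.search stops at the first hit, so the P: line is not reported here.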

Or just loop over your file by line and apply the unmodified regex:

compiled_re = re.compile(regex)  # Precompiling removes cache lookup costs of
                                 # module level functions
with open('Q_P.txt', 'r') as f:
    for line in f:
        match = compiled_re.match(line)
        if match:
            print(f"{match.group(1)} {match.group(2)}")
            # Optionally break here if you really only want data on one hit

Assuming you want to find all matches, this is likely a little slower than running re.finditer over the whole file's contents at once when the file fits in memory, but it means you can run against files of essentially arbitrary size.
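
For comparison, the whole-file re.finditer approach could look roughly like this. It is a sketch under a few assumptions: the sample string stands in for the result of f.read(), find_all is a hypothetical helper name, and the two groups are joined without a space to match the output format shown in the question.

```python
import re

regex = r"^([a-zA-Z]:)\\(?:.*\\)?(\d{2}-\d{2}-\d{4}[a-zA-Z]?)"


def find_all(text):
    """Return 'drive:number' for every line of text that matches regex."""
    # re.M lets ^ match at the start of each line, not just the whole string.
    return [f"{m.group(1)}{m.group(2)}" for m in re.finditer(regex, text, re.M)]


# Stand-in for the file contents, built from the paths in the question.
sample = "\n".join([
    r"Q:\Region10LOMAs\FY 98\98-10-2537A.pdf",
    r"Q:\Region10LOMAs\FY 98\98-10-3222A.pdf",
    r"P:\DBI_rescans\11-05-4377A.pdf",
    r"P:\DBI_rescans\11-05-4378A.pdf",
])

for hit in find_all(sample):
    print(hit)
```

With a real file you would pass f.read() to find_all instead of sample, at the cost of holding the whole file in memory.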