The complete code I have is:
import re
filepath = r'filepath\Q_P.txt'
regex = r"^([a-zA-Z]:)\\(?:.*\\)?(\d{2}-\d{2}-\d{4}[a-zA-Z]?)"
with open('Q_P.txt', 'r') as f:
    text = f.read()
match = re.search(regex, text)
if match:
    print(f"{match.group(1)} {match.group(2)}")
It seems to run fine, but it returns no matches even though I know the text file does in fact contain multiple strings that should match. Some examples of the strings in the text file are as follows:
Q:\Region10LOMAs\FY 98\98-10-2537A.pdf
Q:\Region10LOMAs\FY 98\98-10-3222A.pdf
P:\DBI_rescans\11-05-4377A.pdf
P:\DBI_rescans\11-05-4378A.pdf
The output I am looking for would be along the lines of:
Q:98-10-2537A
Q:98-10-3222A
P:11-05-4377A
P:11-05-4378A
I'm just wondering what I'm missing in order to actually get matches, rather than the code running without errors and outputting nothing.
>Solution :
You put a ^ at the front of your regex, and by default ^ only matches at the start of the whole string, so at best you could match the very first line in your file. To allow it to match at the start of any line in a multiline string, add the re.M/re.MULTILINE flag:
match = re.search(regex, text, re.M)
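As a rough illustration of the difference the flag makes, here is a minimal sketch. The `sample` string is a hypothetical stand-in for the file contents (the non-matching first line is an assumption, not something taken from the actual file):

```python
import re

regex = r"^([a-zA-Z]:)\\(?:.*\\)?(\d{2}-\d{2}-\d{4}[a-zA-Z]?)"

# Hypothetical file contents: a first line that does not match,
# followed by two of the paths from the question.
sample = (
    "some header line\n"
    "Q:\\Region10LOMAs\\FY 98\\98-10-2537A.pdf\n"
    "P:\\DBI_rescans\\11-05-4377A.pdf\n"
)

# Without the flag, ^ can only match at the very start of the string.
without_flag = re.search(regex, sample)

# With re.M, ^ can also match right after each newline.
with_flag = re.search(regex, sample, re.M)

print(without_flag)
print(with_flag.group(1), with_flag.group(2))
```

Note that re.search still stops at the first hit, so even with the flag this only reports one path.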
Or just loop over your file by line and apply the unmodified regex:
compiled_re = re.compile(regex)  # Precompiling removes cache lookup costs of
                                 # module level functions
with open('Q_P.txt', 'r') as f:
    for line in f:
        match = compiled_re.match(line)
        if match:
            print(f"{match.group(1)} {match.group(2)}")
            # Optionally break here if you really only want data on one hit
Looping by line (assuming you want to find all matches) is likely a little slower than running finditer on the whole file’s data at once when the file fits in memory, but it means you can run against files of essentially arbitrary size.
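For the whole-file variant, a minimal sketch using re.finditer with re.MULTILINE might look like the following. The helper name `find_all` and the in-memory `sample` string are illustrative assumptions; in the real script `sample` would be replaced by the result of `f.read()`. The two groups are joined without a space so the result has the same shape as the output requested in the question:

```python
import re

# Same pattern as in the question, compiled with MULTILINE so that
# ^ matches at the start of every line, not just the start of the text.
pattern = re.compile(
    r"^([a-zA-Z]:)\\(?:.*\\)?(\d{2}-\d{2}-\d{4}[a-zA-Z]?)",
    re.MULTILINE,
)

def find_all(text):
    """Return one 'drive:id' string per matching line (hypothetical helper)."""
    return [f"{m.group(1)}{m.group(2)}" for m in pattern.finditer(text)]

# Example lines from the question, standing in for the file contents.
sample = "\n".join([
    r"Q:\Region10LOMAs\FY 98\98-10-2537A.pdf",
    r"Q:\Region10LOMAs\FY 98\98-10-3222A.pdf",
    r"P:\DBI_rescans\11-05-4377A.pdf",
    r"P:\DBI_rescans\11-05-4378A.pdf",
])

for item in find_all(sample):
    print(item)
```

This reads everything into memory first, so it fits the case where the file is small; the line-by-line loop remains the safer choice for very large files.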