Can someone please help me extract the first date, using regex, from file names formatted like this:
TEST_2022-03-04-05-30-20.csv_parsed.csv_encrypted.csv
But I am getting the following error on the re.search line: AttributeError: 'NoneType' object has no attribute 'group'
I think my re.search pattern is wrong. Can someone please correct it based on the above filename? I want to pull only the first date from each file name.
I am new to regex, can anyone help? Thank you!
I tried the following, and I expect it to extract the first date in each file name, formatted as 2022-03-04:
date = re.search('\b(\d{4}-\d{2}-\d{2})\.', filename)
>Solution :
There are a few problems with your regex.
First, the regex itself is incorrect:
\b # Match a word boundary (non-word character followed by word character or vice versa)
( # followed by a group which consists of
\d{4}- # 4 digits and '-', then
\d{2}- # 2 digits and '-', then
\d{2} # another 2 digits
) # and finally followed by
\. # a dot
Since your filename (TEST_2022-03-04-05-30-20.csv_parsed.csv_encrypted.csv) contains no substring that matches this pattern, re.search() fails and returns None. Here is why:
2022-03-04 is not followed by a dot.
\b does not match, as both _ and 2 are considered word characters.
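Each of the two causes can be checked on its own. The following is a small sketch that uses only the filename from the question; the variable names are made up for illustration.

```python
import re

filename = 'TEST_2022-03-04-05-30-20.csv_parsed.csv_encrypted.csv'

# The original pattern: word boundary, date group, then a literal dot.
original = re.search(r'\b(\d{4}-\d{2}-\d{2})\.', filename)
print(original)  # None

# Dropping the trailing dot is not enough: \b still cannot match
# between '_' and '2', because both are word characters.
no_dot = re.search(r'\b(\d{4}-\d{2}-\d{2})', filename)
print(no_dot)  # None

# Dropping \b is not enough either while the dot stays:
# the date is followed by '-', not by '.'.
no_boundary = re.search(r'(\d{4}-\d{2}-\d{2})\.', filename)
print(no_boundary)  # None
```

Calling .group() on any of these results raises the AttributeError from the question, because each one is None.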
That being said, the regex should be modified, like this:
(?<=_) # Match something preceded by '_', which will not be included in our match,
\d{4}- # 4 digits and '-', then
\d{2}- # 2 digits and '-', then
\d{2} # another 2 digits, then
\b # a word boundary
Now, do you see those backslashes? In an ordinary Python string literal a backslash starts an escape sequence ('\b', for example, is a backspace character), so each backslash would have to be doubled. A raw string avoids this:
r'(?<=_)\d{4}-\d{2}-\d{2}\b'
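To see the difference a raw string makes, here is a short sketch comparing the two kinds of literal. The variable names are only illustrative.

```python
# In an ordinary string literal, '\b' is the backspace character (one char).
plain = '\b'
# In a raw string literal, the backslash is kept: two chars, '\' and 'b'.
raw = r'\b'

print(len(plain), len(raw))  # 1 2
print(plain == '\x08')       # True
print(raw == '\\b')          # True

# Doubling every backslash by hand gives the same pattern as the raw string.
escaped = '(?<=_)\\d{4}-\\d{2}-\\d{2}\\b'
raw_pattern = r'(?<=_)\d{4}-\d{2}-\d{2}\b'
print(escaped == raw_pattern)  # True
```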
Try it:
import re

filename = 'TEST_2022-03-04-05-30-20.csv_parsed.csv_encrypted.csv'
match = re.search(r'(?<=_)\d{4}-\d{2}-\d{2}\b', filename)
print(match.group(0)) # 2022-03-04
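If some file names might not contain a date at all, calling .group() directly would raise the same AttributeError again. One possible way to guard against that is sketched below; first_date and DATE_RE are hypothetical names, and the pattern is the one from this answer.

```python
import re

# Hypothetical names for illustration; same pattern as above, compiled once.
DATE_RE = re.compile(r'(?<=_)\d{4}-\d{2}-\d{2}\b')

def first_date(filename):
    """Return the first YYYY-MM-DD date that follows an underscore, or None."""
    match = DATE_RE.search(filename)
    # re.search returns None when nothing matches, so check before .group().
    return match.group(0) if match else None

print(first_date('TEST_2022-03-04-05-30-20.csv_parsed.csv_encrypted.csv'))  # 2022-03-04
print(first_date('no_date_here.csv'))  # None
```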