Pattern matching of filename between underscores and compare included date string to current datetime

Similarly to the topic "Open only files containing dates of past seven days in filename", I want to open only those files which follow a very rigid naming rule, and extract part of the filename to do a date comparison.

My filenames are built like this:

{Custom-prefix}_{SupplierName}_{8_digit_date}.csv
An example:

myprefix_Shop_no24_20221009.csv

So the supplier name can contain underscores, but the parts of the filename are also separated by underscores.

I do have the complete list of values for {SupplierName}, but this can change over time and I would like to avoid a solution that hard-codes them. The {SupplierName} values can contain numbers, are of various lengths, and can include "_".

I tried this:

prefix = "Custom-prefix"
pattern = re.compile(fr"(?<={prefix})([a-zA-Z0-9_]*)([0-9]{8})(?=\.csv)")
# I get the filenames via os.walk
matched = pattern.search(filename)

but this seems to match everything that sits between "Custom-prefix" and ".csv".

pattern = re.compile(fr"(?<={prefix})([a-zA-Z0-9_]*)(?=\.csv)")

This gives me the exact same result. The way I understand it, I have to make the regex aware that it has to match the individual parts of the string and respect the underscores, so that each group of my filename:

myprefix
_
Shop_no24
_
20221009
.csv

gets recognized. I found a solution for matching underscores in names here, but unfortunately I am not able to build the regex myself and then use the matched groups to do the date comparison.

Thank you in advance

>Solution :

You can use

pattern = re.compile(fr"{prefix}_(\w*)_(\d{{4}})(\d{{2}})(\d{{2}})\.csv")

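Note the doubled braces: inside an f-string, {{ and }} produce literal { and } characters. With single braces, {8} is evaluated as the Python expression 8, so [0-9]{8} in your attempt becomes [0-9]8 and the quantifier is lost. The pattern also matches the separating underscores explicitly, so the greedy \w* backtracks to the last underscore before the date.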
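A quick way to see the effect of the braces is to print both pattern strings. This is only a minimal sketch; the names broken and fixed are illustrative and not part of the original code.

```python
prefix = "Custom-prefix"

# Single braces: {8} is a replacement field holding the expression 8,
# so the regex quantifier disappears from the resulting string.
broken = fr"{prefix}_[0-9]{8}"

# Doubled braces: {{8}} yields a literal {8}, which the regex engine
# then reads as "exactly eight repetitions".
fixed = fr"{prefix}_[0-9]{{8}}"

print(broken)  # Custom-prefix_[0-9]8
print(fixed)   # Custom-prefix_[0-9]{8}
```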

See the Python demo:

import re
filename = "Custom-prefix_Shop_no24_20221009.csv"
prefix = "Custom-prefix"
pattern = re.compile(fr"{prefix}_(\w*)_(\d{{4}})(\d{{2}})(\d{{2}})\.csv")
matched = pattern.search(filename)
if matched:
    supplier, year, month, day = matched.groups()
    print(f'supplier={supplier}, year={year}, month={month}, day={day}')

Output:

supplier=Shop_no24, year=2022, month=10, day=09

With the (\d{4})(\d{2})(\d{2}) part, you capture the year, month, and day in separate groups, so you can manipulate them however you see fit.
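For the date comparison the question asks about, one possible approach is to capture the eight digits as a single group and parse them with datetime.strptime. The sketch below rests on several assumptions that are not in the original: the helper name is_recent, the seven-day window taken from the linked topic, the use of fullmatch (which assumes you pass the bare filename from os.walk rather than a full path), and re.escape around the prefix in case it ever contains regex metacharacters.

```python
import re
from datetime import date, datetime, timedelta

prefix = "Custom-prefix"
# re.escape guards against special characters in the prefix (assumption).
pattern = re.compile(fr"{re.escape(prefix)}_(\w*)_(\d{{8}})\.csv")

def is_recent(filename, today, days=7):
    """Return True if the date in `filename` lies within the last `days` days.

    Hypothetical helper: `filename` is expected to be a bare file name.
    """
    matched = pattern.fullmatch(filename)
    if not matched:
        return False
    try:
        # strptime rejects impossible dates such as 20221399.
        file_date = datetime.strptime(matched.group(2), "%Y%m%d").date()
    except ValueError:
        return False
    # Dates in the future give a negative age and are excluded.
    return timedelta(0) <= today - file_date <= timedelta(days=days)

print(is_recent("Custom-prefix_Shop_no24_20221009.csv", date(2022, 10, 12)))  # True
print(is_recent("Custom-prefix_Shop_no24_20221009.csv", date(2022, 10, 20)))  # False
```

In a real run you would pass date.today() as today and call the helper on each name yielded by os.walk; taking today as a parameter keeps the check easy to test.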
