How to filter pandas dataframe using lambda function with regular expression to extract date

Say I have a pandas DataFrame like the one below (with millions of rows):

import pandas as pd

data = {'s3_path': ['s3://mybucket/date=2023-10-26/f1.txt', 's3://mybucket/date=2023-10-25/f2.txt', 's3://mybucket/date=2023-10-24/f3.txt', 's3://mybucket/date=2023-10-23/f4.txt']}
df = pd.DataFrame(data)

I want to keep only the S3 paths whose date is before 2023-10-24. What would be an efficient way to do that in pandas? I don't know pandas well; what I have come up with is below, but it is still not complete:

from datetime import datetime

date_cutoff_str = '2023-10-24'
date_cutoff_obj = datetime.strptime(date_cutoff_str, '%Y-%m-%d')

def is_before(cur_date, cutoff_date):
    # True when cur_date is strictly earlier than cutoff_date
    return cur_date < cutoff_date

date_regex_pattern = r'\d{4}-\d{2}-\d{2}'
filtered_df = df.apply(is_before, cur_date=how_do_i_get_regex_value_here, cutoff_date=date_cutoff_obj)

Any suggestion/answer would be greatly appreciated. Thank you.
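
For reference, the row-wise attempt above could be completed roughly as follows. This is only a sketch: `path_is_before` is a hypothetical helper name, the `date=` prefix in the pattern is taken from the sample paths, and treating paths without a date as "not before the cutoff" is an assumption. Note that `Series.apply` calls the Python function once per row, so on millions of rows it will usually be slower than pandas' vectorized string methods such as `Series.str.extract`.

```python
import re
from datetime import datetime

import pandas as pd

df = pd.DataFrame({'s3_path': [
    's3://mybucket/date=2023-10-26/f1.txt',
    's3://mybucket/date=2023-10-25/f2.txt',
    's3://mybucket/date=2023-10-24/f3.txt',
    's3://mybucket/date=2023-10-23/f4.txt',
]})

date_cutoff_obj = datetime.strptime('2023-10-24', '%Y-%m-%d')

# Capture the YYYY-MM-DD part that follows "date=" in each path.
date_regex = re.compile(r'date=(\d{4}-\d{2}-\d{2})')


def path_is_before(path, cutoff_date):
    """Hypothetical helper: True if the date embedded in path is before cutoff_date."""
    match = date_regex.search(path)
    if match is None:
        return False  # assumption: paths without a date are excluded
    return datetime.strptime(match.group(1), '%Y-%m-%d') < cutoff_date


# Build a boolean mask row by row, then use it to select rows.
mask = df['s3_path'].apply(lambda p: path_is_before(p, date_cutoff_obj))
filtered_df = df[mask]
print(filtered_df)
```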

Solution:

A possible solution, which uses regex to extract the dates as strings, and then converts them to datetime to compare with 2023-10-24:

df.loc[
    pd.to_datetime(
        df['s3_path'].str.extract(r'date=(\d{4}-\d{2}-\d{2})')[0])
    .lt('2023-10-24')]

Output:

                                s3_path
3  s3://mybucket/date=2023-10-23/f4.txt
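
The same idea can be written as a self-contained script that is a little more defensive. The sketch below assumes that some paths may lack a `date=` segment (the fifth row is a hypothetical example added for illustration) and that such rows should be dropped; it passes an explicit `format` and `errors='coerce'` to `pd.to_datetime` so that unparseable values become `NaT`, which compares as `False` against the cutoff.

```python
import pandas as pd

df = pd.DataFrame({'s3_path': [
    's3://mybucket/date=2023-10-26/f1.txt',
    's3://mybucket/date=2023-10-25/f2.txt',
    's3://mybucket/date=2023-10-24/f3.txt',
    's3://mybucket/date=2023-10-23/f4.txt',
    's3://mybucket/no-date/f5.txt',  # hypothetical row without a date
]})

cutoff = pd.Timestamp('2023-10-24')

# expand=False returns a Series (not a DataFrame) for a single capture group;
# rows that do not match the pattern become missing values.
date_str = df['s3_path'].str.extract(r'date=(\d{4}-\d{2}-\d{2})', expand=False)

# An explicit format avoids per-value format inference; errors='coerce'
# turns missing or malformed values into NaT instead of raising.
dates = pd.to_datetime(date_str, format='%Y-%m-%d', errors='coerce')

# NaT < cutoff evaluates to False, so rows without a date are excluded.
mask = dates < cutoff
filtered_df = df[mask]
print(filtered_df)
```

Since dates in `YYYY-MM-DD` form sort lexicographically in chronological order, comparing the extracted strings directly is another option when every path is known to contain a well-formed date.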