Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas filter rows by last 12 months in data frame

I need to keep only the rows with other columns for months with past 12 months. The max date here is 2022-08-01, so the resulting dataframe should have data from 2021-09-01 to 2022-08-01
Input data frame:

    d = {'MONTH': ['2021-01-01', '2021-02-01','2021-03-01','2021-04-01',
               '2021-05-01', '2021-06-01','2021-07-01','2021-08-01',
               '2021-09-01', '2021-10-01','2021-11-01','2021-12-01',
               '2022-01-01', '2022-02-01','2022-03-01','2022-04-01',
               '2022-05-01', '2022-06-01','2022-07-01','2022-08-01',
               '2022-01-01', '2022-02-01','2022-03-01','2022-04-01',
               '2022-05-01', '2022-06-01','2022-07-01','2022-08-01'], 
     'col2': [3,4,1,2,
              3,4,1,2,
              3,4,1,2,
              3,4,1,2,
              3,4,1,2,
             3,4,1,2,
              3,4,1,2],
    'col3': [3,4,1,2,
              3,4,1,2,
              3,4,1,2,
              3,4,1,2,
              3,4,1,2,
            3,4,1,2,
              3,4,1,2],
    'col4': [3,4,1,2,
              3,4,1,2,
              3,4,1,2,
              3,4,1,2,
              3,4,1,2,3,4,1,2,
              3,4,1,2]}
df = pd.DataFrame(data=d)

Resulting in dataframe:

d = {'MONTH': ['2021-09-01', '2021-10-01','2021-11-01','2021-12-01',
               '2022-01-01', '2022-02-01','2022-03-01','2022-04-01',
               '2022-05-01', '2022-06-01','2022-07-01','2022-08-01',
               '2022-01-01', '2022-02-01','2022-03-01','2022-04-01',
               '2022-05-01', '2022-06-01','2022-07-01','2022-08-01'], 
     'col2': [3,4,1,2,
              3,4,1,2,
              3,4,1,2,
             3,4,1,2,
              3,4,1,2],
    'col3': [3,4,1,2,
              3,4,1,2,
              3,4,1,2,
            3,4,1,2,
              3,4,1,2],
    'col4': [3,4,1,2,
              3,4,1,2,
              3,4,1,2,3,4,1,2,
              3,4,1,2]}
df = pd.DataFrame(data=d)

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

Use pd.to_datetime

df['MONTH'] = pd.to_datetime(df['MONTH'])
df_new = df[df['MONTH'] >= '2021-09-01']

If you want to make it dynamic depending upon max date of the dataset, use relativedelta

from dateutil.relativedelta import relativedelta
df_new = df[df['MONTH'] >= df_new['MONTH'].max() - relativedelta(months=11)]
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading