Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Count occurrences in last 30 days with Pandas Dataframe

I have a pandas Dataframe with an ID column and a date column (YYYY-MM-DD),

ID Date
001 2022-01-01
001 2022-01-04
001 2022-02-07
002 2022-01-02
002 2022-01-03
002 2022-01-28

There may be gaps in the date field, as shown. I would like to have a new column, "occurrences_last_month" where it counts the number of occurrences for each ID in the last month (30 days).

The idea would be to add a new column with the following output,

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

ID Date Ocurrences_last_month
001 2022-01-01 0
001 2022-01-04 1
001 2022-02-07 0
002 2022-01-02 0
002 2022-01-03 1
002 2022-01-28 2

For example, in the case of ID 001:

  • January 1st: substracts 1 month to December 2nd, so 0 occurrences
  • January 2nd: goes from December 3rd to January 1st, so 1 occurrence
  • February 7th: goes from January 8th, so 0 occurrences

I tried to use the datetime.timedelta to compute a new column "date_previous_month", but I am unable to compute what I need from there, I tried using count() but I failed to obtain what I need.

>Solution :

First idea is per grouops use Rolling.count with remove first level created by ID:

df = df.set_index('Date')
df['Ocurrences_last_month'] = (df.groupby('ID')
                                 .rolling('30D')
                                 .count().sub(1).droplevel(0).astype(int))
print (df)
            ID  Ocurrences_last_month
Date                                 
2022-01-01   1                      0
2022-01-04   1                      1
2022-02-07   1                      0
2022-01-02   2                      0
2022-01-03   2                      1
2022-01-28   2                      2

EDIT: If possible duplciated values create Series and assign to original DataFrame by DataFrame.join:

s = df.groupby('ID').rolling('30D', on='Date')['Date'].count().sub(1).astype(int)

df = df.join(s.rename('Ocurrences_last_month'), on=['ID','Date'])
print (df)
   ID       Date  Ocurrences_last_month
0   1 2022-01-01                      0
1   1 2022-01-04                      1
2   1 2022-02-07                      0
3   2 2022-01-02                      0
4   2 2022-01-03                      1
5   2 2022-01-28                      2

Alternative solution from comments:

df = df.merge(s.rename('Ocurrences_last_month'), on=['ID','Date'])
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading