Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Filter rows with more than X entries per year

I have a df with IDs and Dates (. Here is an example:

ID  Date
1   26.04.2011
1   21.10.2011
14  25.02.2010
14  08.07.2010
14  20.10.2010
14  07.01.2011
14  20.04.2011
14  02.07.2011
14  11.10.2011
14  23.01.2012
14  19.04.2012
14  22.10.2012
14  15.01.2013
14  06.05.2013
18  23.11.2012
18  05.06.2013
18  19.08.2013
18  11.04.2014
18  18.07.2014

ID            object
Date     datetime64[ns]

I want to keep only those rows where there are =< 3 Dates per year per ID. So the result should be:

ID  Date
14  25.02.2010
14  08.07.2010
14  20.10.2010
14  07.01.2011
14  20.04.2011
14  02.07.2011
14  11.10.2011
14  23.01.2012
14  19.04.2012
14  22.10.2012

I tried groupby and size:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

            ID  year  size
0            1  2011     2
1           14  2010     3
2           14  2011     4

However this is not what I want.

>Solution :

Use GroupBy.transform per ID and years with count by GroupBy.size, compare for greater or equal by Series.ge and filter in boolean indexing:

df['Date'] = pd.to_datetime(df['Date'], dayfirst=True)

df = df[df.groupby(['ID',df['Date'].dt.year])['ID'].transform('size').ge(3)]
print (df)
    ID       Date
2   14 2010-02-25
3   14 2010-07-08
4   14 2010-10-20
5   14 2011-01-07
6   14 2011-04-20
7   14 2011-07-02
8   14 2011-10-11
9   14 2012-01-23
10  14 2012-04-19
11  14 2012-10-22
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading