Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Python – Count duplicate user Id's occurence in a given month

If I create a Dataframe from

df = pd.DataFrame({"date": ['2022-08-10','2022-08-18','2022-08-18','2022-08-20','2022-08-20','2022-08-24','2022-08-26','2022-08-30','2022-09-3','2022-09-8','2022-09-13'], 
              "id": ['A','B','C','D','E','B','A','F','G','F','H']})

df['date'] = pd.to_datetime(df['date'])

(Table 1 below showing the data)

I am interested in counting how many times an ID appears in a given month. For example in a given month A, B and F all occur twice whilst everything else occurs once. The difficulty with this data is that the the frequency of dates are not evenly spread out.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I attempted to resample on date by month, with the hope of counting duplicates.

df.resample('M', on='date')['id']

But all the functions that can be used on resample just give me the number of unique occurences rather than how many times each ID occured.

A rough example of the output is below [Table 2]

All of the examples I have seen merely count how many total or unique occurences occur for a given month, this question is focused on finding out how many occurences each Id had in a month.

Thankyou for your time.

[Table 1] – Data

idx date id
0 2022-08-10 A
1 2022-08-18 B
2 2022-08-18 C
3 2022-08-20 D
4 2022-08-20 E
5 2022-08-24 B
6 2022-08-26 A
7 2022-08-30 F
8 2022-09-03 G
9 2022-09-08 F
10 2022-09-13 H

[Table 2] – Rough example of desired output

id occurences in a month
A 2
B 2
C 1
D 1
E 1
F 2
G 1
H 1

>Solution :

Use Series.dt.to_period for month periods and count values per id by GroupBy.size, then aggregate sum:

df1 = (df.groupby(['id', df['date'].dt.to_period('m')])
        .size()
        .groupby(level=0)
        .sum()
        .reset_index(name='occurences in a month'))
print (df1)
  id  occurences in a month
0  A                      2
1  B                      2
2  C                      1
3  D                      1
4  E                      1
5  F                      2
6  G                      1
7  H                      1

Or use Grouper:

df1 = (df.groupby(['id',pd.Grouper(freq='M', key='date')])
        .size()
        .groupby(level=0)
        .sum()
        .reset_index(name='occurences in a month'))
print (df1)

EDIT:

df = pd.DataFrame({"date": ['2022-08-10','2022-08-18','2022-08-18','2022-08-20','2022-08-20','2022-08-24','2022-08-26',
                            '2022-08-30','2022-09-3','2022-09-8','2022-09-13','2050-12-15'],
                   "id": ['A','B','C','D','E','B','A','F','G','F','H','H']}) 

df['date'] = pd.to_datetime(df['date'],format='%Y-%m-%d')

print (df)
        

Because count first per month or days or dates and sum values it is same like:

df1 = df.groupby('id').size().reset_index(name='occurences')
        
print (df1)
  id  occurences
0  A           2
1  B           2
2  C           1
3  D           1
4  E           1
5  F           2
6  G           1
7  H           2

Same sum of counts per id:

df1 = (df.groupby(['id', df['date'].dt.to_period('m')])
        .size())
print (df1)
id  date   
A   2022-08    2
B   2022-08    2
C   2022-08    1
D   2022-08    1
E   2022-08    1
F   2022-08    1
    2022-09    1
G   2022-09    1
H   2022-09    1
    2050-12    1
dtype: int64

df1 = (df.groupby(['id', df['date'].dt.to_period('d')])
        .size())
print (df1)
id  date      
A   2022-08-10    1
    2022-08-26    1
B   2022-08-18    1
    2022-08-24    1
C   2022-08-18    1
D   2022-08-20    1
E   2022-08-20    1
F   2022-08-30    1
    2022-09-08    1
G   2022-09-03    1
H   2022-09-13    1
    2050-12-15    1
dtype: int64

df1 = (df.groupby(['id', df['date'].dt.day])
        .size())
print (df1)
id  date
A   10      1
    26      1
B   18      1
    24      1
C   18      1
D   20      1
E   20      1
F   8       1
    30      1
G   3       1
H   13      1
    15      1
dtype: int64
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading