Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Multiple group counts within data base

I have been presented with a very small dataset that has the date of each time a user logs into a system, I have to use this data set to create a table where I can show for each log-in the cumulative monthly counts of logs and the overall cumulative counts of logs, this is the data set I have:

date user
1/01/2022 Mark
2/01/2022 Mark
3/02/2022 Mark
4/02/2022 Mark
5/03/2022 Mark
6/03/2022 Mark
7/03/2022 Mark
8/03/2022 Mark
9/03/2022 Mark

and this is my desired output:

row date user monthly_track acum_track
1 1/01/2022 Mark 1 1
2 2/01/2022 Mark 2 2
3 3/02/2022 Mark 1 3
4 4/02/2022 Mark 2 4
5 5/03/2022 Mark 1 5
6 6/03/2022 Mark 2 6
7 7/03/2022 Mark 3 7
8 8/03/2022 Mark 4 8
9 9/03/2022 Mark 5 9

Why? Let’s look at the row number 5. This is the first time the user Mark has logged into the system during the month 3 (March) but it is the 5th overall login in the data set (for the purpose of learning there will only be one year (2022).

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I have no idea as to how to get the monthly and overall count together. I can groupby user and sort by date to count how many times in total a user has logged in, but I know that in order to achive my desired output I will have to group by date and user and then make counts based on month but I will have to somehow group the data by user (only) to get the overall count and I dont think I could group twice the data.

>Solution :

First you need to convert date to actual datetime values with to_datetime. The rest is simple with groupby and cumcount:

df['date'] = pd.to_datetime(df['date'], format='%d/%m/%Y')
df['monthly_count'] = df.groupby([df['user'], df['date'].dt.year, df['date'].dt.month]).cumcount() + 1
df['acum_count'] = df.groupby('user').cumcount() + 1

Output:

>>> df
        date  user  monthly_count  acum_count
0 2022-01-01  Mark              1           1
1 2022-01-02  Mark              2           2
2 2022-02-03  Mark              1           3
3 2022-02-04  Mark              2           4
4 2022-03-05  Mark              1           5
5 2022-03-06  Mark              2           6
6 2022-03-07  Mark              3           7
7 2022-03-08  Mark              4           8
8 2022-03-09  Mark              5           9
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading