How can I count the changes per row based on a condition?

I have some time-aware data as below with group, date and ID, where ID identifies the person handling the group for that specific month. What I want to get is how many times the person handling each group has changed.

group   date     ID
A      Jan-22    1
A      Feb-22    1
A      Mar-22    1
A      Apr-22    2
A      May-22    3
A      Jun-22    4
B      Mar-23    6
B      Apr-23    6
B      May-23    3
B      Jun-23    3
C      Sep-21    1
C      Oct-21    1
C      Nov-21    4
C      Dec-21    4
C      Jan-22    4

So the output should be something like the below. E.g. for group "A" there are 4 people who handled it from Jan-22 to Jun-22, and the changes are tracked cumulatively in the "id_changes" column.

Could you please help me get output like this? I have tried a couple of times but have been unable to achieve it.


group   date     ID    id_changes
A      Jan-22    1         0
A      Feb-22    1         0
A      Mar-22    1         0
A      Apr-22    2         1
A      May-22    3         2
A      Jun-22    4         3
B      Mar-23    6         0
B      Apr-23    6         0
B      May-23    3         1
B      Jun-23    3         1
C      Sep-21    1         0
C      Oct-21    1         0
C      Nov-21    4         1
C      Dec-21    4         1
C      Jan-22    4         1

>Solution:

The snippet below calculates the changes in the "ID" column within each group and then accumulates these changes into the "id_changes" column, as specified in your desired output.

import pandas as pd

# Sample data
data = {
    "group": ["A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C", "C"],
    "date": ["Jan-22", "Feb-22", "Mar-22", "Apr-22", "May-22", "Jun-22",
             "Mar-23", "Apr-23", "May-23", "Jun-23", "Sep-21", "Oct-21", "Nov-21", "Dec-21", "Jan-22"],
    "ID": [1, 1, 1, 2, 3, 4, 6, 6, 3, 3, 1, 1, 4, 4, 4]
}

df = pd.DataFrame(data)

# Convert date column to datetime
df['date'] = pd.to_datetime(df['date'], format='%b-%y')

# Sort the DataFrame by group and date
df = df.sort_values(by=['group', 'date'])

# Compute the difference in ID from the previous row within each group
df['id_changes'] = df.groupby('group')['ID'].diff().fillna(0)

# Flag rows where the ID changed (any non-zero difference becomes 1)
df.loc[df['id_changes'] != 0, 'id_changes'] = 1

# Accumulate the change flags within each group and cast to int
df['id_changes'] = df.groupby('group')['id_changes'].cumsum().astype(int)

# Print the resulting DataFrame
print(df)

OUTPUT:

   group       date  ID  id_changes
0      A 2022-01-01   1           0
1      A 2022-02-01   1           0
2      A 2022-03-01   1           0
3      A 2022-04-01   2           1
4      A 2022-05-01   3           2
5      A 2022-06-01   4           3
6      B 2023-03-01   6           0
7      B 2023-04-01   6           0
8      B 2023-05-01   3           1
9      B 2023-06-01   3           1
10     C 2021-09-01   1           0
11     C 2021-10-01   1           0
12     C 2021-11-01   4           1
13     C 2021-12-01   4           1
14     C 2022-01-01   4           1
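As an aside, the diff/flag/cumsum steps can also be collapsed into a single groupby transform. The sketch below (reusing the same sample data) compares each ID with the previous one in its group and counts the changes directly; this is an alternative idiom, not the answer's original method:

```python
import pandas as pd

data = {
    "group": ["A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C", "C"],
    "date": ["Jan-22", "Feb-22", "Mar-22", "Apr-22", "May-22", "Jun-22",
             "Mar-23", "Apr-23", "May-23", "Jun-23", "Sep-21", "Oct-21", "Nov-21", "Dec-21", "Jan-22"],
    "ID": [1, 1, 1, 2, 3, 4, 6, 6, 3, 3, 1, 1, 4, 4, 4],
}
df = pd.DataFrame(data)
df['date'] = pd.to_datetime(df['date'], format='%b-%y')
df = df.sort_values(by=['group', 'date'])

# Within each group, mark rows where ID differs from the previous row,
# then count those marks cumulatively; subtract 1 because the first row
# of each group always differs from the (missing) previous row.
df['id_changes'] = df.groupby('group')['ID'].transform(
    lambda s: s.ne(s.shift()).cumsum() - 1
)
print(df)
```

This keeps "id_changes" as an integer column from the start. If you also want the number of distinct handlers per group (e.g. 4 for group A), `df.groupby('group')['id_changes'].max() + 1` gives it, since each group has one more handler than it has changes.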