How can I count the changes per row based on condition

August 2, 2023

I have some time aware data as below with group, date and id. Here id is the id of person handling the group for that specific month. What I wanted to get is how many times there has been changes in the persons handling the group.

group   date     ID
A      Jan-22    1
A      Feb-22    1
A      Mar-22    1
A      Apr-22    2
A      May-22    3
A      Jun-22    4
B      Mar-23    6
B      Apr-23    6
B      May-23    3
B      Jun-23    3
C      Sep-21    1
C      Oct-21    1
C      Nov-21    4
C      Dec-21    4
C      Jan-22    4

So the output should be something like the below. E.g. for "Group A" there are 4 person who handled since Jan-22 to June-22. And the changes are marked in the "id_changes" column in a cumulative manner.

Could you please be able to help in getting the output like this? I have tried a couple of times but unable to achieve it.

group   date     ID    id_changes
A      Jan-22    1         0
A      Feb-22    1         0
A      Mar-22    1         0
A      Apr-22    2         1
A      May-22    3         2
A      Jun-22    4         3
B      Mar-23    6         0
B      Apr-23    6         0
B      May-23    3         1
B      Jun-23    3         1
C      Sep-21    1         0
C      Oct-21    1         0
C      Nov-21    4         1
C      Dec-21    4         1
C      Jan-22    4         1

>Solution :

This code snippet now correctly calculates the changes in the "ID" column within each group and then accumulates these changes to calculate the "id_changes" column as you specified in your desired output. import pandas as pd

# Sample data
data = {
    "group": ["A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C", "C"],
    "date": ["Jan-22", "Feb-22", "Mar-22", "Apr-22", "May-22", "Jun-22",
             "Mar-23", "Apr-23", "May-23", "Jun-23", "Sep-21", "Oct-21", "Nov-21", "Dec-21", "Jan-22"],
    "ID": [1, 1, 1, 2, 3, 4, 6, 6, 3, 3, 1, 1, 4, 4, 4]
}

df = pd.DataFrame(data)

# Convert date column to datetime
df['date'] = pd.to_datetime(df['date'], format='%b-%y')

# Sort the DataFrame by group and date
df = df.sort_values(by=['group', 'date'])

# Calculate the changes in ID within each group
df['id_changes'] = df.groupby('group')['ID'].diff().fillna(0)

# Reset id_changes to 0 where ID changes
df.loc[df['id_changes'] != 0, 'id_changes'] = 1

# Calculate cumulative changes in id_changes within each group
df['id_changes'] = df.groupby('group')['id_changes'].cumsum()

# Print the resulting DataFrame
print(df)

OUTPUT:

   group       date  ID  id_changes
0      A 2022-01-01   1         0.0
1      A 2022-02-01   1         0.0
2      A 2022-03-01   1         0.0
3      A 2022-04-01   2         1.0
4      A 2022-05-01   3         2.0
5      A 2022-06-01   4         3.0
6      B 2023-03-01   6         0.0
7      B 2023-04-01   6         0.0
8      B 2023-05-01   3         1.0
9      B 2023-06-01   3         1.0
10     C 2021-09-01   1         0.0
11     C 2021-10-01   1         0.0
12     C 2021-11-01   4         1.0
13     C 2021-12-01   4         1.0
14     C 2022-01-01   4         1.0