I have some time aware data as below with group, date and id. Here id is the id of person handling the group for that specific month. What I wanted to get is how many times there has been changes in the persons handling the group.
group date ID
A Jan-22 1
A Feb-22 1
A Mar-22 1
A Apr-22 2
A May-22 3
A Jun-22 4
B Mar-23 6
B Apr-23 6
B May-23 3
B Jun-23 3
C Sep-21 1
C Oct-21 1
C Nov-21 4
C Dec-21 4
C Jan-22 4
So the output should be something like the below. E.g. for "Group A" there are 4 person who handled since Jan-22 to June-22. And the changes are marked in the "id_changes" column in a cumulative manner.
Could you please be able to help in getting the output like this? I have tried a couple of times but unable to achieve it.
group date ID id_changes
A Jan-22 1 0
A Feb-22 1 0
A Mar-22 1 0
A Apr-22 2 1
A May-22 3 2
A Jun-22 4 3
B Mar-23 6 0
B Apr-23 6 0
B May-23 3 1
B Jun-23 3 1
C Sep-21 1 0
C Oct-21 1 0
C Nov-21 4 1
C Dec-21 4 1
C Jan-22 4 1
>Solution :
This code snippet now correctly calculates the changes in the "ID" column within each group and then accumulates these changes to calculate the "id_changes" column as you specified in your desired output. import pandas as pd
# Sample data
data = {
"group": ["A", "A", "A", "A", "A", "A", "B", "B", "B", "B", "C", "C", "C", "C", "C"],
"date": ["Jan-22", "Feb-22", "Mar-22", "Apr-22", "May-22", "Jun-22",
"Mar-23", "Apr-23", "May-23", "Jun-23", "Sep-21", "Oct-21", "Nov-21", "Dec-21", "Jan-22"],
"ID": [1, 1, 1, 2, 3, 4, 6, 6, 3, 3, 1, 1, 4, 4, 4]
}
df = pd.DataFrame(data)
# Convert date column to datetime
df['date'] = pd.to_datetime(df['date'], format='%b-%y')
# Sort the DataFrame by group and date
df = df.sort_values(by=['group', 'date'])
# Calculate the changes in ID within each group
df['id_changes'] = df.groupby('group')['ID'].diff().fillna(0)
# Reset id_changes to 0 where ID changes
df.loc[df['id_changes'] != 0, 'id_changes'] = 1
# Calculate cumulative changes in id_changes within each group
df['id_changes'] = df.groupby('group')['id_changes'].cumsum()
# Print the resulting DataFrame
print(df)
OUTPUT:
group date ID id_changes
0 A 2022-01-01 1 0.0
1 A 2022-02-01 1 0.0
2 A 2022-03-01 1 0.0
3 A 2022-04-01 2 1.0
4 A 2022-05-01 3 2.0
5 A 2022-06-01 4 3.0
6 B 2023-03-01 6 0.0
7 B 2023-04-01 6 0.0
8 B 2023-05-01 3 1.0
9 B 2023-06-01 3 1.0
10 C 2021-09-01 1 0.0
11 C 2021-10-01 1 0.0
12 C 2021-11-01 4 1.0
13 C 2021-12-01 4 1.0
14 C 2022-01-01 4 1.0