I am dealing with some data as shown below, the exit column marks the exit flag of a customer on MonthEnd date. So, wherever it is 1, the customer has left in that month.
index group date exit
0 A Jan-22 0
1 A Feb-22 0
2 A Mar-22 0
3 A Apr-22 0
5 A May-22 0
6 A Jun-22 0
7 A Jul-22 0
8 A Aug-22 0
9 A Sep-22 1
6 B Mar-21 0
7 B Apr-21 0
8 B May-21 0
9 B Jun-21 0
10 B Jul-21 0
11 B Aug-21 0
12 B Sep-21 0
13 B Oct-21 1
14 C Jan-23 0
15 C Feb-23 0
16 C Mar-23 1
17 D Jul-23 0
18 D Aug-23 1
Now, what I am trying to do is to make a new column "exit_6months", the condition is to mark exit flag 6 months before the actual exit. so the result will be as below.
For C and D group, there are not enough months to go back to 6 months as the customer made exit soon, so all the values will be 1 for them.
I tried using relativedelta but could not acheive this, is there a way to do this? Thank you 🙂
index group date exit exit_6m
0 A Jan-22 0 0
1 A Feb-22 0 0
2 A Mar-22 0 0
3 A Apr-22 0 1
5 A May-22 0 1
6 A Jun-22 0 1
7 A Jul-22 0 1
8 A Aug-22 0 1
9 A Sep-22 1 1
6 B Mar-21 0 0
7 B Apr-21 0 0
8 B May-21 0 1
9 B Jun-21 0 1
10 B Jul-21 0 1
11 B Aug-21 0 1
12 B Sep-21 0 1
13 B Oct-21 1 1
14 C Jan-23 0 1
15 C Feb-23 0 1
16 C Mar-23 1 1
17 D Jul-23 0 1
18 D Aug-23 1 1
>Solution :
Mask the zero’s then backfill the values with a limit for 6 months per group and fill the remaining null values with zeros
df['exit_6M'] = (
df['exit']
.where(df['exit'].eq(1))
.groupby(df['group'])
.bfill(limit=5)
.fillna(0, downcast='infer')
)
index group date exit exit_6M
0 0 A Jan-22 0 0
1 1 A Feb-22 0 0
2 2 A Mar-22 0 0
3 3 A Apr-22 0 1
4 5 A May-22 0 1
5 6 A Jun-22 0 1
6 7 A Jul-22 0 1
7 8 A Aug-22 0 1
8 9 A Sep-22 1 1
9 6 B Mar-21 0 0
10 7 B Apr-21 0 0
11 8 B May-21 0 1
12 9 B Jun-21 0 1
13 10 B Jul-21 0 1
14 11 B Aug-21 0 1
15 12 B Sep-21 0 1
16 13 B Oct-21 1 1
17 14 C Jan-23 0 1
18 15 C Feb-23 0 1
19 16 C Mar-23 1 1
20 17 D Jul-23 0 1
21 18 D Aug-23 1 1