Python – Create a new column in the DataFrame based on some conditions

August 15, 2023

I am dealing with some data as shown below, the exit column marks the exit flag of a customer on MonthEnd date. So, wherever it is 1, the customer has left in that month.

index    group      date    exit 
  0        A       Jan-22     0  
  1        A       Feb-22     0 
  2        A       Mar-22     0 
  3        A       Apr-22     0 
  5        A       May-22     0 
  6        A       Jun-22     0 
  7        A       Jul-22     0 
  8        A       Aug-22     0 
  9        A       Sep-22     1 
  6        B       Mar-21     0 
  7        B       Apr-21     0 
  8        B       May-21     0 
  9        B       Jun-21     0 
  10       B       Jul-21     0 
  11       B       Aug-21     0 
  12       B       Sep-21     0 
  13       B       Oct-21     1 
  14       C       Jan-23     0 
  15       C       Feb-23     0 
  16       C       Mar-23     1 
  17       D       Jul-23     0 
  18       D       Aug-23     1

Now, what I am trying to do is to make a new column "exit_6months", the condition is to mark exit flag 6 months before the actual exit. so the result will be as below.

For C and D group, there are not enough months to go back to 6 months as the customer made exit soon, so all the values will be 1 for them.

I tried using relativedelta but could not acheive this, is there a way to do this? Thank you 🙂

index    group      date    exit   exit_6m  
  0        A       Jan-22     0      0 
  1        A       Feb-22     0      0
  2        A       Mar-22     0      0
  3        A       Apr-22     0      1
  5        A       May-22     0      1
  6        A       Jun-22     0      1
  7        A       Jul-22     0      1
  8        A       Aug-22     0      1
  9        A       Sep-22     1      1
  6        B       Mar-21     0      0
  7        B       Apr-21     0      0
  8        B       May-21     0      1
  9        B       Jun-21     0      1
  10       B       Jul-21     0      1
  11       B       Aug-21     0      1
  12       B       Sep-21     0      1
  13       B       Oct-21     1      1
  14       C       Jan-23     0      1
  15       C       Feb-23     0      1
  16       C       Mar-23     1      1
  17       D       Jul-23     0      1
  18       D       Aug-23     1      1

>Solution :

Mask the zero’s then backfill the values with a limit for 6 months per group and fill the remaining null values with zeros

df['exit_6M'] = (
    df['exit']
    .where(df['exit'].eq(1))
    .groupby(df['group'])
    .bfill(limit=5)
    .fillna(0, downcast='infer')
)

    index group    date  exit  exit_6M
0       0     A  Jan-22     0        0
1       1     A  Feb-22     0        0
2       2     A  Mar-22     0        0
3       3     A  Apr-22     0        1
4       5     A  May-22     0        1
5       6     A  Jun-22     0        1
6       7     A  Jul-22     0        1
7       8     A  Aug-22     0        1
8       9     A  Sep-22     1        1
9       6     B  Mar-21     0        0
10      7     B  Apr-21     0        0
11      8     B  May-21     0        1
12      9     B  Jun-21     0        1
13     10     B  Jul-21     0        1
14     11     B  Aug-21     0        1
15     12     B  Sep-21     0        1
16     13     B  Oct-21     1        1
17     14     C  Jan-23     0        1
18     15     C  Feb-23     0        1
19     16     C  Mar-23     1        1
20     17     D  Jul-23     0        1
21     18     D  Aug-23     1        1