Assign lables for previous n days based on condition

Advertisements

Hi i have problem with calculating/labelling the dates previous 11 days (irrespective of repeated or missing).
I need to assaign lables in "Day_mark" column for previous 11 days(or n days dynamically) lables when i found ‘1’ in column ‘day’
Below is my dataset and required column is ‘Day_mark’

for ex: in row 18 i found ‘1’ and need to assaign previous 11 days lables.
in row 27 i found ‘1’ and nee to assign previous 11 days lables but we do not have 11 days so limited to day-7

data = {'Date':['2021-10-4','2021-10-7','2021-10-9','2021-10-10','2021-10-11','2021-10-11','2021-10-12',
                '2021-10-12','2021-10-13','2021-10-14','2021-10-15','2021-10-15','2021-10-16','2021-10-16',
                '2021-10-17','2021-10-18','2021-10-19','2021-10-20','2021-10-21','2021-11-1','2021-11-2',
                '2021-11-3','2021-11-3','2021-11-3','2021-11-5','2021-11-6','2021-11-7','2021-11-8','2021-11-9'],
           'Hour':[ 9,11,12,13,5,7,2,20,21,23,1,2,5,7,15,16,17,1,12,13,5,7,2,20,21,23,16,17,13],
'Mark':[ '','','','','','','','','','','','','','','','','',1,'','','','','','','','',1,'',''],
'Day_mark':['','','d11','d10','d9','d9','d8','d8','d7','d6','d5','d5','d4','d4','d3','d2','d1','d7',
            'd6','d5','d4','d3','d3','d3','d2','d1',' ',' ',' ']
}

Thanks in advance

>Solution :

First create groups column by shift mask with compare 1, change order by iloc and add cumulative sum and then remove duplicates by DataFrame.drop_duplicates and add counter by GroupBy.cumcount, then use GroupBy.ffill and set empty string if no match between 1,11 with remove last rows after last 1:

df['g'] = df['Mark'].eq(1).shift(-1).iloc[::-1].cumsum().iloc[::-1]

df['new'] = df.drop_duplicates('Date').groupby('g').cumcount(ascending=False).add(1)

s = df.groupby('g')['new'].ffill().fillna(0).astype(int)
df['new'] = np.where(df['g'].gt(0) & s.between(1,11), 'd' + s.astype(str), '')
df = df.drop('g', axis=1)

print (df)
          Date  Hour Mark Day_mark  new
0    2021-10-4     9                   
1    2021-10-7    11                   
2    2021-10-9    12           d11  d11
3   2021-10-10    13           d10  d10
4   2021-10-11     5            d9   d9
5   2021-10-11     7            d9   d9
6   2021-10-12     2            d8   d8
7   2021-10-12    20            d8   d8
8   2021-10-13    21            d7   d7
9   2021-10-14    23            d6   d6
10  2021-10-15     1            d5   d5
11  2021-10-15     2            d5   d5
12  2021-10-16     5            d4   d4
13  2021-10-16     7            d4   d4
14  2021-10-17    15            d3   d3
15  2021-10-18    16            d2   d2
16  2021-10-19    17            d1   d1
17  2021-10-20     1    1       d7   d7
18  2021-10-21    12            d6   d6
19   2021-11-1    13            d5   d5
20   2021-11-2     5            d4   d4
21   2021-11-3     7            d3   d3
22   2021-11-3     2            d3   d3
23   2021-11-3    20            d3   d3
24   2021-11-5    21            d2   d2
25   2021-11-6    23            d1   d1
26   2021-11-7    16    1              
27   2021-11-8    17                   
28   2021-11-9    13         

Another dea is compare last date per groups and subtract, but output is different:

df['Date'] = pd.to_datetime(df['Date'])

df['g'] = df['Mark'].eq(1).shift(-1).iloc[::-1].cumsum().iloc[::-1]

df['new'] = (df.groupby('g')['Date']
               .transform('last')
               .sub(df['Date'])
               .dt.days
               .add(1)
               .fillna(0)
               .astype(int))

df['new'] = np.where(df['g'].gt(0) & df['new'].le(11), 'd' + df['new'].astype(str), '')
df = df.drop('g', axis=1)

print (df)
         Date  Hour Mark Day_mark  new
0  2021-10-04     9                   
1  2021-10-07    11                   
2  2021-10-09    12           d11  d11
3  2021-10-10    13           d10  d10
4  2021-10-11     5            d9   d9
5  2021-10-11     7            d9   d9
6  2021-10-12     2            d8   d8
7  2021-10-12    20            d8   d8
8  2021-10-13    21            d7   d7
9  2021-10-14    23            d6   d6
10 2021-10-15     1            d5   d5
11 2021-10-15     2            d5   d5
12 2021-10-16     5            d4   d4
13 2021-10-16     7            d4   d4
14 2021-10-17    15            d3   d3
15 2021-10-18    16            d2   d2
16 2021-10-19    17            d1   d1
17 2021-10-20     1    1       d7     
18 2021-10-21    12            d6     
19 2021-11-01    13            d5   d6
20 2021-11-02     5            d4   d5
21 2021-11-03     7            d3   d4
22 2021-11-03     2            d3   d4
23 2021-11-03    20            d3   d4
24 2021-11-05    21            d2   d2
25 2021-11-06    23            d1   d1
26 2021-11-07    16    1              
27 2021-11-08    17                   
28 2021-11-09    13                 

Leave a ReplyCancel reply