Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Assign lables for previous n days based on condition

Hi i have problem with calculating/labelling the dates previous 11 days (irrespective of repeated or missing).
I need to assaign lables in "Day_mark" column for previous 11 days(or n days dynamically) lables when i found ‘1’ in column ‘day’
Below is my dataset and required column is ‘Day_mark’

for ex: in row 18 i found ‘1’ and need to assaign previous 11 days lables.
in row 27 i found ‘1’ and nee to assign previous 11 days lables but we do not have 11 days so limited to day-7

data = {'Date':['2021-10-4','2021-10-7','2021-10-9','2021-10-10','2021-10-11','2021-10-11','2021-10-12',
                '2021-10-12','2021-10-13','2021-10-14','2021-10-15','2021-10-15','2021-10-16','2021-10-16',
                '2021-10-17','2021-10-18','2021-10-19','2021-10-20','2021-10-21','2021-11-1','2021-11-2',
                '2021-11-3','2021-11-3','2021-11-3','2021-11-5','2021-11-6','2021-11-7','2021-11-8','2021-11-9'],
           'Hour':[ 9,11,12,13,5,7,2,20,21,23,1,2,5,7,15,16,17,1,12,13,5,7,2,20,21,23,16,17,13],
'Mark':[ '','','','','','','','','','','','','','','','','',1,'','','','','','','','',1,'',''],
'Day_mark':['','','d11','d10','d9','d9','d8','d8','d7','d6','d5','d5','d4','d4','d3','d2','d1','d7',
            'd6','d5','d4','d3','d3','d3','d2','d1',' ',' ',' ']
}

Thanks in advance

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

First create groups column by shift mask with compare 1, change order by iloc and add cumulative sum and then remove duplicates by DataFrame.drop_duplicates and add counter by GroupBy.cumcount, then use GroupBy.ffill and set empty string if no match between 1,11 with remove last rows after last 1:

df['g'] = df['Mark'].eq(1).shift(-1).iloc[::-1].cumsum().iloc[::-1]

df['new'] = df.drop_duplicates('Date').groupby('g').cumcount(ascending=False).add(1)

s = df.groupby('g')['new'].ffill().fillna(0).astype(int)
df['new'] = np.where(df['g'].gt(0) & s.between(1,11), 'd' + s.astype(str), '')
df = df.drop('g', axis=1)

print (df)
          Date  Hour Mark Day_mark  new
0    2021-10-4     9                   
1    2021-10-7    11                   
2    2021-10-9    12           d11  d11
3   2021-10-10    13           d10  d10
4   2021-10-11     5            d9   d9
5   2021-10-11     7            d9   d9
6   2021-10-12     2            d8   d8
7   2021-10-12    20            d8   d8
8   2021-10-13    21            d7   d7
9   2021-10-14    23            d6   d6
10  2021-10-15     1            d5   d5
11  2021-10-15     2            d5   d5
12  2021-10-16     5            d4   d4
13  2021-10-16     7            d4   d4
14  2021-10-17    15            d3   d3
15  2021-10-18    16            d2   d2
16  2021-10-19    17            d1   d1
17  2021-10-20     1    1       d7   d7
18  2021-10-21    12            d6   d6
19   2021-11-1    13            d5   d5
20   2021-11-2     5            d4   d4
21   2021-11-3     7            d3   d3
22   2021-11-3     2            d3   d3
23   2021-11-3    20            d3   d3
24   2021-11-5    21            d2   d2
25   2021-11-6    23            d1   d1
26   2021-11-7    16    1              
27   2021-11-8    17                   
28   2021-11-9    13         

Another dea is compare last date per groups and subtract, but output is different:

df['Date'] = pd.to_datetime(df['Date'])

df['g'] = df['Mark'].eq(1).shift(-1).iloc[::-1].cumsum().iloc[::-1]

df['new'] = (df.groupby('g')['Date']
               .transform('last')
               .sub(df['Date'])
               .dt.days
               .add(1)
               .fillna(0)
               .astype(int))

df['new'] = np.where(df['g'].gt(0) & df['new'].le(11), 'd' + df['new'].astype(str), '')
df = df.drop('g', axis=1)

print (df)
         Date  Hour Mark Day_mark  new
0  2021-10-04     9                   
1  2021-10-07    11                   
2  2021-10-09    12           d11  d11
3  2021-10-10    13           d10  d10
4  2021-10-11     5            d9   d9
5  2021-10-11     7            d9   d9
6  2021-10-12     2            d8   d8
7  2021-10-12    20            d8   d8
8  2021-10-13    21            d7   d7
9  2021-10-14    23            d6   d6
10 2021-10-15     1            d5   d5
11 2021-10-15     2            d5   d5
12 2021-10-16     5            d4   d4
13 2021-10-16     7            d4   d4
14 2021-10-17    15            d3   d3
15 2021-10-18    16            d2   d2
16 2021-10-19    17            d1   d1
17 2021-10-20     1    1       d7     
18 2021-10-21    12            d6     
19 2021-11-01    13            d5   d6
20 2021-11-02     5            d4   d5
21 2021-11-03     7            d3   d4
22 2021-11-03     2            d3   d4
23 2021-11-03    20            d3   d4
24 2021-11-05    21            d2   d2
25 2021-11-06    23            d1   d1
26 2021-11-07    16    1              
27 2021-11-08    17                   
28 2021-11-09    13                 
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading