Hi i have problem with calculating/labelling the dates previous 11 days (irrespective of repeated or missing).
I need to assaign lables in "Day_mark" column for previous 11 days(or n days dynamically) lables when i found ‘1’ in column ‘day’
Below is my dataset and required column is ‘Day_mark’
for ex: in row 18 i found ‘1’ and need to assaign previous 11 days lables.
in row 27 i found ‘1’ and nee to assign previous 11 days lables but we do not have 11 days so limited to day-7
data = {'Date':['2021-10-4','2021-10-7','2021-10-9','2021-10-10','2021-10-11','2021-10-11','2021-10-12',
'2021-10-12','2021-10-13','2021-10-14','2021-10-15','2021-10-15','2021-10-16','2021-10-16',
'2021-10-17','2021-10-18','2021-10-19','2021-10-20','2021-10-21','2021-11-1','2021-11-2',
'2021-11-3','2021-11-3','2021-11-3','2021-11-5','2021-11-6','2021-11-7','2021-11-8','2021-11-9'],
'Hour':[ 9,11,12,13,5,7,2,20,21,23,1,2,5,7,15,16,17,1,12,13,5,7,2,20,21,23,16,17,13],
'Mark':[ '','','','','','','','','','','','','','','','','',1,'','','','','','','','',1,'',''],
'Day_mark':['','','d11','d10','d9','d9','d8','d8','d7','d6','d5','d5','d4','d4','d3','d2','d1','d7',
'd6','d5','d4','d3','d3','d3','d2','d1',' ',' ',' ']
}
Thanks in advance
>Solution :
First create groups column by shift mask with compare 1
, change order by iloc
and add cumulative sum and then remove duplicates by DataFrame.drop_duplicates
and add counter by GroupBy.cumcount
, then use GroupBy.ffill
and set empty string if no match between 1,11
with remove last rows after last 1
:
df['g'] = df['Mark'].eq(1).shift(-1).iloc[::-1].cumsum().iloc[::-1]
df['new'] = df.drop_duplicates('Date').groupby('g').cumcount(ascending=False).add(1)
s = df.groupby('g')['new'].ffill().fillna(0).astype(int)
df['new'] = np.where(df['g'].gt(0) & s.between(1,11), 'd' + s.astype(str), '')
df = df.drop('g', axis=1)
print (df)
Date Hour Mark Day_mark new
0 2021-10-4 9
1 2021-10-7 11
2 2021-10-9 12 d11 d11
3 2021-10-10 13 d10 d10
4 2021-10-11 5 d9 d9
5 2021-10-11 7 d9 d9
6 2021-10-12 2 d8 d8
7 2021-10-12 20 d8 d8
8 2021-10-13 21 d7 d7
9 2021-10-14 23 d6 d6
10 2021-10-15 1 d5 d5
11 2021-10-15 2 d5 d5
12 2021-10-16 5 d4 d4
13 2021-10-16 7 d4 d4
14 2021-10-17 15 d3 d3
15 2021-10-18 16 d2 d2
16 2021-10-19 17 d1 d1
17 2021-10-20 1 1 d7 d7
18 2021-10-21 12 d6 d6
19 2021-11-1 13 d5 d5
20 2021-11-2 5 d4 d4
21 2021-11-3 7 d3 d3
22 2021-11-3 2 d3 d3
23 2021-11-3 20 d3 d3
24 2021-11-5 21 d2 d2
25 2021-11-6 23 d1 d1
26 2021-11-7 16 1
27 2021-11-8 17
28 2021-11-9 13
Another dea is compare last date per groups and subtract, but output is different:
df['Date'] = pd.to_datetime(df['Date'])
df['g'] = df['Mark'].eq(1).shift(-1).iloc[::-1].cumsum().iloc[::-1]
df['new'] = (df.groupby('g')['Date']
.transform('last')
.sub(df['Date'])
.dt.days
.add(1)
.fillna(0)
.astype(int))
df['new'] = np.where(df['g'].gt(0) & df['new'].le(11), 'd' + df['new'].astype(str), '')
df = df.drop('g', axis=1)
print (df)
Date Hour Mark Day_mark new
0 2021-10-04 9
1 2021-10-07 11
2 2021-10-09 12 d11 d11
3 2021-10-10 13 d10 d10
4 2021-10-11 5 d9 d9
5 2021-10-11 7 d9 d9
6 2021-10-12 2 d8 d8
7 2021-10-12 20 d8 d8
8 2021-10-13 21 d7 d7
9 2021-10-14 23 d6 d6
10 2021-10-15 1 d5 d5
11 2021-10-15 2 d5 d5
12 2021-10-16 5 d4 d4
13 2021-10-16 7 d4 d4
14 2021-10-17 15 d3 d3
15 2021-10-18 16 d2 d2
16 2021-10-19 17 d1 d1
17 2021-10-20 1 1 d7
18 2021-10-21 12 d6
19 2021-11-01 13 d5 d6
20 2021-11-02 5 d4 d5
21 2021-11-03 7 d3 d4
22 2021-11-03 2 d3 d4
23 2021-11-03 20 d3 d4
24 2021-11-05 21 d2 d2
25 2021-11-06 23 d1 d1
26 2021-11-07 16 1
27 2021-11-08 17
28 2021-11-09 13