See tables have and want below for the sample code and expected output.
Using this data set, I would like SAS code that will output values in a new column. Call the new column "RuleHit". The code should group by ID. The logic for the new column will output "1" if the logic is satisfied and "0" if it is not. The logic will first sum the number of occurrences of "B" in column "RuleHit" for all lagged observations within 8 months based on its "datetime" value. If the sum of those "B" values within the 8-month window is greater or equal to 2, then RuleHit equals "1". If the sum of those values is less than 2, then RuleHit equals 0.
The challenge here is that the number of lags for each observation can change. All my efforts have failed so far. Any help would be appreciated.
INPUT:
data have;
input id datetime :datetime. event $;
format datetime datetime.;
datalines;
1 01JAN2022:00:00:00 A
1 15APR2022:11:46:34 B
1 23JUN2022:01:02:10 A
1 22JAN2023:14:32:58 A
1 08JUN2023:16:10:32 B
1 11JUL2023:20:39:51 A
1 28JAN2023:21:41:35 A
1 09APR2023:11:22:17 A
1 16MAY2023:22:14:23 B
1 13NOV2022:08:08:56 B
2 06FEB2022:09:23:43 B
2 12MAY2022:07:55:09 A
2 25JUN2022:08:26:44 B
2 03OCT2022:09:54:03 A
2 05DEC2022:01:51:06 A
2 12JUL2022:10:14:44 B
2 20SEP2022:16:39:51 A
2 01JAN2023:01:18:57 A
2 28MAR2023:12:46:34 B
2 05JUN2023:07:32:10 A
3 28FEB2022:23:08:58 A
3 02APR2022:05:22:17 B
3 21JUN2022:22:01:43 B
3 10AUG2022:18:36:23 A
3 13NOV2022:20:45:19 A
3 24DEC2022:17:37:57 B
3 18MAR2023:19:41:35 A
3 06MAY2023:15:22:17 A
3 09JUL2023:02:14:23 B
3 21OCT2023:14:08:56 B
;
run;
EXPECTED OUTPUT:
data want;
input id datetime :datetime. event $ RuleHit;
format datetime datetime.;
datalines;
1 01JAN2022:00:00:00 A 0
1 15APR2022:11:46:34 B 0
1 23JUN2022:01:02:10 A 0
1 22JAN2023:14:32:58 A 0
1 08JUN2023:16:10:32 B 0
1 11JUL2023:20:39:51 A 0
1 28JAN2023:21:41:35 A 0
1 09APR2023:11:22:17 A 0
1 16MAY2023:22:14:23 B 1
1 13NOV2022:08:08:56 B 1
2 06FEB2022:09:23:43 B 0
2 12MAY2022:07:55:09 A 0
2 25JUN2022:08:26:44 B 1
2 03OCT2022:09:54:03 A 1
2 05DEC2022:01:51:06 A 0
2 12JUL2022:10:14:44 B 0
2 20SEP2022:16:39:51 A 0
2 01JAN2023:01:18:57 A 0
2 28MAR2023:12:46:34 B 0
2 05JUN2023:07:32:10 A 0
3 28FEB2022:23:08:58 A 0
3 02APR2022:05:22:17 B 0
3 21JUN2022:22:01:43 B 1
3 10AUG2022:18:36:23 A 1
3 13NOV2022:20:45:19 A 1
3 24DEC2022:17:37:57 B 1
3 18MAR2023:19:41:35 A 0
3 06MAY2023:15:22:17 A 0
3 09JUL2023:02:14:23 B 1
3 21OCT2023:14:08:56 B 1
;
run;
>Solution :
Probably better not to think of it as a LAG problem. Instead just do a self-join.
proc sql;
create table want as
select a.*
, sum(b.event='B') as num_hits
, (calculated num_hits > 1) as RuleHit
from have a
left join have b
on a.id = b.id
and b.datetime between intnx('dtmonth',a.datetime,-8,'same') and a.datetime
group by a.id,a.datetime,a.event
;
quit;
Results
Rule
Obs id datetime event num_hits Hit
1 1 01JAN2022:00:00:00 A 0 0
2 1 15APR2022:11:46:34 B 1 0
3 1 23JUN2022:01:02:10 A 1 0
4 1 13NOV2022:08:08:56 B 2 1
5 1 22JAN2023:14:32:58 A 1 0
6 1 28JAN2023:21:41:35 A 1 0
7 1 09APR2023:11:22:17 A 1 0
8 1 16MAY2023:22:14:23 B 2 1
9 1 08JUN2023:16:10:32 B 3 1
10 1 11JUL2023:20:39:51 A 3 1
11 2 06FEB2022:09:23:43 B 1 0
12 2 12MAY2022:07:55:09 A 1 0
13 2 25JUN2022:08:26:44 B 2 1
14 2 12JUL2022:10:14:44 B 3 1
15 2 20SEP2022:16:39:51 A 3 1
16 2 03OCT2022:09:54:03 A 3 1
17 2 05DEC2022:01:51:06 A 2 1
18 2 01JAN2023:01:18:57 A 2 1
19 2 28MAR2023:12:46:34 B 1 0
20 2 05JUN2023:07:32:10 A 1 0
21 3 28FEB2022:23:08:58 A 0 0
22 3 02APR2022:05:22:17 B 1 0
23 3 21JUN2022:22:01:43 B 2 1
24 3 10AUG2022:18:36:23 A 2 1
25 3 13NOV2022:20:45:19 A 2 1
26 3 24DEC2022:17:37:57 B 2 1
27 3 18MAR2023:19:41:35 A 1 0
28 3 06MAY2023:15:22:17 A 1 0
29 3 09JUL2023:02:14:23 B 2 1
30 3 21OCT2023:14:08:56 B 2 1