The following data is given and sorted by id and date.
I would like to find out how many days lie between data entries of class A and of B within an id creating a new variable called ‘days_diff’. The goal is to be able to calculate if the difference in days exceeds a predefined threshold.
HAVE
id date class
1 01/01/2022 A
1 06/02/2022 B
1 28/02/2022 A
1 08/10/2022 B
2 01/01/2022 B
2 06/02/2022 B
2 28/02/2022 A
2 08/10/2022 A
WANT
id date class days_diff
1 01/01/2022 A 0
1 06/02/2022 B 0
1 28/02/2022 A 58
1 08/10/2022 B 244
2 01/01/2022 B 0
2 06/02/2022 B 36
2 28/02/2022 A 0
2 08/10/2022 A 222
I am aware of the intck function to calculate the difference in days, but don’t know how to include this function into a loop.
data _null_;
days_diff = intck('day', '01JAN2022'd, '28FEB2022'd);
put days_diff;
run;
Thanks in advance.
PP
>Solution :
There is no need to use INTCK() when the interval you want is the basic storage unit of the data. Since DATE values are stored in days you can use subtraction to calculate differences in days.
To find the difference in values of variable between observations just use the DIF() function (related to the LAG() function). You have to call it for every observation, otherwise it cannot know what the previous value was, so you will need to use BY group processing to know when to force it to zero.
It sounds like you have the data sorted in the wrong order for what you want to calculate. So first sort the data properly and then it should work.
data have;
input id date :ddmmyy. class $;
format date yymmdd10.;
cards;
1 01/01/2022 A
1 06/02/2022 B
1 28/02/2022 A
1 08/10/2022 B
2 01/01/2022 B
2 06/02/2022 B
2 28/02/2022 A
2 08/10/2022 A
;
proc sort data=have out=want;
by id class date;
run;
data want;
set want;
by id class date;
days_diff = dif(date);
if first.class then days_diff=0;
run;
Result:
PS To avoid confusing half of your audience do not display dates in MDY or DMY order. Use YMD order or DATE format.