delete line under condition R dplyr

May 30, 2024

I have a df like this one:

ID	FOLLOW-UP	RECURRENCE	RECURRENCE DATE
15	1	no
15	2	yes	2003-05-15
16	1	no
29	1	yes	2002-06-10
29	2	yes	2002-12-16
34	1	yes	2009-11-15
34	2	no
34	3	yes	2012-05-03
36	1	no
36	2	no

With in the first column the patient identifier (ID), in the 2nd the follow-up number, in the 3rd column the occurrence of a recurrence (yes/no) and in the last column the date of recurrence (if recurrence = yes).

There are therefore repeated lines for several patients, if they have several follow-ups.

I’d like to obtain a df with only 1 line per patient.

If the patient has had several recurrences, I’d like to keep the line with the oldest recurrence (date of first recurrence).
If the patient has had several follow-ups with recurrence and without recurrence, I’d like to keep the line with the (oldest) recurrence.
If the patient has several follow-ups without recurrence, I’d like to keep follow-up 1.

In the end, I’d like to get this df here:

ID	FOLLOW_UP	RECURRENCE	RECURRENCE_DATE
15	2	yes	2003-05-15
16	1	no
29	1	yes	2002-06-10
34	1	yes	2009-11-15
36	1	no

Here’s a reproducible example :

df <- data.frame(ID = c("15", "15", "16","29", "29", "34","34", "34", "36","36"),
                 FOLLOW_UP = c("1", "2", "1","1", "2", "1","2", "3", "1","2"),
                 RECURRENCE = c("no", "yes", "no","yes", "yes", "yes","no", "yes", "no","no"),
                 RECURRENCE_DATE = c("", "2003-05-15", "","2002-06-10", "2002-12-16", "2009-11-15","", "2012-05-03", "","")
)

I thought of using the “group by” and “slice” functions but I don’t know how to apply them with conditions.

Thanks in advance

>Solution :

something like this:

df$RECURRENCE_DATE <- as.Date(df$RECURRENCE_DATE, format = "%Y-%m-%d")
df_filtered <- df %>%
  group_by(ID) %>%
  arrange(desc(RECURRENCE), RECURRENCE_DATE, FOLLOW_UP) %>%  
  slice_head(n = 1) %>%
  ungroup()