Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

delete line under condition R dplyr

I have a df like this one:

ID FOLLOW-UP RECURRENCE RECURRENCE DATE
15 1 no
15 2 yes 2003-05-15
16 1 no
29 1 yes 2002-06-10
29 2 yes 2002-12-16
34 1 yes 2009-11-15
34 2 no
34 3 yes 2012-05-03
36 1 no
36 2 no

With in the first column the patient identifier (ID), in the 2nd the follow-up number, in the 3rd column the occurrence of a recurrence (yes/no) and in the last column the date of recurrence (if recurrence = yes).

There are therefore repeated lines for several patients, if they have several follow-ups.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I’d like to obtain a df with only 1 line per patient.

  • If the patient has had several recurrences, I’d like to keep the line with the oldest recurrence (date of first recurrence).
  • If the patient has had several follow-ups with recurrence and without recurrence, I’d like to keep the line with the (oldest) recurrence.
  • If the patient has several follow-ups without recurrence, I’d like to keep follow-up 1.

In the end, I’d like to get this df here:

ID FOLLOW_UP RECURRENCE RECURRENCE_DATE
15 2 yes 2003-05-15
16 1 no
29 1 yes 2002-06-10
34 1 yes 2009-11-15
36 1 no

Here’s a reproducible example :

df <- data.frame(ID = c("15", "15", "16","29", "29", "34","34", "34", "36","36"),
                 FOLLOW_UP = c("1", "2", "1","1", "2", "1","2", "3", "1","2"),
                 RECURRENCE = c("no", "yes", "no","yes", "yes", "yes","no", "yes", "no","no"),
                 RECURRENCE_DATE = c("", "2003-05-15", "","2002-06-10", "2002-12-16", "2009-11-15","", "2012-05-03", "","")
)

I thought of using the “group by” and “slice” functions but I don’t know how to apply them with conditions.

Thanks in advance

>Solution :

something like this:

df$RECURRENCE_DATE <- as.Date(df$RECURRENCE_DATE, format = "%Y-%m-%d")
df_filtered <- df %>%
  group_by(ID) %>%
  arrange(desc(RECURRENCE), RECURRENCE_DATE, FOLLOW_UP) %>%  
  slice_head(n = 1) %>%
  ungroup()
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading