Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to set values as NA with complex if-else criteria in R?

My df looks like this:

SERIAL  quest  time_d1_1  time_d1_2  time_d2_1  time_d2_2  STARTED              V01
F3L     d1_1   05:00      17:30      05:15      17:45      2022-01-08 05:06:19  5
F3L     d1_2   05:00      17:30      05:15      17:45      2022-01-08 17:30:07  2
F3L     d2_1   05:00      17:30      05:15      17:45      2022-01-08 8:36:54   1
F3L     d2_2   05:00      17:30      05:15      17:45      2022-01-08 18:10:07  7
7HG     d1_1   05:00      17:30      05:15      17:45      2022-01-08 05:33:15  4
7HG     d1_2   05:00      17:30      05:15      17:45      2022-01-08 18:49:22  2
7HG     d2_1   05:00      17:30      05:15      17:45      2022-01-08 07:33:15  2
7HG     d2_2   05:00      17:30      05:15      17:45      2022-01-08 18:29:22  6
  1. SERIAL = Identifier
  2. quest = [day X] _ [measurement]; "d1_1" = day one, measurement one
  3. time_d1_1 = the reference time (hh:mm) for day one, measurement 1
  4. time_d2_2 = the reference time (hh:mm) for day two, measurement 2
  5. STARTED = the day & time (yyyy:mm:dd hh:mm:ss) when each measurement was started
  6. V01 = some values belonging to each quest

For each row, I would like to set the variable V01 to NA, when the variable STARTED is more than an hour later than the reference time (time_d1_1 to time_d2_2), in relation to the quest variable.

Example: In row 3, SERIAL=F3L started day two, measurement one (quest=d2_1) at 8:36:54. However, the reference time (time_d2_1) is 05:15. I would now set V01=1 to NA, as 8:36:54 is larger than 05:15 + hour(1).

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Unfortunately, I have this weird format so I struggle to solve this with the mutate-ifelse() or mutate-case_when() functions. Can anyone help, preferably a tidyverse solution?

The data:

Dat <- structure(list(SERIAL = c("F3L","F3L","F3L","F3L","7HG","7HG","7HG","7HG"),
                      quest = c("d1_1","d1_2","d2_1","d2_2","d1_1","d1_2","d2_1","d2_2"),
                      time_d1_1 = c("05:00","05:00","05:00","05:00","05:30","05:30","05:30","05:30"),
                      time_d1_2 = c("17:30","17:30","17:30","17:30","18:10","18:10","18:10","18:10"),
                      time_d2_1 = c("05:15","05:15","05:15","05:15","05:30","05:30","05:30","05:30"),
                      time_d2_2 = c("17:45","17:45","17:45","17:45","18:00","18:00","18:00","18:00"),
                      STARTED = c("2022-01-08 05:06:19","2022-01-08 17:30:07","2022-01-09 8:36:54",
                                  "2022-01-09 18:10:07","2021-09-04 05:33:15","2021-09-04 18:49:22",
                                  "2021-09-05 07:33:15","2021-09-05 18:29:22"),
                      V01 = c(5,3,1,7,4,2,2,6)),
                 class = "data.frame",
                 row.names = c(NA, -8L))

>Solution :

We can turn the table into long form with tidyr::pivot_longer, do the calculation, and turn into wide form (with tidyr::pivot_wider) again. See inline comments in example. I used lubridate to parse the datetime object; this could probably also be done using base R.

library(dplyr)
library(tidyr)
library(lubridate)

Dat %>% 
  pivot_longer(starts_with("time_"), names_prefix = "time_") %>% # turn into long form
  filter(quest == name) %>% # keep only the record where the `time_dX_Y` column matches `quest` 
  mutate(in_time = value > strftime(as_datetime(STARTED) - hours(1), "%H:%M:%S", tz = "UTC"),) %>% # calculate whether was in time
  pivot_wider(names_from = name, names_prefix = "time_", values_from = value) %>% # turn into wide form again
  group_by(SERIAL) %>%
  mutate(across(starts_with("time_"), function(x) first(x[!is.na(x)]))) # fill missings in `time_dX_Y` columns caused by the filter above

Which gives

  SERIAL quest STARTED               V01 in_time time_d1_1 time_d1_2 time_d2_1 time_d2_2
  <chr>  <chr> <chr>               <dbl> <lgl>   <chr>     <chr>     <chr>     <chr>    
1 F3L    d1_1  2022-01-08 05:06:19     5 TRUE    05:00     17:30     05:15     17:45    
2 F3L    d1_2  2022-01-08 17:30:07     3 TRUE    05:00     17:30     05:15     17:45    
3 F3L    d2_1  2022-01-09 8:36:54      1 FALSE   05:00     17:30     05:15     17:45    
4 F3L    d2_2  2022-01-09 18:10:07     7 TRUE    05:00     17:30     05:15     17:45    
5 7HG    d1_1  2021-09-04 05:33:15     4 TRUE    05:30     18:10     05:30     18:00    
6 7HG    d1_2  2021-09-04 18:49:22     2 TRUE    05:30     18:10     05:30     18:00    
7 7HG    d2_1  2021-09-05 07:33:15     2 FALSE   05:30     18:10     05:30     18:00    
8 7HG    d2_2  2021-09-05 18:29:22     6 TRUE    05:30     18:10     05:30     18:00    
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading