Copy values to rows based on conditions

July 13, 2023

I have a dataset that I am trying to copy an index date variable for controls based on their matched case’s index date. In this data, case = 1, control = 0. Each pair has a unique ID in the "matchid" column and time = the timepoint. I have the below sample dataset:

  Study_ID  time index_date  case matchid
   <chr>    <dbl>      <dbl> <dbl> <dbl>
 1 101        0          2     1     1
 2 101        1          2     1     1
 3 101        2          2     1     1
 4 101        3          2     1     1
 5 340        0          NA    0     1
 6 340        1          NA    0     1
 7 340        2          NA    0     1
 8 340        3          NA    0     1

I need the index_date column for rows 5-8 to be "2" based on "matchid" being the same so it would look like the below:

  Study_ID  time index_date  case matchid
   <chr>    <dbl>      <dbl> <dbl> <dbl>
 1 101        0          2     1     1
 2 101        1          2     1     1
 3 101        2          2     1     1
 4 101        3          2     1     1
 5 340        0          2     0     1
 6 340        1          2     0     1
 7 340        2          2     0     1
 8 340        3          2     0     1

Any help would be greatly appreciated as the solution for a similar question did not resolve my issue.

I have tried the below Stack Overflow solutions but I am getting errors.

Copy values from one row to another based on condition

r – copy value based on match in another column

>Solution :

Perhaps this?

library(dplyr)
quux %>%
  mutate(
    index_date = if_else(is.na(index_date), na.omit(index_date)[1], index_date),
    .by = c(matchid, time)
  )
#   Study_ID time index_date case matchid
# 1      101    0          2    1       1
# 2      101    1          2    1       1
# 3      101    2          2    1       1
# 4      101    3          2    1       1
# 5      340    0          2    0       1
# 6      340    1          2    0       1
# 7      340    2          2    0       1
# 8      340    3          2    0       1

(Note: .by= needs dplyr_1.1 or newer; if you have older, pre-use group_by(matchid, time) before the mutate.)

I’m inferring that what we need to do is replace all NA values with the first non-NA found in index_date within each group defined by matchid and time.

Data

quux <- structure(list(Study_ID = c(101L, 101L, 101L, 101L, 340L, 340L, 340L, 340L), time = c(0L, 1L, 2L, 3L, 0L, 1L, 2L, 3L), index_date = c(2L, 2L, 2L, 2L, NA, NA, NA, NA), case = c(1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L), matchid = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8"))