I have a dataset that I am trying to copy an index date variable for controls based on their matched case’s index date. In this data, case = 1, control = 0. Each pair has a unique ID in the "matchid" column and time = the timepoint. I have the below sample dataset:
Study_ID time index_date case matchid
<chr> <dbl> <dbl> <dbl> <dbl>
1 101 0 2 1 1
2 101 1 2 1 1
3 101 2 2 1 1
4 101 3 2 1 1
5 340 0 NA 0 1
6 340 1 NA 0 1
7 340 2 NA 0 1
8 340 3 NA 0 1
I need the index_date column for rows 5-8 to be "2" based on "matchid" being the same so it would look like the below:
Study_ID time index_date case matchid
<chr> <dbl> <dbl> <dbl> <dbl>
1 101 0 2 1 1
2 101 1 2 1 1
3 101 2 2 1 1
4 101 3 2 1 1
5 340 0 2 0 1
6 340 1 2 0 1
7 340 2 2 0 1
8 340 3 2 0 1
Any help would be greatly appreciated as the solution for a similar question did not resolve my issue.
I have tried the below Stack Overflow solutions but I am getting errors.
Copy values from one row to another based on condition
r – copy value based on match in another column
>Solution :
Perhaps this?
library(dplyr)
quux %>%
mutate(
index_date = if_else(is.na(index_date), na.omit(index_date)[1], index_date),
.by = c(matchid, time)
)
# Study_ID time index_date case matchid
# 1 101 0 2 1 1
# 2 101 1 2 1 1
# 3 101 2 2 1 1
# 4 101 3 2 1 1
# 5 340 0 2 0 1
# 6 340 1 2 0 1
# 7 340 2 2 0 1
# 8 340 3 2 0 1
(Note: .by= needs dplyr_1.1 or newer; if you have older, pre-use group_by(matchid, time) before the mutate.)
I’m inferring that what we need to do is replace all NA values with the first non-NA found in index_date within each group defined by matchid and time.
Data
quux <- structure(list(Study_ID = c(101L, 101L, 101L, 101L, 340L, 340L, 340L, 340L), time = c(0L, 1L, 2L, 3L, 0L, 1L, 2L, 3L), index_date = c(2L, 2L, 2L, 2L, NA, NA, NA, NA), case = c(1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L), matchid = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8"))