
Copy values to rows based on conditions

I have a dataset in which I am trying to copy each case's index date to its matched controls. In this data, case = 1 and control = 0, each case-control pair shares a unique ID in the "matchid" column, and "time" is the timepoint. Here is a sample dataset:

  Study_ID  time index_date  case matchid
   <chr>    <dbl>      <dbl> <dbl> <dbl>
 1 101        0          2     1     1
 2 101        1          2     1     1
 3 101        2          2     1     1
 4 101        3          2     1     1
 5 340        0          NA    0     1
 6 340        1          NA    0     1
 7 340        2          NA    0     1
 8 340        3          NA    0     1

I need index_date in rows 5-8 to be 2, because those rows have the same matchid as the case, so the result would look like this:

  Study_ID  time index_date  case matchid
   <chr>    <dbl>      <dbl> <dbl> <dbl>
 1 101        0          2     1     1
 2 101        1          2     1     1
 3 101        2          2     1     1
 4 101        3          2     1     1
 5 340        0          2     0     1
 6 340        1          2     0     1
 7 340        2          2     0     1
 8 340        3          2     0     1

Any help would be greatly appreciated, as the solutions to similar questions did not resolve my issue.


I have tried the following Stack Overflow solutions, but I am getting errors:

Copy values from one row to another based on condition

r – copy value based on match in another column

Solution:

Perhaps this?

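library(dplyr)
quux %>%
  mutate(
    # within each (matchid, time) group, replace a missing index_date
    # with the first non-missing index_date in that group
    index_date = if_else(is.na(index_date), na.omit(index_date)[1], index_date),
    .by = c(matchid, time)
  )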
#   Study_ID time index_date case matchid
# 1      101    0          2    1       1
# 2      101    1          2    1       1
# 3      101    2          2    1       1
# 4      101    3          2    1       1
# 5      340    0          2    0       1
# 6      340    1          2    0       1
# 7      340    2          2    0       1
# 8      340    3          2    0       1

(Note: the .by= argument requires dplyr 1.1.0 or newer; with an older version, call group_by(matchid, time) before the mutate instead.)
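For dplyr versions older than 1.1.0, the grouped form described in the note might look like the sketch below. It recreates the question's sample data so it runs on its own. The trailing ungroup() is an addition of this sketch (not part of the answer above) so that the result is not left grouped for later steps.

```r
library(dplyr)

# Sample data from the question (same values as the dput under "Data")
quux <- data.frame(
  Study_ID   = c(101L, 101L, 101L, 101L, 340L, 340L, 340L, 340L),
  time       = c(0L, 1L, 2L, 3L, 0L, 1L, 2L, 3L),
  index_date = c(2L, 2L, 2L, 2L, NA, NA, NA, NA),
  case       = c(1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L),
  matchid    = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)
)

# Pre-1.1.0 equivalent of mutate(..., .by = c(matchid, time))
result <- quux %>%
  group_by(matchid, time) %>%
  mutate(
    # fill a missing index_date with the first non-missing one in the group
    index_date = if_else(is.na(index_date), na.omit(index_date)[1], index_date)
  ) %>%
  ungroup()  # drop the grouping so later verbs act on the whole table

print(result)
```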

I’m inferring that what we need to do is replace all NA values with the first non-NA found in index_date within each group defined by matchid and time.
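If the intent is exactly the one stated in the question, that every control takes its matched case's index date, a variant could group by matchid alone and read the date only from case rows (case == 1). That would also cover a control with a timepoint its case does not have. This is a minimal sketch under an assumption not stated in the question: each matchid has a single case whose index_date is the same on all of its rows (if not, only the first non-missing value is used).

```r
library(dplyr)

# Sample data from the question (same values as the dput under "Data")
quux <- data.frame(
  Study_ID   = c(101L, 101L, 101L, 101L, 340L, 340L, 340L, 340L),
  time       = c(0L, 1L, 2L, 3L, 0L, 1L, 2L, 3L),
  index_date = c(2L, 2L, 2L, 2L, NA, NA, NA, NA),
  case       = c(1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L),
  matchid    = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)
)

result <- quux %>%
  group_by(matchid) %>%
  mutate(
    # Assumption: one index date per match, taken from the case rows.
    # Use the first non-missing index_date on a case row (case == 1)
    # to fill every missing index_date in the same match.
    index_date = if_else(
      is.na(index_date),
      index_date[case == 1 & !is.na(index_date)][1],
      index_date
    )
  ) %>%
  ungroup()

print(result)
```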


Data

quux <- structure(list(
  Study_ID   = c(101L, 101L, 101L, 101L, 340L, 340L, 340L, 340L),
  time       = c(0L, 1L, 2L, 3L, 0L, 1L, 2L, 3L),
  index_date = c(2L, 2L, 2L, 2L, NA, NA, NA, NA),
  case       = c(1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L),
  matchid    = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)
), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8"))
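Without dplyr, the same idea (fill each missing index_date from the case row that has the same matchid) could be sketched in base R with match(). This is an illustrative alternative, not part of the answer above; it assumes one index date per matchid on the case rows, and the helper names case_rows and needs_date are hypothetical.

```r
# Sample data from the question (same values as the dput above)
quux <- data.frame(
  Study_ID   = c(101L, 101L, 101L, 101L, 340L, 340L, 340L, 340L),
  time       = c(0L, 1L, 2L, 3L, 0L, 1L, 2L, 3L),
  index_date = c(2L, 2L, 2L, 2L, NA, NA, NA, NA),
  case       = c(1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L),
  matchid    = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)
)

result <- quux

# Lookup table: one row per matchid with the case's non-missing index_date
# (assumption: a single index date per match; the first one is kept)
case_rows <- result[result$case == 1 & !is.na(result$index_date),
                    c("matchid", "index_date")]
case_rows <- case_rows[!duplicated(case_rows$matchid), ]

# Rows that still need a date
needs_date <- is.na(result$index_date)

# Find each such row's matchid in the lookup table and copy that date
result$index_date[needs_date] <-
  case_rows$index_date[match(result$matchid[needs_date], case_rows$matchid)]

print(result)
```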