I have some longitudinal data where the obversions of some individuals are missing at some time points:
df <- data.frame(id = c(1, 1, 1, 1, 1, 2, 2, 2), #id2 is missing for time 3 and 5
time = c(1, 2, 3, 4, 5, 1, 2, 4),
value = c(3, 4, 2, 55, 5, 9, 43, 22))
I want to calculate the difference between two individuals if they share the same time point, otherwise we pad them with NA
. The expected outcome is below:
id time value diff
1 1 3 -6 #the time point matches and so calculate the difference
1 2 46 3
1 3 2 NA
1 4 55 33
1 5 5 NA
2 1 9 -6
2 2 43 3
2 3 NA NA
2 4 22 33
2 5 NA NA
How do I achieve this in a fast way? I have a much larger dataset.
>Solution :
Using dplyr
:
df %>%
arrange(id) %>% # make sure id 1 comes first
group_by(time) %>%
mutate(diff = ifelse(n() == 2, -diff(value), NA)) %>%
ungroup()
Gives:
# A tibble: 8 × 4 id time value diff <dbl> <dbl> <dbl> <dbl> 1 1 1 3 -6 2 1 2 4 -39 3 1 3 2 NA 4 1 4 55 33 5 1 5 5 NA 6 2 1 9 -6 7 2 2 43 -39 8 2 4 22 33