Calculate the difference if the values in one column match

I have some longitudinal data where the obversions of some individuals are missing at some time points:

df <- data.frame(id = c(1, 1, 1, 1, 1, 2, 2, 2), #id2 is missing for time 3 and 5
                 time = c(1, 2, 3, 4, 5, 1, 2, 4),
                 value = c(3, 4, 2, 55, 5, 9, 43, 22))

I want to calculate the difference between two individuals if they share the same time point, otherwise we pad them with NA. The expected outcome is below:

 id  time value  diff
  1    1     3    -6  #the time point matches and so calculate the difference
  1    2    46     3  
  1    3     2    NA
  1    4    55    33
  1    5     5    NA
  2    1     9    -6
  2    2    43     3
  2    3    NA    NA
  2    4    22    33
  2    5    NA    NA

How do I achieve this in a fast way? I have a much larger dataset.

>Solution :

Using dplyr:

df %>% 
  arrange(id) %>%                                    # make sure id 1 comes first
  group_by(time) %>% 
  mutate(diff = ifelse(n() == 2, -diff(value), NA)) %>%
  ungroup()

Gives:

# A tibble: 8 × 4
     id  time value  diff
  <dbl> <dbl> <dbl> <dbl>
1     1     1     3    -6
2     1     2     4   -39
3     1     3     2    NA
4     1     4    55    33
5     1     5     5    NA
6     2     1     9    -6
7     2     2    43   -39
8     2     4    22    33

Leave a Reply