I want to calculate the difference between the first and last score for each person according to the dates. Original data looks like this:
ID <- c(1,1,1,2,2,3,3,3,3,4)
Score <- c(3,2,1,1,2,0,0,3,4,0)
Date <- c("2020/01/01","2020/01/02","2020/01/03","2020/02/05","2020/02/06","2021/10/01","2021/10/02","2021/10/03","2021/10/04","2022/03/01")
a <- data.frame(ID,Score,Date)
desired dataset:
diff_first_last <- c(-2,-2,-2,1,1,4,4,4,4,0)
b <- data.frame(ID,Score,Date,diff_first_last)
Is there a way to do this easily because I want to apply it to a much bigger dataset.
I would appreciate all the help there is! Thanks!!!
>Solution :
this is one possible dplyr solution:
library(dplyr)
# build groupings by ID
dplyr::group_by(a, ID) %>%
# convert date from text to date to be able to order by it
dplyr::mutate(Date = lubridate::ymd(Date)) %>%
# order by date just to be sure
dplyr::arrange(Date) %>%
# calculate first last diference
mutate(diff_first_last = dplyr::last(Score) - dplyr::first(Score)) %>%
# ungroup to prevent unwanted behaviour downstream
dplyr::ungroup()
# A tibble: 10 x 4
ID Score Date diff_first_last
<dbl> <dbl> <date> <dbl>
1 1 3 2020-01-01 -2
2 1 2 2020-01-02 -2
3 1 1 2020-01-03 -2
4 2 1 2020-02-05 1
5 2 2 2020-02-06 1
6 3 0 2021-10-01 4
7 3 0 2021-10-02 4
8 3 3 2021-10-03 4
9 3 4 2021-10-04 4
10 4 0 2022-03-01 0