I`m struggling on how can I take the measure of 4 rows sequentially for multiples variables in my dataframe sorting by the date. Here is a toy example
test = data.frame(my_groups = c("A", "A", "A", "B", "B", "C", "C", "C", "A", "A","A","A","A","A" , "C"),
measure1 = c(10:24),
measure2 = c(1:15),
time= c("20-09-2020", "25-09-2020", "19-09-2020", "20-05-2020", "20-06-2021",
"11-01-2021", "13-01-2021", "13-01-2021", "15-01-2021", "15-01-2021",
"20-03-2021", "20-10-2021", "29-06-2021", "20-07-2021", "13-06-2021"))
# my_groups measure1 measure2 time
# 1 A 18 9 15-01-2021
# 2 A 19 10 15-01-2021
# 3 A 12 3 19-09-2020
# 4 A 20 11 20-03-2021
# 5 A 23 14 20-07-2021
# 6 A 10 1 20-09-2020
# 7 A 21 12 20-10-2021
# 8 A 11 2 25-09-2020
# 9 A 22 13 29-06-2021
# 10 B 13 4 20-05-2020
# 11 B 14 5 20-06-2021
# 12 C 15 6 11-01-2021
# 13 C 16 7 13-01-2021
# 14 C 17 8 13-01-2021
# 15 C 24 15 13-06-2021
As result I’d like something like:
# my_groups measure1 measure2 time
# 1 A 17.25 8.25 20-03-2021 #mean for the first 4 elements of A and maximun date
# 2 A 16.25 7.25 25-09-2020 #mean for the others 4 elements of A and maximun date
# 3 C 18 9 13-06-2021 #mean for the first 4 elements of B and maximun date
To get this result I thought of using something like this:
test %>%
arrange( my_groups,time) %>%
group_by(my_groups) %>%
summarise(measure1 = mean(measure1),
measure2 = mean(measure2),
time = max(time))
But I’m having problem to find a solution to take these measures for 4 values consecutively.
Any hint on how can I do that?
>Solution :
We can add another grouping variable g4 using integer division to ensure that we get groups of 4, and if desired, drop groups with fewer members. Then just run your summarize.
library(dplyr)
test %>%
arrange(my_groups, time) %>%
group_by(my_groups) %>%
mutate(g4 = (row_number() - 1) %/% 4) %>%
group_by(my_groups, g4) %>%
filter(n() == 4) %>%
summarise(measure1 = mean(measure1),
measure2 = mean(measure2),
time = max(time),
.groups = "drop")
#> # A tibble: 3 × 5
#> my_groups g4 measure1 measure2 time
#> <chr> <dbl> <dbl> <dbl> <chr>
#> 1 A 0 17.2 8.25 20-03-2021
#> 2 A 1 16.2 7.25 25-09-2020
#> 3 C 0 18 9 13-06-2021