I feel like this should be straightforward but I can’t find an existing answer to my question. I have a df:
df <- data.frame(ID = c('a', 'b', 'c', 'c1', 'd', 'e', 'f', 'g', 'h', 'h1'),
var2 = c(7, 9, 2, 4, 3, 6, 8, 2, 1, 2),
var3 = c(21, 50, 40, 30, 29, 45, 33, 51, 70, 46))
And I’d like to get the mean of every n rows for columns var2 and var3 separately, so that the output looks like this:
var2 var3
1 8.0 35.5
2 3.0 35.0
3 4.5 37.0
4 5.0 42.0
5 1.5 58.0
It would be a bonus if I could keep the first ID of the two rows, e.g:
ID var2 var3
1 a 8.0 35.5
2 c 3.0 35.0
3 d 4.5 37.0
4 f 5.0 42.0
5 h 1.5 58.0
Ty in advance
>Solution :
We need to add a grouping column, and then this is a standard grouped mean:
library(dplyr)
n = 2
df |>
mutate(group = ((row_number() - 1) %/% n) + 1) |>
summarize(
first_id = first(ID),
across(starts_with("var"), mean),
.by = group
)
# group first_id var2 var3
# 1 1 a 8.0 35.5
# 2 2 c 3.0 35.0
# 3 3 d 4.5 37.0
# 4 4 f 5.0 42.0
# 5 5 h 1.5 58.0