I’ll illustrate my question with an example.
i have this data:
individualIndex testPhase total_correct
1 0 01 7
2 0 02 7
3 0 03 6
4 0 04 5
5 0 05 9
6 0 06 10
7 0 07 5
8 0 08 9
9 0 09 6
10 0 10 9
11 1 01 9
12 1 02 9
13 1 03 13
14 1 04 10
15 1 05 8
16 1 06 11
17 1 07 12
18 1 08 10
And I want to summarize each ‘total_correct’ pair of ‘testPhase’ to get this result:
individualIndex testPhase total_correct
1 0 01+02 14
2 0 03+04 11
3 0 05+06 19
4 0 07+08 14
....
Unfortunately, could not find a proper solution, any ideas?
>Solution :
We can create a group of every 2 rows for each individualIndex
, group_by
it and summarise the values. Here is a way to do this via dplyr
.
library(dplyr)
df %>%
mutate(row_index = ceiling(row_number()/2), .by = individualIndex) %>%
group_by(individualIndex, row_index) %>%
summarise(testPhase = paste0(testPhase, collapse = '+'),
total_correct = sum(total_correct), .groups = "drop") %>%
select(-row_index)
# individualIndex testPhase total_correct
# <int> <chr> <int>
#1 0 1+2 14
#2 0 3+4 11
#3 0 5+6 19
#4 0 7+8 14
#5 0 9+10 15
#6 1 1+2 18
#7 1 3+4 23
#8 1 5+6 19
#9 1 7+8 22
data
df <- structure(list(individualIndex = c(0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), testPhase = c(1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L), total_correct = c(7L, 7L, 6L, 5L, 9L, 10L, 5L, 9L, 6L,
9L, 9L, 9L, 13L, 10L, 8L, 11L, 12L, 10L)), class = "data.frame",
row.names = c(NA, -18L))