Summarize row values based on other column value

February 23, 2023

I’ll illustrate my question with an example.

i have this data:

  individualIndex testPhase total_correct
1                 0        01             7
2                 0        02             7
3                 0        03             6
4                 0        04             5
5                 0        05             9
6                 0        06            10
7                 0        07             5
8                 0        08             9
9                 0        09             6
10                0        10             9
11                1        01             9
12                1        02             9
13                1        03            13
14                1        04            10
15                1        05             8
16                1        06            11
17                1        07            12
18                1        08            10

And I want to summarize each ‘total_correct’ pair of ‘testPhase’ to get this result:

  individualIndex testPhase total_correct
1                 0        01+02          14
2                 0        03+04          11
3                 0        05+06          19
4                 0        07+08          14
....

Unfortunately, could not find a proper solution, any ideas?

>Solution :

We can create a group of every 2 rows for each individualIndex, group_by it and summarise the values. Here is a way to do this via dplyr.

library(dplyr)

df %>%
  mutate(row_index = ceiling(row_number()/2), .by = individualIndex) %>%
  group_by(individualIndex, row_index) %>%
  summarise(testPhase = paste0(testPhase, collapse = '+'), 
            total_correct = sum(total_correct), .groups = "drop") %>%
  select(-row_index)
  
#  individualIndex testPhase total_correct
#            <int> <chr>             <int>
#1               0 1+2                  14
#2               0 3+4                  11
#3               0 5+6                  19
#4               0 7+8                  14
#5               0 9+10                 15
#6               1 1+2                  18
#7               1 3+4                  23
#8               1 5+6                  19
#9               1 7+8                  22

data

df <- structure(list(individualIndex = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), testPhase = c(1L, 
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 1L, 2L, 3L, 4L, 5L, 6L, 
7L, 8L), total_correct = c(7L, 7L, 6L, 5L, 9L, 10L, 5L, 9L, 6L, 
9L, 9L, 9L, 13L, 10L, 8L, 11L, 12L, 10L)), class = "data.frame", 
row.names = c(NA, -18L))