Calculate overall mean of multiple columns by group

January 22, 2023

I have several columns with reaction times (RT) per participant. This is a simplified version of the dataframe I have (in reality I have 20 RT columns):

df <- data.frame(id=c(1,1,1,5,5,5), RT_0=c(0.467,NA,0.634,0.416,NA,0.35), RT_1=c(0.3,0.35,0.301,0.267,0.3,0.449), RT_2=c(1.251,NA, 0.334,0.34, 0.317,0.42), RT_3=c(0.284,0.316,0.283,0.367,0.55,0.35),pos=c(1,6,2,4,2,6))

id  RT_0    RT_1    RT_2    RT_3    pos
 1  0.467   0.3     1.251   0.284   1
 1  NA      0.35    NA      0.316   6
 1  0.634   0.301   0.334   0.283   2
 5  0.416   0.267   0.34    0.367   4
 5  NA      0.3     0.317   0.55    2
 5  0.35    0.449   0.42    0.35    6

I want to calculate the overall mean of the RT columns by id. For example, the mean of all reaction times of participant nº1 was 0.452.

Something like this:

id  RT_0    RT_1    RT_2    RT_3    pos  mean
 1  0.467   0.3     1.251   0.284   1    0.452
 1  NA      0.35    NA      0.316   6    0.452
 1  0.634   0.301   0.334   0.283   2    0.452
 5  0.416   0.267   0.34    0.367   4    0.375
 5  NA      0.3     0.317   0.55    2    0.375
 5  0.35    0.449   0.42    0.35    6    0.375

>Solution :

We may group by ‘id’, select the columns that starts_with ‘RT_’ in column names within across (or pick), unlist the columns to a vector and get the overall mean with mean and assign it as new column in mutate

library(dplyr)
df %>% 
  group_by(id) %>% 
  mutate(mean = mean(unlist(across(starts_with("RT_"))),
        na.rm = TRUE)) %>%
  ungroup

-output

# A tibble: 6 × 7
     id   RT_0  RT_1   RT_2  RT_3   pos  mean
  <dbl>  <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>
1     1  0.467 0.3    1.25  0.284     1 0.452
2     1 NA     0.35  NA     0.316     6 0.452
3     1  0.634 0.301  0.334 0.283     2 0.452
4     5  0.416 0.267  0.34  0.367     4 0.375
5     5 NA     0.3    0.317 0.55      2 0.375
6     5  0.35  0.449  0.42  0.35      6 0.375

Or with data.table

library(data.table)
setDT(df)[, mean := mean(unlist(.SD), na.rm = TRUE), 
  .(id), .SDcols = patterns("^RT_")]

-output

> df
   id  RT_0  RT_1  RT_2  RT_3 pos      mean
1:  1 0.467 0.300 1.251 0.284   1 0.4520000
2:  1    NA 0.350    NA 0.316   6 0.4520000
3:  1 0.634 0.301 0.334 0.283   2 0.4520000
4:  5 0.416 0.267 0.340 0.367   4 0.3750909
5:  5    NA 0.300 0.317 0.550   2 0.3750909
6:  5 0.350 0.449 0.420 0.350   6 0.3750909