Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Calculate overall mean of multiple columns by group

I have several columns with reaction times (RT) per participant. This is a simplified version of the dataframe I have (in reality I have 20 RT columns):

df <- data.frame(id=c(1,1,1,5,5,5), RT_0=c(0.467,NA,0.634,0.416,NA,0.35), RT_1=c(0.3,0.35,0.301,0.267,0.3,0.449), RT_2=c(1.251,NA, 0.334,0.34, 0.317,0.42), RT_3=c(0.284,0.316,0.283,0.367,0.55,0.35),pos=c(1,6,2,4,2,6))

id  RT_0    RT_1    RT_2    RT_3    pos
 1  0.467   0.3     1.251   0.284   1
 1  NA      0.35    NA      0.316   6
 1  0.634   0.301   0.334   0.283   2
 5  0.416   0.267   0.34    0.367   4
 5  NA      0.3     0.317   0.55    2
 5  0.35    0.449   0.42    0.35    6

I want to calculate the overall mean of the RT columns by id. For example, the mean of all reaction times of participant nº1 was 0.452.

Something like this:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

id  RT_0    RT_1    RT_2    RT_3    pos  mean
 1  0.467   0.3     1.251   0.284   1    0.452
 1  NA      0.35    NA      0.316   6    0.452
 1  0.634   0.301   0.334   0.283   2    0.452
 5  0.416   0.267   0.34    0.367   4    0.375
 5  NA      0.3     0.317   0.55    2    0.375
 5  0.35    0.449   0.42    0.35    6    0.375

>Solution :

We may group by ‘id’, select the columns that starts_with ‘RT_’ in column names within across (or pick), unlist the columns to a vector and get the overall mean with mean and assign it as new column in mutate

library(dplyr)
df %>% 
  group_by(id) %>% 
  mutate(mean = mean(unlist(across(starts_with("RT_"))),
        na.rm = TRUE)) %>%
  ungroup

-output

# A tibble: 6 × 7
     id   RT_0  RT_1   RT_2  RT_3   pos  mean
  <dbl>  <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>
1     1  0.467 0.3    1.25  0.284     1 0.452
2     1 NA     0.35  NA     0.316     6 0.452
3     1  0.634 0.301  0.334 0.283     2 0.452
4     5  0.416 0.267  0.34  0.367     4 0.375
5     5 NA     0.3    0.317 0.55      2 0.375
6     5  0.35  0.449  0.42  0.35      6 0.375

Or with data.table

library(data.table)
setDT(df)[, mean := mean(unlist(.SD), na.rm = TRUE), 
  .(id), .SDcols = patterns("^RT_")]

-output

> df
   id  RT_0  RT_1  RT_2  RT_3 pos      mean
1:  1 0.467 0.300 1.251 0.284   1 0.4520000
2:  1    NA 0.350    NA 0.316   6 0.4520000
3:  1 0.634 0.301 0.334 0.283   2 0.4520000
4:  5 0.416 0.267 0.340 0.367   4 0.3750909
5:  5    NA 0.300 0.317 0.550   2 0.3750909
6:  5 0.350 0.449 0.420 0.350   6 0.3750909
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading