Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Get mean values per sample, arranged by another column of ID's

Here is a problem that I would like to solve using the tidyverse. It’s difficult to explain but the example lays it out.

I have a dataset with several names. Each name is represented by 1 or more ID’s (ranges from 1-5 in the real data). Each ID has 5 values, ordered by the draw number.

library(tidyverse)

set.seed(190)

dplyr::tibble(nms = c(rep("A", 5), rep("B",10), rep("C", 10)),
              draw = c(rep(1:5, times = 5)),
              id = c(rep(c("A1", "B1", "B2", "C1", "C2"), each = 5)),
              val = c(sample(1:4, replace = T, size = 25))) %>%  print(n=25)
#> # A tibble: 25 × 4
#>    nms    draw id      val
#>    <chr> <int> <chr> <int>
#>  1 A         1 A1        3
#>  2 A         2 A1        4
#>  3 A         3 A1        2
#>  4 A         4 A1        4
#>  5 A         5 A1        1
#>  6 B         1 B1        2
#>  7 B         2 B1        1
#>  8 B         3 B1        3
#>  9 B         4 B1        3
#> 10 B         5 B1        2
#> 11 B         1 B2        1
#> 12 B         2 B2        2
#> 13 B         3 B2        2
#> 14 B         4 B2        1
#> 15 B         5 B2        2
#> 16 C         1 C1        2
#> 17 C         2 C1        1
#> 18 C         3 C1        1
#> 19 C         4 C1        3
#> 20 C         5 C1        4
#> 21 C         1 C2        1
#> 22 C         2 C2        4
#> 23 C         3 C2        4
#> 24 C         4 C2        2
#> 25 C         5 C2        4

Created on 2023-05-31 with reprex v2.0.2

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I would like to get the average of each draw for each name nms. This means getting the average val of each draw for every individual id. Some names like A only have one id, so the average val, or mean_val, will be the same, but since B and C have multiple id’s, the mean_val will be averaged for each draw. I would like the results to look like this

# A tibble: 15 × 3
#   nms    draw   mean_val
#   <chr> <int> <dbl>
# 1 A         1   3  
# 2 A         2   4  
# 3 A         3   2  
# 4 A         4   4  
# 5 A         5   1  
# 6 B         1   1.5
# 7 B         2   1.5
# 8 B         3   2.5
# 9 B         4   2  
#10 B         5   2  
#11 C         1   1.5
#12 C         2   2.5
#13 C         3   2.5
#14 C         4   2.5
#15 C         5   4 

>Solution :

I think this will give you what you want or at least get you close.

output <- df %>% 
  group_by(nms, draw) %>% 
  summarize(mean_val = mean(val))
# A tibble: 15 × 3
# Groups:   nms [3]
   nms    draw mean_val
   <chr> <int>    <dbl>
 1 A         1      3  
 2 A         2      4  
 3 A         3      2  
 4 A         4      4  
 5 A         5      1  
 6 B         1      1.5
 7 B         2      1.5
 8 B         3      2.5
 9 B         4      2  
10 B         5      2  
11 C         1      1.5
12 C         2      2.5
13 C         3      2.5
14 C         4      2.5
15 C         5      4

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading