Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Summarizing over groups within larger groups using dplyr

I am working with a dataset with repeated observations within each treatment, and I need to find the mean value within each of the treatments. I would like to use dplyr. My data is as follows:

ex <- data.frame(plot = c(101,102,103,104,105,106,201,202,203,204,205,206,301,302,303,304,305,306),
                 trt = c("a","a","b","b","c","c","a","a","b","b","c","c","a","a","b","b","c","c"),
                 value = c(1,2,3,4,5,6,1,2,3,4,5,6,1,2,3,4,5,6))

The summarised dataset needs to be the mean of each treatment within the sub-groups. Here is what it should look like when completed:

correct <- data.frame(trt = c("a","b","c","a","b","c","a","b","c"),
                       rating = c(1.5,3.5,5.5,1.5,3.5,5.5,1.5,3.5,5.5))

Here is what I have tried:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

library(dplyr)

example <- ex %>% 
  dplyr::select(plot, trt, value) %>% 
  group_by(trt) %>% 
  summarise(rating = mean(value), .groups = 'drop')

However, the following is produced:

incorrect_example <- data.frame(trt=c("a","b","c"),
                                rating=c(1.5,3.5,5.5))

How can I produce results like those indicated in the correct dataframe?

>Solution :

We may need rleid

library(dplyr)
library(data.table)
ex %>%  
  group_by(grp = rleid(trt), trt) %>% 
  summarise(rating = mean(value), .groups = 'drop') %>%
  select(-grp)

-output

# A tibble: 9 × 2
  trt   rating
  <chr>  <dbl>
1 a        1.5
2 b        3.5
3 c        5.5
4 a        1.5
5 b        3.5
6 c        5.5
7 a        1.5
8 b        3.5
9 c        5.5

Or may also use %/% on the ‘plot’ to create a grouping column along with ‘trt’ as group, then get the mean of the ‘value’ column

ex %>% 
  group_by(grp = plot %/% 100, trt) %>% 
  summarise(value = mean(value), .groups = 'drop') %>% 
  select(-grp)

-output

# A tibble: 9 × 2
  trt   value
  <chr> <dbl>
1 a       1.5
2 b       3.5
3 c       5.5
4 a       1.5
5 b       3.5
6 c       5.5
7 a       1.5
8 b       3.5
9 c       5.5
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading