Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Efficiently create summary table with min, max, mean, SE and sample size by category

I have a dataset that shows the walking speed of different age classes, measured in two different periods. Here is a reproducible example:

Period<-rep(c(1,2),times=c(9,18))
Class<-rep(c("child","teen","adult","toddler","child","teen","adult"),times=c(2,3,4,3,4,5,6))
Speed<-c(2,3,3,4,5,5,4,4,6,1,0.7,0.3,1,2,3,2,2,5,5,4,2,3,5,4,6,5,4)
data<-data.frame(Period,Class,Speed)

   Period   Class Speed
1       1   child   2.0
2       1   child   3.0
3       1    teen   3.0
4       1    teen   4.0
5       1    teen   5.0
6       1   adult   5.0
7       1   adult   4.0
8       1   adult   4.0
9       1   adult   6.0
10      2 toddler   1.0
11      2 toddler   0.7
12      2 toddler   0.3
13      2   child   1.0
14      2   child   2.0
15      2   child   3.0
16      2   child   2.0
17      2    teen   2.0
18      2    teen   5.0
19      2    teen   5.0
20      2    teen   4.0
21      2    teen   2.0
22      2   adult   3.0
23      2   adult   5.0
24      2   adult   4.0
25      2   adult   6.0
26      2   adult   5.0
27      2   adult   4.0

I want to create a summary table with the following attributes:

  1. The minimum, maximum and average speed by age class and by period
  2. The standard error of the speed by age class by period
  3. The number of individuals per class by period

I would also like to add three "total" rows:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

  1. minimum, maximum, average speeds, standard errors and number of all individuals in Period 1
  2. The same for period 2
  3. The same over both periods

The resulting summary table should look like this:

 Class         Period  mean   min   max    SE     n
   <chr>          <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1 adult              1 4.75    4       6 0.479     4
 2 adult              2 4.5     3       6 0.428     2
 3 child              1 2.5     2       3 0.5       3
 4 child              2 2       1       3 0.408     6
 5 teen               1 4       3       5 0.577     4
 6 teen               2 3.6     2       5 0.678     5
 7 toddler            2 0.667   0.3     1 0.203     3
 8 Total.Period2      2 4       2       6 0.408     9
 9 Total.Period1      1 3.06    0.3     6 0.414    18
10 Total             NA 3.37    0.3     6 0.315    27

I can currently do this using functions from the tidyverse and rbind() in R. I’ve also been using the function std.error from the package plotrix() to calculate the standard errors. However, this is quite a lengthy process and I’d like to make it more efficient. Any suggestions?

>Solution :

Try this to calculate the summary:

library(tidyverse)
rbind(data %>% group_by(Class, Period) %>% summarise(mean = mean(Speed), min = min(Speed), max = max(Speed), SE = sd(Speed), n = n()) %>% ungroup(),
      cbind("Class" = c("Period.1","Period.2"),data %>% group_by(Period) %>% summarise(mean = mean(Speed), min = min(Speed), max = max(Speed), SE = sd(Speed), n = n()) %>% ungroup()),
      cbind("Class" = "Total", "Period" = NA,data %>% summarise(mean = mean(Speed), min = min(Speed), max = max(Speed), SE = sd(Speed), n = n()) %>% ungroup()))


As far as I know, this is the fastest way to do it

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading