ggplot: stat_summary for mean with facet

Within ggplot2, I am using stat_summary() to calculate and plot the mean and standard deviation of a dataset, and facet_wrap() to split the dataset into two panels.
I was pleasantly surprised that adding facet_wrap() to my ggplot made stat_summary() apply correctly to each subset of the data independently.

df:
| ID        | Group | Strain | Condition | DoublingTime    |
|-----------|-------|--------|-----------|-----------------|
| A_3g_Rep1 | A_3g  | A      | 3g        |     122.4135    |
| A_3g_Rep2 | A_3g  | A      | 3g        |     124.5801    |
| A_3g_Rep3 | A_3g  | A      | 3g        |     124.9419    |
| A_6g_Rep1 | A_6g  | A      | 6g        |     120.5004    |
| A_6g_Rep2 | A_6g  | A      | 6g        |     124.1666    |
| A_6g_Rep3 | A_6g  | A      | 6g        |     124.6453    |
| B_3g_Rep1 | B_3g  | B      | 3g        |     132.568     |
| B_3g_Rep2 | B_3g  | B      | 3g        |     137.5242    |
| B_3g_Rep3 | B_3g  | B      | 3g        |     135.5238    |
| B_6g_Rep1 | B_6g  | B      | 6g        |     137.1333    |
| B_6g_Rep2 | B_6g  | B      | 6g        |     142.733     |
| B_6g_Rep3 | B_6g  | B      | 6g        |     140.0722    |
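To make the examples reproducible, the table can be re-entered as a data frame. This is only a convenience sketch: the values are copied from the table, and Group and ID are rebuilt from Strain and Condition rather than typed out row by row.

```r
# Re-enter the table above as a data frame so the plotting code can be run
df <- data.frame(
  Strain       = rep(c("A", "B"), each = 6),
  Condition    = rep(rep(c("3g", "6g"), each = 3), times = 2),
  DoublingTime = c(122.4135, 124.5801, 124.9419,   # A_3g
                   120.5004, 124.1666, 124.6453,   # A_6g
                   132.568,  137.5242, 135.5238,   # B_3g
                   137.1333, 142.733,  140.0722)   # B_6g
)
# Group is "<Strain>_<Condition>"; ID appends the replicate number
df$Group <- paste(df$Strain, df$Condition, sep = "_")
df$ID    <- paste0(df$Group, "_Rep", rep(1:3, times = 4))
# Same column order as the table
df <- df[, c("ID", "Group", "Strain", "Condition", "DoublingTime")]
```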

My first attempt, below, calculates the mean and standard deviation correctly. However, each facet's x-axis also includes the Groups that belong to the other Strain and therefore have no data in that facet.

```r
library(ggplot2)

# mean_sdl (mean +/- mult * sd) requires the Hmisc package to be installed
DT_plotA <- ggplot(df, aes(Group, DoublingTime)) +
  stat_summary(fun.data="mean_sdl", fun.args = list(mult=1),
               geom="errorbar", width=0.5) +
  stat_summary(fun=mean, geom="point", size=3) +
  facet_wrap(nrow = 1, .~Strain)

DT_plotA
```

I was also pleasantly surprised that changing the x aesthetic in aes() to Condition, while keeping facet_wrap(), made stat_summary() calculate the mean and standard deviation for each Group correctly.

```r
DT_plotB <- ggplot(df, aes(Condition, DoublingTime)) +
  stat_summary(fun.data="mean_sdl", fun.args = list(mult=1),
               geom="errorbar", width=0.5) +
  stat_summary(fun=mean, geom="point", size=3) +
  facet_wrap(nrow = 1, .~Strain)

DT_plotB
```

However, if facet_wrap() is removed from the plot, stat_summary() calculates the mean and standard deviation per Condition only, pooling the data from both Strains. I worry that this caveat will be forgotten and will lead to incorrect means and standard deviations if the facet is ever removed.

```r
DT_plotC <- ggplot(df, aes(Condition, DoublingTime)) +
  stat_summary(fun.data="mean_sdl", fun.args = list(mult=1),
               geom="errorbar", width=0.5) +
  stat_summary(fun=mean, geom="point", size=3)

DT_plotC
```
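One way to guard against that pooling, sketched here as a suggestion rather than something from the original post, is to put Strain into the aesthetics themselves. ggplot2 groups by the interaction of all discrete variables mapped in a layer, so mapping colour = Strain makes each summary per Strain x Condition whether or not the plot is faceted. The name DT_plotE is hypothetical, and the error bars use fun.min/fun.max with mean() plus or minus sd(), which should match mean_sdl with mult = 1 without requiring Hmisc.

```r
library(ggplot2)

# The table's data, re-entered compactly
df <- data.frame(
  Strain       = rep(c("A", "B"), each = 6),
  Condition    = rep(rep(c("3g", "6g"), each = 3), times = 2),
  DoublingTime = c(122.4135, 124.5801, 124.9419, 120.5004, 124.1666, 124.6453,
                   132.568, 137.5242, 135.5238, 137.1333, 142.733, 140.0722)
)

# Shift the two strains side by side within each Condition
dodge <- position_dodge(width = 0.5)

DT_plotE <- ggplot(df, aes(Condition, DoublingTime, colour = Strain)) +
  # colour = Strain adds Strain to the grouping, so mean and sd are
  # computed per Strain x Condition even without facet_wrap()
  stat_summary(fun = mean,
               fun.min = function(y) mean(y) - sd(y),
               fun.max = function(y) mean(y) + sd(y),
               geom = "errorbar", width = 0.5, position = dodge) +
  stat_summary(fun = mean, geom = "point", size = 3, position = dodge)

DT_plotE
```

Dropping colour = Strain from this sketch would bring back the pooled behaviour of DT_plotC, so the grouping is visible in the aes() call instead of being implied by the facet.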

Question
Is there a way to produce a plot that looks like DT_plotB but uses aes(Group, DoublingTime), as in the code for DT_plotA?
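A possible direct answer, not taken from the reply below: keep aes(Group, DoublingTime) and pass scales = "free_x" to facet_wrap(), so each panel's discrete x-axis only shows the Groups that have data in that panel. This is a sketch assuming df is the table above; DT_plotD is a hypothetical name, and the error bars again use fun.min/fun.max (mean plus or minus sd) as a stand-in for mean_sdl with mult = 1.

```r
library(ggplot2)

# The table's data, re-entered compactly
df <- data.frame(
  Strain       = rep(c("A", "B"), each = 6),
  Condition    = rep(rep(c("3g", "6g"), each = 3), times = 2),
  DoublingTime = c(122.4135, 124.5801, 124.9419, 120.5004, 124.1666, 124.6453,
                   132.568, 137.5242, 135.5238, 137.1333, 142.733, 140.0722)
)
df$Group <- paste(df$Strain, df$Condition, sep = "_")

DT_plotD <- ggplot(df, aes(Group, DoublingTime)) +
  # mean +/- 1 sd per Group, like mean_sdl(mult = 1)
  stat_summary(fun = mean,
               fun.min = function(y) mean(y) - sd(y),
               fun.max = function(y) mean(y) + sd(y),
               geom = "errorbar", width = 0.5) +
  stat_summary(fun = mean, geom = "point", size = 3) +
  # free_x: each panel keeps only the Groups present for that Strain
  facet_wrap(~Strain, nrow = 1, scales = "free_x")

DT_plotD
```

Because x is Group here, the summaries stay per Strain x Condition even if facet_wrap() is later removed; the plot would then simply show all four Groups on one axis.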

Solution

One option is to preprocess the data: calculate the mean and sd per Strain and Condition first, then plot the precomputed values:

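```r
library(dplyr)
library(ggplot2)

df %>% 
  # one summary row per Strain x Condition, so the grouping is explicit
  # and does not depend on the facet
  group_by(Strain, Condition) %>% 
  summarise(mean = mean(DoublingTime),
            sd = sd(DoublingTime),
            .groups = "drop") %>% 
  ggplot(aes(x = Condition, y = mean)) +
  geom_point() +
  geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd), width = 0.2) +
  facet_wrap(.~Strain)
```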
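To check that the precomputed values agree with what stat_summary() computes in the faceted DT_plotB layout, a quick comparison could look like this. It is a sketch assuming df is the table above; layer_data() returns the data ggplot2 computed for a given layer, and summ, p and computed are hypothetical names.

```r
library(dplyr)
library(ggplot2)

# The table's data, re-entered compactly
df <- data.frame(
  Strain       = rep(c("A", "B"), each = 6),
  Condition    = rep(rep(c("3g", "6g"), each = 3), times = 2),
  DoublingTime = c(122.4135, 124.5801, 124.9419, 120.5004, 124.1666, 124.6453,
                   132.568, 137.5242, 135.5238, 137.1333, 142.733, 140.0722)
)

# Precomputed summary: one row per Strain x Condition
summ <- df %>%
  group_by(Strain, Condition) %>%
  summarise(mean = mean(DoublingTime), sd = sd(DoublingTime),
            .groups = "drop")

# Means computed by stat_summary() for the point layer of the DT_plotB layout
p <- ggplot(df, aes(Condition, DoublingTime)) +
  stat_summary(fun = mean, geom = "point", size = 3) +
  facet_wrap(nrow = 1, . ~ Strain)
computed <- layer_data(p, 1)

# Row order may differ between the two, so compare the sorted means
all.equal(sort(computed$y), sort(summ$mean))
```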
