Connect medians with lines in dodged boxplots within subgroups?

In the code below, I use an interaction of two variables to define the groups. Why does ggplot2 correctly plot a separate point for each median, but complain that each group consists of only one observation when I try to draw lines connecting the medians within each group?

How can I do this?

library(ggplot2)

dat <- 
    data.frame(method = c("A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B"),
               var = c("X01","X01","X01","X01","X01","X02","X02","X02","X02","X02","X01","X01","X01","X01","X01","X02","X02","X02","X02","X02"), 
               value = c(-1.1,0.3,0.3,-0.6,-0.2,0.7,1.2,0.2,0.3,1.8,0.5,1.1,-0.8,0.8,0.6,1.0,0.5,0.6,1.5,1.3))

ggplot(dat, aes(var, value)) +
    geom_boxplot(aes(fill = method), 
                 position = position_dodge(width = .9), 
                 outlier.color = NA) +
    stat_summary(
        fun = median,
        geom = "point",
        aes(group = interaction(method, var)),
        position = position_dodge(width = .9)
    ) +
    stat_summary(
        fun = median,
        geom = "line",
        aes(group = interaction(method, var)),
        position = position_dodge(width = .9)
    )

This draws the median points but not the lines, and ggplot2 prints:

geom_path: Each group consists of only one observation. Do you need to adjust
the group aesthetic?

Desired output: [image not available]

Solution:

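This happens because ggplot2 treats each combination of method and var as its own group. Once stat_summary() has computed the median, every group contains a single point: that is enough for the point geom, but the line geom needs at least two observations per group and has nothing to connect. ggplot2 has no way of knowing that, after calculating the median, you want to drop the method part of the grouping. I don't think there is an elegant way to get the lines you want.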
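To make the diagnosis concrete, here is a minimal base-R sketch. It uses aggregate() as a stand-in for the per-group medians that stat_summary(fun = median) produces (same numbers, not ggplot2's internals), then counts how many points each grouping leaves for a line to connect. The names `med`, `pts_per_interaction` and `pts_per_var` are illustrative only.

```r
# Same data as in the question
dat <- data.frame(
  method = rep(c("A", "B"), each = 10),
  var    = rep(rep(c("X01", "X02"), each = 5), times = 2),
  value  = c(-1.1, 0.3, 0.3, -0.6, -0.2, 0.7, 1.2, 0.2, 0.3, 1.8,
             0.5, 1.1, -0.8, 0.8, 0.6, 1.0, 0.5, 0.6, 1.5, 1.3)
)

# One median per method/var combination, mimicking stat_summary(fun = median)
med <- aggregate(value ~ method + var, data = dat, FUN = median)

# Grouping by interaction(method, var): every count is 1, so a line
# geom grouped this way has a single observation per group
pts_per_interaction <- table(interaction(med$method, med$var))

# Grouping by var alone: every count is 2 (one median for A, one for B),
# enough to draw one segment per var
pts_per_var <- table(med$var)

print(med)
print(pts_per_interaction)
print(pts_per_var)
```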

Below is an inelegant but pragmatic solution: first compute the medians with a stat_summary() layer, then retrieve that layer's computed data with layer_data() and pass it to a new geom_line() layer.

library(ggplot2)

dat <- 
  data.frame(method = c("A","A","A","A","A","A","A","A","A","A","B","B","B","B","B","B","B","B","B","B"),
             var = c("X01","X01","X01","X01","X01","X02","X02","X02","X02","X02","X01","X01","X01","X01","X01","X02","X02","X02","X02","X02"), 
             value = c(-1.1,0.3,0.3,-0.6,-0.2,0.7,1.2,0.2,0.3,1.8,0.5,1.1,-0.8,0.8,0.6,1.0,0.5,0.6,1.5,1.3))

p <- ggplot(dat, aes(var, value)) +
  geom_boxplot(aes(fill = method), 
               position = position_dodge(width = .9), 
               outlier.color = NA) +
  stat_summary(
    fun = median,
    geom = "point",
    aes(group = interaction(method, var)),
    position = position_dodge(width = .9)
  )

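# layer_data(p, 2) returns the computed data of the second layer (the
# median points), including their dodged x positions. round(x) maps the
# two dodged medians of each var back to the same group, so they are joined.
p + geom_line(
  data = layer_data(p, 2),
  aes(x = x, group = round(x), y = y)
)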

Created on 2022-08-14 by the reprex package (v2.0.1)
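An alternative sketch, not part of the original answer, avoids layer_data() by computing the medians and the dodged x positions before plotting. It assumes the box centres follow the offset position_dodge() applies to equally wide groups, dodge_width * ((i - 0.5) / n - 0.5), which gives -0.225 for A and +0.225 for B with two methods and width 0.9; `med`, `dodge_width` and `x_dodged` are hypothetical names chosen for illustration.

```r
library(ggplot2)

# Same data as in the question
dat <- data.frame(
  method = rep(c("A", "B"), each = 10),
  var    = rep(rep(c("X01", "X02"), each = 5), times = 2),
  value  = c(-1.1, 0.3, 0.3, -0.6, -0.2, 0.7, 1.2, 0.2, 0.3, 1.8,
             0.5, 1.1, -0.8, 0.8, 0.6, 1.0, 0.5, 0.6, 1.5, 1.3)
)

dodge_width <- 0.9

# Medians per method/var, computed before plotting
med <- aggregate(value ~ method + var, data = dat, FUN = median)

# Hand-computed dodge (assumption: mirrors position_dodge for equal widths):
# method i of n is shifted from the var's integer position by
# dodge_width * ((i - 0.5) / n - 0.5)
method_idx <- as.integer(factor(med$method))
n_methods  <- length(unique(med$method))
med$x_dodged <- as.integer(factor(med$var)) +
  dodge_width * ((method_idx - 0.5) / n_methods - 0.5)

p2 <- ggplot(dat, aes(var, value)) +
  geom_boxplot(aes(fill = method),
               position = position_dodge(width = dodge_width),
               outlier.color = NA) +
  geom_point(data = med, aes(x = x_dodged, y = value)) +
  # group = var joins the A and B medians within each var
  geom_line(data = med, aes(x = x_dodged, y = value, group = var))

print(p2)
```

The trade-off is that the offsets are tied to dodge_width: if you change the width in position_dodge() for the boxes, the same value must be used for the medians. The layer_data() approach reads the positions back from ggplot2 instead, so it does not have that coupling.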
