Changing boxplot width (measuring multiple categorical variables) for categorical conditions with missing data

As a preliminary disclaimer, I am still very new to R (this is the first analysis I’ve performed independently), and am hoping this is a reproducible example.

I have a dataset measuring the d.13.C and d.18.O values of various enamel samples through time and space. I want to represent trends within Families across space and time. I generated a boxplot in ggplot2 that does this with the code below, but I'm running into a problem with the box widths:

d %>%
  mutate(across(Member, factor, levels = c("UpperBurgi", "KBS", "Okote"))) %>%
  mutate(across(Dep_context, factor, levels = c("Lacustrine", "Deltaic", "Fluvial "))) %>% 
  ggplot(aes(x = Member, y = d.13.C)) + 
  geom_boxplot(aes(x = Member, y = d.13.C, col = Dep_context, fill = Dep_context), alpha = 0.5, lwd = 1) +
  facet_wrap(~Family) +
  scale_fill_brewer(palette = "Dark2") + 
  scale_color_brewer(palette = "Dark2") + 
  theme_bw() 

It produces something like this:

plot produced by above code

Since my data is not evenly distributed (not every depositional context is represented in each geologic member in each family), the boxplots for each depositional environment have different widths. I would like them all to be the same width, regardless of whether data is present or not (e.g., equivalent to the size of the ones in Bovidae in the KBS Member).

I’ve tried messing around with width = in the geom_boxplot call, I’ve tried using theme() to change aspects of the grid, and I’ve tried the drop = FALSE call, but that didn’t change anything. I’ve also tried faceting my member and depositional environment, but that did not look as appealing and seemed clunkier. Is there a way to accomplish this, or is faceting the way to go?

I provided my dataframe below. Note: it's only a subset (the first six rows), since the full output was too long.

dput(head(d))
structure(list(CA = c("6", "1", "104", "105", "6A", "6A"), Member = c("KBS", 
"Okote", "KBS", "KBS", "KBS", "KBS"), Dep_context = c("Deltaic", 
"Fluvial ", "Fluvial ", "Fluvial ", "Deltaic", "Deltaic"), Family = c("Equidae", 
"Equidae", "Equidae", "Equidae", "Equidae", "Equidae"), Tribe = c("", 
"", "", "", "", ""), Genus = c("Equus", "Equus", "Equus", "Equus", 
"Equus", "Equus"), d.13.C = c(-0.3, -0.7, 0.7, -0.9, -0.1, -0.8
), d.18.O = c(0, 1.6, 4, 2.6, 1.8, 0.2), Age.range = c("1.87-1.56", 
"1.56-1.38", "1.87-1.56", "1.87-1.56", "1.87-1.56", "1.87-1.56"
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))

Solution:

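By default, geom_boxplot dodges with position_dodge2(preserve = "total"), which splits the full width at each x position among only the boxes present there, so positions with fewer groups get wider boxes. You could use position_dodge2 with preserve = "single" to keep the boxplot width the same across different groups like this: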

library(ggplot2)
library(dplyr)

d %>%
  # Fix the display order of the categories; "Fluvial " keeps the trailing
  # space that is present in the data
  mutate(
    Member = factor(Member, levels = c("UpperBurgi", "KBS", "Okote")),
    Dep_context = factor(Dep_context, levels = c("Lacustrine", "Deltaic", "Fluvial "))
  ) %>%
  ggplot(aes(x = Member, y = d.13.C)) +
  # preserve = "single" gives every box the same width, even where some
  # Dep_context levels are missing for a Member
  geom_boxplot(aes(col = Dep_context, fill = Dep_context), alpha = 0.5, lwd = 1,
               position = position_dodge2(preserve = "single")) +
  facet_wrap(~Family) +
  scale_fill_brewer(palette = "Dark2") +
  scale_color_brewer(palette = "Dark2") +
  theme_bw()

Created on 2023-02-08 with reprex v2.0.2
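
To see the effect of preserve in isolation, here is a minimal sketch on made-up data. The toy data frame and the box_widths helper are illustrative assumptions, not part of the original question; only the column names are borrowed from it. It renders the same boxplot with both settings and reads the drawn box widths back with layer_data():

```r
library(ggplot2)

# Made-up data (not the asker's): "KBS" has all three contexts,
# "Okote" has only one, so the two Members hold different numbers of boxes.
set.seed(1)
toy <- data.frame(
  Member = factor(
    rep(c("KBS", "KBS", "KBS", "Okote"), each = 10),
    levels = c("KBS", "Okote")
  ),
  Dep_context = factor(
    rep(c("Lacustrine", "Deltaic", "Fluvial", "Fluvial"), each = 10),
    levels = c("Lacustrine", "Deltaic", "Fluvial")
  ),
  d.13.C = rnorm(40)
)

# Hypothetical helper: build the boxplot with the given position
# adjustment and return the rendered width of every box.
box_widths <- function(position) {
  p <- ggplot(toy, aes(x = Member, y = d.13.C, fill = Dep_context)) +
    geom_boxplot(position = position)
  built <- layer_data(p)
  built$xmax - built$xmin
}

widths_total  <- box_widths(position_dodge2(preserve = "total"))
widths_single <- box_widths(position_dodge2(preserve = "single"))

print(round(widths_total, 3))   # the lone "Okote" box is wider than the rest
print(round(widths_single, 3))  # all four boxes share one width
```

With preserve = "single", the width is based on the largest number of boxes found at any single x position, so a Member with missing contexts shows narrower boxes and empty space instead of one stretched box.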
