Changing boxplot width (measuring multiple categorical variables) for categorical conditions with missing data

As a preliminary disclaimer, I am still very new to R (this is the first analysis I’ve performed independently), and am hoping this is a reproducible example.

I have a dataset measuring the d.13.C and d.18.O values of various enamel samples through time and space. I want to represent trends within Families across space and time. I have a boxplot I generated in ggplot2 that does this, but I’m running into a few problems:

d %>%
  mutate(across(Member, factor, levels = c("UpperBurgi", "KBS", "Okote"))) %>%
  mutate(across(Dep_context, factor, levels = c("Lacustrine", "Deltaic", "Fluvial "))) %>% 
  ggplot(aes(x = Member, y = d.13.C)) + 
  geom_boxplot(aes(x = Member, y = d.13.C, col = Dep_context, fill = Dep_context), alpha = 0.5, lwd = 1) +
  facet_wrap(~Family) +
  scale_fill_brewer(palette = "Dark2") + 
  scale_color_brewer(palette = "Dark2") + 
  theme_bw() 

It produces something like this:

[Image: plot produced by the above code]

Since my data are not evenly distributed (not every depositional context is represented in each geologic member in each family), the boxplots for each depositional environment have different widths. I would like them all to be the same width, regardless of whether data are present (e.g., the same width as the ones for Bovidae in the KBS Member).

I’ve tried adjusting width = in the geom_boxplot call, using theme() to change aspects of the grid, and setting drop = FALSE, but none of these changed anything. I’ve also tried faceting by member and depositional environment, but that looked less appealing and clunkier. Is there a way to accomplish this, or is faceting the way to go?

My data frame is below. Note: it’s only a subset, since the full output was too long.

dput(head(d))
structure(list(CA = c("6", "1", "104", "105", "6A", "6A"), Member = c("KBS", 
"Okote", "KBS", "KBS", "KBS", "KBS"), Dep_context = c("Deltaic", 
"Fluvial ", "Fluvial ", "Fluvial ", "Deltaic", "Deltaic"), Family = c("Equidae", 
"Equidae", "Equidae", "Equidae", "Equidae", "Equidae"), Tribe = c("", 
"", "", "", "", ""), Genus = c("Equus", "Equus", "Equus", "Equus", 
"Equus", "Equus"), d.13.C = c(-0.3, -0.7, 0.7, -0.9, -0.1, -0.8
), d.18.O = c(0, 1.6, 4, 2.6, 1.8, 0.2), Age.range = c("1.87-1.56", 
"1.56-1.38", "1.87-1.56", "1.87-1.56", "1.87-1.56", "1.87-1.56"
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))

Solution:

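You can pass position = position_dodge2(preserve = "single") to geom_boxplot(). With preserve = "single", dodging preserves the width of a single element rather than the total width at each x position, so the boxes stay the same width even when a member is missing one or more depositional contexts: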

library(ggplot2)
library(dplyr)

d %>%
  # Fix the plotting order of members and depositional contexts
  # ("Fluvial " keeps the trailing space that is present in the data)
  mutate(
    Member = factor(Member, levels = c("UpperBurgi", "KBS", "Okote")),
    Dep_context = factor(Dep_context, levels = c("Lacustrine", "Deltaic", "Fluvial "))
  ) %>%
  ggplot(aes(x = Member, y = d.13.C)) +
  geom_boxplot(aes(col = Dep_context, fill = Dep_context), alpha = 0.5, lwd = 1,
               # preserve = "single": every box gets the width of one element
               position = position_dodge2(preserve = "single")) +
  facet_wrap(~Family) +
  scale_fill_brewer(palette = "Dark2") + 
  scale_color_brewer(palette = "Dark2") + 
  theme_bw() 

Created on 2023-02-08 with reprex v2.0.2
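
Since only six rows of the data were posted, here is a minimal, self-contained sketch of the same idea on made-up data. The data frame toy and its columns member, context and value are hypothetical stand-ins for Member, Dep_context and d.13.C; group "B" deliberately lacks context "y" to mimic a missing depositional context.

```r
library(ggplot2)

# Hypothetical toy data: member "B" has no "y" context
set.seed(1)
toy <- data.frame(
  member  = rep(c("A", "A", "B"), each = 10),
  context = rep(c("x", "y", "x"), each = 10),
  value   = rnorm(30)
)

p <- ggplot(toy, aes(x = member, y = value, fill = context))

# Default dodging: the lone box in "B" expands to fill the whole slot
p_default <- p + geom_boxplot()

# preserve = "single": each box keeps the width of a single element
p_single <- p + geom_boxplot(position = position_dodge2(preserve = "single"))

print(p_default)
print(p_single)
```

Comparing the two plots should show the single box in "B" shrinking to match the boxes in "A". If you would rather have that lone box aligned to one side of its slot than centred, position_dodge(preserve = "single") is worth trying as well.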
