How do I create a stacked bar chart with summary data?

I have a dataframe in R that looks like this:

Treatment <- c(
  "Surgery", "Surgery", "Surgery", "Radiation", "Radiation", "Radiation", 
  "Active Surveillance", "Active Surveillance", "Active Surveillance", 
  "Surgery", "Surgery", "Surgery", "Radiation", "Radiation", "Radiation", 
  "Active Surveillance", "Active Surveillance", "Active Surveillance"
)

Outcome <- c(
  "Sexual", "Urinary", "Bowel", "Sexual", "Urinary", "Bowel", "Sexual", 
  "Urinary", "Bowel", "Sexual", "Urinary", "Bowel", "Sexual", "Urinary", 
  "Bowel", "Sexual", "Urinary", "Bowel"
)

Proportion <- c(
  47, 8.7, 2.6, 37, 10, 8.9, 22, 7.9, 2.8, 39, 11, 3.7, 35, 11, 8.7, 27, 9.1, 4.5
)

Age <- c(
  "<70", "<70", "<70", "<70", "<70", "<70", "<70", "<70", "<70", ">70", 
  ">70", ">70", ">70", ">70", ">70", ">70", ">70", ">70"
)

df <- data.frame(Treatment, Outcome, Proportion, Age)

I want to create two plots. The first should show, for men under 70, each treatment choice with a horizontal stacked bar split into the proportion of people who experienced the sexual outcome and the proportion who didn’t (e.g. for Surgery, one segment of 47% and another of 53%). Below that, there should be bars in the same style for every other treatment-outcome combination, e.g. Surgery + Urinary, Surgery + Bowel, Radiation + Urinary and so on.

And then the second plot should show the same for men over the age of 70.

I’m having difficulty because most examples I’ve seen use a data frame containing every individual observation rather than summary proportions, so ggplot can count them up and plot the result.

I’m not sure how to approach this when I only have summary proportions. Or perhaps the problem is how my data is organised?

Solution:

I think you can achieve something similar to your description by:

  • subtracting Proportion from 100 and putting that result in a new column
  • reshaping the data from wide to long
  • plotting the values versus Treatment
  • using facets to plot by Age and Outcome
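The third step is what makes summary data workable: ggplot2’s geom_bar() counts the rows in each group, whereas geom_col() takes bar heights directly from a column, so precomputed proportions can be plotted without individual-level data. A minimal sketch on a hypothetical two-row data frame (`toy` and its values are illustrative only), inspecting the computed layer data with ggplot_build():

```r
library(ggplot2)

# Hypothetical summary data: one bar split into two precomputed segments
toy <- data.frame(
  Treatment = c("Surgery", "Surgery"),
  name      = c("Positive", "Negative"),
  value     = c(47, 53)
)

# geom_col() uses `value` as the segment heights and stacks them;
# geom_bar() (stat = "count") would count rows instead, giving each segment a height of 1
p <- ggplot(toy, aes(Treatment, value, fill = name)) +
  geom_col()

# Layer data after stacking: segments run from 0 up to 47 + 53 = 100
built <- ggplot_build(p)$data[[1]]
print(built[, c("x", "ymin", "ymax", "fill")])
```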

Here I labelled the two proportions Positive (experienced the outcome) and Negative (did not), but you may want to choose more descriptive names.
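To see what the data-wrangling steps produce before any plotting, the sketch below runs only the mutate/rename/pivot_longer part. It rebuilds the question’s `df` compactly with rep() (same values and row order) so it runs on its own, then checks that each of the 18 input rows becomes two rows and that the Positive and Negative values for every Treatment/Outcome/Age combination add up to 100. The names `long` and `totals` are just for illustration.

```r
library(dplyr)
library(tidyr)

# Compact rebuild of the question's data (same values, same row order)
df <- data.frame(
  Treatment  = rep(rep(c("Surgery", "Radiation", "Active Surveillance"), each = 3), 2),
  Outcome    = rep(c("Sexual", "Urinary", "Bowel"), 6),
  Proportion = c(47, 8.7, 2.6, 37, 10, 8.9, 22, 7.9, 2.8,
                 39, 11, 3.7, 35, 11, 8.7, 27, 9.1, 4.5),
  Age        = rep(c("<70", ">70"), each = 9)
)

long <- df %>%
  mutate(Negative = 100 - Proportion) %>%   # share who did NOT experience the outcome
  rename(Positive = Proportion) %>%         # share who did
  pivot_longer(cols = c(Positive, Negative),
               names_to = "name", values_to = "value")

print(head(long))

# Each bar (Treatment x Outcome x Age) should consist of two segments totalling 100
totals <- long %>%
  group_by(Treatment, Outcome, Age) %>%
  summarise(total = sum(value), .groups = "drop")
```

A tolerance is used when checking the totals, because values such as 100 - 8.7 are not exact in floating point.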

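library(dplyr)
library(tidyr)
library(ggplot2)

df %>% 
  mutate(Negative = 100 - Proportion) %>%    # share who did not experience the outcome
  rename(Positive = Proportion) %>%          # share who did
  pivot_longer(matches("^(Pos|Neg)")) %>%    # one row per bar segment (columns `name`, `value`)
  ggplot(aes(Treatment, value)) + 
  geom_col(aes(fill = name)) +               # bar heights come straight from `value`
  facet_grid(Outcome ~ Age)                  # rows: outcome; columns: age group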
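The question asks for horizontal bars and two separate plots (one per age group), while the code above draws vertical bars with Age as facet columns. One possible way to get closer to that, sketched under the assumption of the same `df` and ggplot2 3.3.0 or later (which infers horizontal bars when x is continuous and y is discrete): map `value` to x and Treatment to y, and build one plot per Age value. The helper `plot_age` is hypothetical, and the title and axis labels are placeholders.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Compact rebuild of the question's data (same values, same row order)
df <- data.frame(
  Treatment  = rep(rep(c("Surgery", "Radiation", "Active Surveillance"), each = 3), 2),
  Outcome    = rep(c("Sexual", "Urinary", "Bowel"), 6),
  Proportion = c(47, 8.7, 2.6, 37, 10, 8.9, 22, 7.9, 2.8,
                 39, 11, 3.7, 35, 11, 8.7, 27, 9.1, 4.5),
  Age        = rep(c("<70", ">70"), each = 9)
)

# Same reshaping as in the answer: one row per bar segment
long <- df %>%
  mutate(Negative = 100 - Proportion) %>%
  rename(Positive = Proportion) %>%
  pivot_longer(cols = c(Positive, Negative),
               names_to = "name", values_to = "value")

# Hypothetical helper: horizontal stacked bars for a single age group
plot_age <- function(age_group) {
  long %>%
    filter(Age == age_group) %>%
    ggplot(aes(x = value, y = Treatment, fill = name)) +
    geom_col() +                 # continuous x + discrete y -> horizontal bars
    facet_grid(Outcome ~ .) +    # one panel (row) per outcome, stacked vertically
    labs(title = paste("Age", age_group), x = "Percentage", y = NULL, fill = NULL)
}

plots <- lapply(c("<70", ">70"), plot_age)
names(plots) <- c("<70", ">70")

print(plots[["<70"]])
print(plots[[">70"]])
```

Building a separate ggplot object per age group keeps the two plots independent, so each can be printed or saved on its own; keeping `facet_grid(Outcome ~ Age)` with the swapped aesthetics would instead give a single figure with both groups side by side.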

Result: [image: stacked bar chart faceted by Outcome (rows) and Age (columns)]
