How do I create a stacked bar chart with summary data?

I have a dataframe in R that looks like this:

```r
Treatment <- c(
  "Surgery", "Surgery", "Surgery", "Radiation", "Radiation", "Radiation", 
  "Active Surveillance", "Active Surveillance", "Active Surveillance", 
  "Surgery", "Surgery", "Surgery", "Radiation", "Radiation", "Radiation", 
  "Active Surveillance", "Active Surveillance", "Active Surveillance"
)

Outcome <- c(
  "Sexual", "Urinary", "Bowel", "Sexual", "Urinary", "Bowel", "Sexual", 
  "Urinary", "Bowel", "Sexual", "Urinary", "Bowel", "Sexual", "Urinary", 
  "Bowel", "Sexual", "Urinary", "Bowel"
)

Proportion <- c(
  47, 8.7, 2.6, 37, 10, 8.9, 22, 7.9, 2.8, 39, 11, 3.7, 35, 11, 8.7, 27, 9.1, 4.5
)

Age <- c(
  "<70", "<70", "<70", "<70", "<70", "<70", "<70", "<70", "<70", ">70", 
  ">70", ">70", ">70", ">70", ">70", ">70", ">70", ">70"
)

df <- data.frame(Treatment, Outcome, Proportion, Age)
```

I want to create two plots. The first should show, for men under 70, a horizontal stacked bar for a treatment-outcome pair, split into the proportion of men who experienced the outcome and the proportion who didn't (e.g. Surgery + Sexual: one segment of 47% and another of 53%). Below it there should be a bar in the same style for every other treatment-outcome pair, e.g. Surgery + Urinary, Surgery + Bowel, Radiation + Urinary, and so on.

And then the second plot should show the same for men over the age of 70.

I'm having difficulty because most examples I've seen start from the individual records rather than summary proportions, so ggplot can count them up and plot the result.

I'm not sure how to approach this when I only have summary proportions. Or is the problem how my data is organised?
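For reference, ggplot2 covers both cases: `geom_bar()` counts the rows in each category, while `geom_col()` (equivalent to `geom_bar(stat = "identity")`) draws the y values already present in the data, so summary proportions can be plotted without expanding them into one row per patient. A minimal sketch on hypothetical toy data (`raw` and `summary_df` are made-up names):

```r
library(ggplot2)

# Hypothetical toy data: the same information stored two ways.
raw <- data.frame(Treatment = c("Surgery", "Surgery", "Radiation"))          # one row per patient
summary_df <- data.frame(Treatment = c("Surgery", "Radiation"), n = c(2, 1)) # already counted

# geom_bar() counts the rows falling in each x category (stat = "count").
p_count <- ggplot(raw, aes(x = Treatment)) + geom_bar()

# geom_col() uses the y values as given (stat = "identity"), so no counting is needed.
p_value <- ggplot(summary_df, aes(x = Treatment, y = n)) + geom_col()

# Inspect the computed bar heights of each plot.
layer_data(p_count)[, c("x", "y")]
layer_data(p_value)[, c("x", "y")]
```

Both plots end up with the same bar heights; only the source of the numbers differs.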

Solution:

I think you can achieve something similar to your description by:

  • subtracting Proportion from 100 and putting that result in a new column
  • reshaping the data from wide to long
  • plotting the values versus Treatment
  • using facets to plot by Age and Outcome
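The first two steps can be sketched in base R, which makes the intermediate long-format table easy to inspect. This assumes `Proportion` is a percentage out of 100; the data frame is a compact rebuild of the question's data using `rep()`, and `id_cols`, `long` and `totals` are illustrative names:

```r
# Compact rebuild of the question's data frame (same values, generated with rep()).
df <- data.frame(
  Treatment  = rep(rep(c("Surgery", "Radiation", "Active Surveillance"), each = 3), 2),
  Outcome    = rep(c("Sexual", "Urinary", "Bowel"), 6),
  Proportion = c(47, 8.7, 2.6, 37, 10, 8.9, 22, 7.9, 2.8,
                 39, 11, 3.7, 35, 11, 8.7, 27, 9.1, 4.5),
  Age        = rep(c("<70", ">70"), each = 9)
)

# Step 1: the complement of each percentage (assumes Proportion is out of 100).
df$Negative <- 100 - df$Proportion
names(df)[names(df) == "Proportion"] <- "Positive"

# Step 2: wide -> long, stacking the Positive and Negative columns
# into a `name` column (which segment) and a `value` column (its size).
id_cols <- df[, c("Treatment", "Outcome", "Age")]
long <- rbind(
  data.frame(id_cols, name = "Positive", value = df$Positive),
  data.frame(id_cols, name = "Negative", value = df$Negative)
)

# Sanity check: the two segments of every Treatment/Outcome/Age bar add up to 100.
totals <- aggregate(value ~ Treatment + Outcome + Age, data = long, FUN = sum)
```

Each original row becomes two rows, one per bar segment, which is the shape `geom_col(aes(fill = name))` stacks.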

Here I labelled the two columns Positive (the original Proportion) and Negative (100 minus Proportion), but you may want more descriptive names.

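```r
library(dplyr)
library(tidyr)
library(ggplot2)

df %>% 
  # percentage who did NOT experience the outcome
  mutate(Negative = 100 - Proportion) %>% 
  # percentage who did experience it
  rename(Positive = Proportion) %>% 
  # long format: one row per segment, with columns `name` and `value`
  pivot_longer(matches("^(Pos|Neg)")) %>% 
  ggplot(aes(Treatment, value)) + 
  # geom_col() plots the values as given (no counting), stacked by `name`
  geom_col(aes(fill = name)) + 
  # panel rows = Outcome, panel columns = Age group
  facet_grid(Outcome ~ Age)
```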
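The question asks for horizontal bars and a separate plot for each age group, whereas the facetted version draws vertical bars in a single grid. A possible variant, sketched under these assumptions: ggplot2 3.3.0 or later (so `geom_col()` infers horizontal orientation from a continuous x and a discrete y), illustrative "Yes"/"No" segment labels, and a hypothetical helper `plot_age()`:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Compact rebuild of the question's data frame (same values, generated with rep()).
df <- data.frame(
  Treatment  = rep(rep(c("Surgery", "Radiation", "Active Surveillance"), each = 3), 2),
  Outcome    = rep(c("Sexual", "Urinary", "Bowel"), 6),
  Proportion = c(47, 8.7, 2.6, 37, 10, 8.9, 22, 7.9, 2.8,
                 39, 11, 3.7, 35, 11, 8.7, 27, 9.1, 4.5),
  Age        = rep(c("<70", ">70"), each = 9)
)

# Long format: one row per bar segment. "Yes"/"No" (experienced the outcome or not)
# are illustrative labels; Group names each treatment-outcome pair, e.g. "Surgery + Sexual".
long <- df %>%
  mutate(No = 100 - Proportion) %>%
  rename(Yes = Proportion) %>%
  pivot_longer(c(Yes, No), names_to = "Experienced", values_to = "Percent") %>%
  mutate(Group = paste(Treatment, Outcome, sep = " + "))

# Hypothetical helper: one horizontal stacked-bar chart for a single age band.
# .env$age refers to the function argument, not a column of `data`.
plot_age <- function(data, age) {
  data %>%
    filter(Age == .env$age) %>%
    ggplot(aes(x = Percent, y = Group, fill = Experienced)) +
    geom_col() +
    labs(title = paste("Age", age), x = "Percent", y = NULL)
}

p_under70 <- plot_age(long, "<70")
p_over70  <- plot_age(long, ">70")
```

Print `p_under70` and `p_over70` to view the two plots. The bars follow the alphabetical order of `Group`; converting it with `factor(..., levels = ...)` is one way to choose a different order.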

Result: a grid of stacked bar charts, with one panel per Outcome (rows) and Age group (columns).
