Displaying percentages within category for continuous/ordered variable (with ggplot)

I have two questions, the first a (hopefully) straightforward mechanical one and the second more theoretical (though still with a technical element).

  1. I am trying to do something nearly identical to this question, but I have a variable that is ordered/continuous (0 – 4), instead of a 1/0 dichotomous variable, which means that filtering == 1 will not work. To summarize here, I simply want to display the percent of each level within each race category.

  2. I am also hoping to figure out a way to display those descriptive results for all 3 questions in just one figure. I first thought about trying to do some type of facet_wrap() with each variable (question1, question2, question3) being its own panel. Would that require pivot_longer() to make my data long instead of wide? Another thought was to have just one figure/panel, but each x axis tick is a race category instead of a question, and then it’d have 3 bars for each of the 3 questions. I’m not sure how I’d get that to work, though.

Thanks in advance and sorry for this wordy question. Here is some example data:

set.seed(123)

d <- data.frame(
  race = sample(c("White", "Hispanic", "Black", "Other"), 100, replace = TRUE),
  question1 = sample(0:4, 100, replace = TRUE),
  question2 = sample(0:4, 100, replace = TRUE),
  question3 = sample(0:4, 100, replace = TRUE)
)

Solution:

How about this:

library(tidyverse)
set.seed(123)
d <- data.frame(
  race = sample(c("White", "Hispanic", "Black", "Other"), 100, replace = TRUE),
  question1 = sample(0:4, 100, replace = TRUE),
  question2 = sample(0:4, 100, replace = TRUE),
  question3 = sample(0:4, 100, replace = TRUE)
)

# Reshape to long format: one row per respondent and question
d %>%
  pivot_longer(-race, names_to = "question", values_to = "vals") %>% 
  # Count each response level within every question-by-race group
  group_by(question, race, vals) %>% 
  tally() %>% 
  # Turn the counts into proportions that sum to 1 within question and race
  group_by(question, race) %>% 
  mutate(pct = n/sum(n)) %>% 
  # Stacked bars per race, one panel per question
  ggplot(aes(x=race, y=pct, fill=as.factor(vals))) + 
  geom_bar(position="stack", stat="identity") + 
  facet_wrap(~question) + 
  scale_y_continuous(labels = scales::percent) + 
  labs(x="", y="Percentage (within Race)", fill="Response") + 
  theme_bw() + 
  theme(legend.position = "top", 
        panel.grid = element_blank(), 
        axis.text.x = element_text(angle = 45, hjust=1))
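
If you only need the numbers for a single question (your first point), the counting steps from the pipeline above can probably be run on their own, without reshaping. This is a rough sketch rather than the only way to do it: `shares` is a name made up for this illustration, and `count()` is used as shorthand for `group_by()` followed by `tally()`.

```r
library(dplyr)

set.seed(123)
d <- data.frame(
  race = sample(c("White", "Hispanic", "Black", "Other"), 100, replace = TRUE),
  question1 = sample(0:4, 100, replace = TRUE)
)

# Count each response level within each race, then convert the counts
# to within-race proportions. `shares` is an illustrative name.
shares <- d %>%
  count(race, question1) %>%
  group_by(race) %>%
  mutate(pct = n / sum(n)) %>%
  ungroup()

# Sanity check: the proportions should sum to 1 within every race.
shares %>%
  group_by(race) %>%
  summarise(total = sum(pct))
```

No `filter()` is involved, so this works the same for a 0/1 variable and for a 0 to 4 scale.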

Created on 2022-05-23 by the reprex package (v2.0.1)
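
As for the other layout you mentioned (one panel, race on the x axis, three bars per race), a side-by-side bar can only show one number per question, so the five response levels have to be collapsed somehow. The sketch below assumes you would be happy with the share of responses at or above a threshold; `cutoff` and `high_share` are hypothetical names and the value 3 is picked purely for illustration, so treat this as one possible reading of what you want rather than a definitive answer.

```r
library(tidyverse)

set.seed(123)
d <- data.frame(
  race = sample(c("White", "Hispanic", "Black", "Other"), 100, replace = TRUE),
  question1 = sample(0:4, 100, replace = TRUE),
  question2 = sample(0:4, 100, replace = TRUE),
  question3 = sample(0:4, 100, replace = TRUE)
)

cutoff <- 3  # hypothetical threshold, chosen only for illustration

# One row per race and question: the proportion of responses >= cutoff.
high_share <- d %>%
  pivot_longer(-race, names_to = "question", values_to = "vals") %>%
  group_by(race, question) %>%
  summarise(pct = mean(vals >= cutoff), .groups = "drop")

# Race on the x axis, one dodged bar per question.
p <- ggplot(high_share, aes(x = race, y = pct, fill = question)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = scales::percent) +
  labs(x = "",
       y = paste0("Responses of ", cutoff, " or higher (within race)"),
       fill = "Question") +
  theme_bw() +
  theme(legend.position = "top")

p
```

If you need all five levels visible, the faceted stacked bars above are likely the better fit, since this version throws away the distribution below and above the threshold.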
