I have a dataframe with a column for test prep course completion and a column for low-income. Both of these are categorical.
I want to graph the count of students from low-income families who completed the course vs. those who did not. My current process seems too cumbersome.
My process is below.
Original Data
|low_income|test|
|---|---|
|yes|completed|
|yes|none|
|no|completed|
|yes|none|
etc…
STEP 1: Create a frequency table
| | completed | none |
|---|---|---|
| no | 3 | 1 |
| yes | 5 | 3 |
STEP 2: Manually create a new dataframe (this is the part that I am concerned about)
| low_income | test | count |
|---|---|---|
| no | completed | 3 |
| no | none | 1 |
| yes | completed | 5 |
| yes | none | 3 |
Then, finally, graph that.
Here is my full code:
suppressPackageStartupMessages(library(ggplot2))
# Sample data for dataframe
low_income <- c("yes", "yes", "no", "yes", "yes", "yes", "no", "yes", "yes", "yes", "no", "no")
test <- c("completed", "none", "completed", "none", "completed", "completed", "completed", "completed", "none", "completed", "none", "completed")
df <- data.frame(low_income, test)
# STEP 1: Create a frequency table to get the counts
table1 <- table(df$low_income, df$test)
# STEP 2: Use the cross tab to manually create a new dataframe <-- I feel like I'm going wrong here
low_income <- c("no", "no", "yes", "yes")
test <- c("completed", "none", "completed", "none")
count <- c(3, 1, 5, 3)
df_2 <- data.frame(low_income, test, count)
# STEP 3: Finally graphing
ggplot(df_2, aes(factor(low_income), count, fill = test)) +
  geom_bar(stat = "identity", position = "dodge") +
  scale_fill_brewer(palette = "Set1")
>Solution :
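Here is the suggestion by @Jahi Zamy, slightly modified. `dplyr::count()` computes the number of rows for each combination of `low_income` and `test` and stores it in a column named `n`, so the manual STEP 2 is no longer needed: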
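library(tidyverse)
# df is the original dataframe from the question
df %>%
  # one row per (low_income, test) combination, with the count in column n
  dplyr::count(low_income, test) %>%
  ggplot(aes(x = low_income, y = n, fill = test)) +
  geom_col(position = position_dodge()) +
  scale_fill_brewer(palette = "Set1")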
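If you would rather stay close to the original `table()` approach, the sketch below shows two alternatives that should also avoid typing the counts by hand. It reuses the sample data from the question; the object names `p_a` and `p_b` are only illustrative. Option A converts the frequency table to long format with `as.data.frame()`, and Option B skips the pre-computed counts and lets `geom_bar()` count the rows itself.

```r
library(ggplot2)

# Sample data copied from the question
low_income <- c("yes", "yes", "no", "yes", "yes", "yes",
                "no", "yes", "yes", "yes", "no", "no")
test <- c("completed", "none", "completed", "none", "completed", "completed",
          "completed", "completed", "none", "completed", "none", "completed")
df <- data.frame(low_income, test)

# Option A: turn the frequency table into a long dataframe.
# Naming the arguments to table() names the resulting columns;
# responseName sets the name of the count column (the default is "Freq").
df_2 <- as.data.frame(
  table(low_income = df$low_income, test = df$test),
  responseName = "count"
)

p_a <- ggplot(df_2, aes(x = low_income, y = count, fill = test)) +
  geom_col(position = "dodge") +
  scale_fill_brewer(palette = "Set1")

# Option B: no pre-computed counts at all.
# geom_bar() uses stat = "count" by default, so it tallies the rows per group.
p_b <- ggplot(df, aes(x = low_income, fill = test)) +
  geom_bar(position = "dodge") +
  scale_fill_brewer(palette = "Set1")

print(p_a)
print(p_b)
```

Both plots should show the same dodged bars as the `dplyr::count()` version. Option B is the shortest, while Option A is handy if you also want the counts as a dataframe for other purposes.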
