I am trying to graph the proportion of people in Remission (which is binary 0/1) after treatment by year. I can find how to graph the count, but I would like the proportion as there are a different number of people each year.
My data look something like this:
| Client_id | Year | Remission |
|---|---|---|
| 2 | 2016 | 0 |
| 4 | 2017 | 1 |
| 7 | 2017 | 0 |
| 8 | 2016 | 1 |
| 12 | 2016 | 1 |
I would like to create a plot with Year on the x-axis and the proportion of those in remission on the y-axis. Ideally, I would be able to do this both using geom_bar and geom_line.
I have tried this code, but it gives a proportion of 1.00 for every year, which is not correct.
ggplot(data=df)+
geom_bar(aes(x=Year,y=Remission),stat="identity",position="dodge")
I could calculate this manually for each year and build a table in Excel, but I am hoping for a way to do it in ggplot2.
>Solution :
You could use position = "fill" in geom_bar and map fill = Remission in the ggplot aesthetics, so that each year's bar is scaled to a total height of 1 and split by remission status:
library(dplyr)
library(ggplot2)
df %>%
  mutate(Year = as.character(Year),
         Remission = as.factor(Remission)) %>%
  ggplot(aes(x = Year, fill = Remission)) +
  geom_bar(position = "fill") +
  labs(y = "Proportion")

Created on 2022-08-22 with reprex v2.0.2
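If you only want the share in remission (one bar per year rather than a stacked bar for both outcomes), another option is to let ggplot2 take the per-year mean of the 0/1 variable, since the mean of a binary indicator is the proportion of 1s. This is a minimal sketch, assuming the sample rows from the question rebuilt as `df` and Remission kept numeric; `p_bar` is just an illustrative name:

```r
library(ggplot2)

# Sample rows from the question, rebuilt here so the block runs on its own
df <- data.frame(
  Client_id = c(2, 4, 7, 8, 12),
  Year      = c(2016, 2017, 2017, 2016, 2016),
  Remission = c(0, 1, 0, 1, 1)
)

# Remission must stay numeric (0/1): its mean per year is the proportion of 1s
p_bar <- ggplot(df, aes(x = factor(Year), y = Remission)) +
  stat_summary(fun = mean, geom = "bar") +
  labs(x = "Year", y = "Proportion in remission")

p_bar
```

This also suggests why the attempt in the question shows 1.00 everywhere: stat = "identity" draws each row's raw 0/1 value instead of aggregating, so any year with at least one remission reaches a height of 1.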
Percentage scale
If you want a percentage scale, you can use percent_format() from the scales package in scale_y_continuous like this:
library(dplyr)
library(ggplot2)
library(scales)
df %>%
  mutate(Year = as.character(Year),
         Remission = as.factor(Remission)) %>%
  ggplot(aes(x = Year, fill = Remission)) +
  geom_bar(position = "fill") +
  # Show the 0-1 proportion axis as 0%-100%
  scale_y_continuous(labels = percent_format()) +
  labs(y = "Percentage")

Proportion with geom_line
For a line plot, calculate the proportion first: count the rows per Year and Remission value, group by Year, and divide each count by the yearly total with mutate. Then plot the result like this:
library(dplyr)
library(ggplot2)
df %>%
  mutate(Year = as.numeric(Year),
         Remission = as.factor(Remission)) %>%
  count(Year, Remission) %>%
  group_by(Year) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup() %>%
  ggplot(aes(x = Year, y = prop, color = Remission)) +
  geom_line() +
  scale_x_continuous(breaks = c(2016, 2017)) +
  labs(y = "Proportion")

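If a single line for the remission proportion is enough, the pre-computation can be shortened by summarising the mean of the 0/1 variable per year. This is a sketch under the same assumption that Remission is numeric 0/1, with the question's sample rows rebuilt as `df`; `prop_by_year` and `p_line` are illustrative names:

```r
library(dplyr)
library(ggplot2)

# Sample rows from the question, rebuilt here so the block runs on its own
df <- data.frame(
  Client_id = c(2, 4, 7, 8, 12),
  Year      = c(2016, 2017, 2017, 2016, 2016),
  Remission = c(0, 1, 0, 1, 1)
)

# One row per year: proportion in remission and the number of clients
prop_by_year <- df %>%
  group_by(Year) %>%
  summarise(prop = mean(Remission), n = n(), .groups = "drop")

p_line <- ggplot(prop_by_year, aes(x = Year, y = prop)) +
  geom_line() +
  geom_point() +
  scale_x_continuous(breaks = prop_by_year$Year) +  # one tick per observed year
  scale_y_continuous(limits = c(0, 1)) +
  labs(y = "Proportion in remission")

p_line
```

Replacing geom_line() and geom_point() with geom_col() should give the bar version from the same summary table, and keeping `n` makes it easy to check how many clients each proportion is based on.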