I was trying to create a regression plot that shows the regression line for two subgroups and also the entire dataframe.
While doing that, I wondered whether it is possible to add an entry to the legend for a group that doesn't exist in the dataframe (my variable has only two distinct groups, but I want three entries in the legend).
Specifically, I want a legend entry for the regression line fitted to both groups combined, but I am also curious how to do this in general.
Below is some sample code.
Any help is much appreciated!
#Load packages
library(MASS)
library(ggplot2)
library(dplyr)
#Set a seed
set.seed(1234)
#Create random dataframe
sigma1 <- rbind(c(1, 0.8), c(0.8, 1))
mu <- c(4.5, 3.2)
dta1 <- as.data.frame(
  mvrnorm(n = 1000, mu = mu, Sigma = sigma1)) |>
  mutate(
    #Every row of this data set belongs to group 1
    group = as.factor(rep(1, 1000))
  )
sigma2 <- rbind(c(1, -0.5), c(-0.5, 1))
dta2 <- as.data.frame(
  mvrnorm(n = 1000, mu = mu, Sigma = sigma2)) |>
  mutate(
    #Every row belongs to group 2; sample(c(2), ...) would draw from 1:2 instead
    group = as.factor(rep(2, 1000))
  )
dta <- rbind(dta1, dta2)
#Create the graphic
ggplot(dta, aes(x = V1, y = V2)) +
  geom_point(aes(color = group)) +
  geom_smooth(method = "lm", se = FALSE) +
  geom_smooth(method = "lm", se = FALSE, aes(color = group)) +
  scale_color_manual(name = "Legend", values = c("green", "orange"), labels = c("A", "B"))
>Solution :
Try this:
dta$group <- factor(dta$group, levels = c('1', '2', '3'))
ggplot(dta, aes(x = V1, y = V2)) +
  geom_point(aes(color = group)) +
  geom_smooth(method = "lm", se = FALSE) +
  geom_smooth(method = "lm", se = FALSE, aes(color = group)) +
  scale_color_manual(name = "Legend",
                     values = c("green", "orange", "blue"),
                     labels = c("A", "B", "Overall"),
                     drop = FALSE)
The strategy is to create a "dummy" unused factor level, and then manually label it the way you want. Note the need to include drop = FALSE in the scale, otherwise the unused factor level will be omitted.
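For the more general question, a common alternative (not part of the answer above) is to map a constant string to the colour aesthetic in the layer that has no group. The sketch below uses a small made-up data frame called toy rather than the question's dta, and the entry name "Overall" is just an illustrative choice. Because the constant sits inside aes(), it becomes a third value on the colour scale, so no dummy factor level is needed.

```r
library(ggplot2)

# Toy data (hypothetical, for illustration only): two groups with opposite slopes
set.seed(1)
toy <- data.frame(
  x = rnorm(200),
  group = factor(rep(c(1, 2), each = 100))
)
toy$y <- ifelse(toy$group == "1", 0.8, -0.5) * toy$x + rnorm(200, sd = 0.5)

p <- ggplot(toy, aes(x = x, y = y)) +
  geom_point(aes(color = group)) +
  # A constant string inside aes() adds "Overall" as a value on the colour scale
  geom_smooth(aes(color = "Overall"), method = "lm", formula = y ~ x, se = FALSE) +
  # Per-group regression lines, coloured by the existing factor
  geom_smooth(aes(color = group), method = "lm", formula = y ~ x, se = FALSE) +
  scale_color_manual(
    name = "Legend",
    breaks = c("1", "2", "Overall"),
    values = c("1" = "green", "2" = "orange", "Overall" = "blue"),
    labels = c("A", "B", "Overall")
  )

p
```

With this approach the overall line is drawn in the colour shown in the legend, since its colour comes from the scale rather than from the geom's default.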