Add a regression line to ggscatter plot but ignore grouping

I am using ggscatter (from ggpubr) in R to plot the Pearson correlation between two variables. However, when I color the points, a separate regression line is computed for each color. I want to color the points according to the column named ‘mycolor’, but have the regression line computed on the whole data set, regardless of color.

Here is the code I use, with and without color:

library(ggpubr)

df <- structure(list(my_x = c(131L, 100L, NA, 125L, 50L, 50L, 16L, 
3L, 27L, 96L, 176L, 121L, 129L, 84L, 67L, 35L, 36L, 18L, 29L, 
29L, 26L, 25L, 24L, 20L, 28L, 22L, 25L, 15L, 0L, 18L, 13L, 17L, 
14L, 23L, 27L, NA, 6L, 1L, 7L, 1L, 20L, 30L, 16L, 22L, 23L, 22L, 
17L, 12L, 14L, 28L, 16L, 20L, 44L, 27L, 16L, 6L, 10L, 9L, 16L, 
2L, 43L, 6L, 2L, 0L, 1L, 1L, 1L, 1L, 2L, 1L, 47L, 22L, 7L, 3L, 
4L, 3L, 1L, 1L, 1L, 4L, 4L, 1L, 25L, 3L, 3L, 3L, 6L, 6L, 4L, 
1L, 2L, 2L, 5L, 8L, 3L, 5L, 1L, 1L, 1L, 2L, 3L, 6L, 6L, 4L, 8L, 
1L, 4L, 1L, 5L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 2L, 1L, 0L, 0L, 
2L, 0L, 1L, 2L, 3L, 3L, 4L, 4L, 3L, 2L, 3L, 1L, 2L, 1L), my_y = c(134L, 
90L, 130L, 134L, 44L, 48L, 17L, 4L, 19L, 97L, 178L, 39L, 132L, 
90L, 35L, 35L, 36L, 18L, 28L, 14L, 25L, 26L, 24L, 18L, 25L, 22L, 
9L, 15L, 0L, 21L, 6L, 15L, 15L, 21L, 27L, 19L, 7L, 0L, 8L, 2L, 
10L, 30L, 19L, 23L, 12L, 23L, 16L, 6L, 14L, 29L, 15L, 12L, 21L, 
14L, 11L, 7L, 5L, 4L, 16L, 5L, 36L, 5L, 2L, 0L, 1L, 1L, 1L, 1L, 
2L, 1L, 50L, 22L, 7L, 3L, 6L, 3L, 1L, 1L, 1L, 4L, 4L, 1L, 21L, 
3L, 3L, 3L, 6L, 7L, 4L, 1L, 2L, 2L, 1L, 6L, 3L, 2L, 1L, 1L, 2L, 
2L, 3L, 2L, 6L, 7L, 6L, 1L, 4L, 1L, 5L, 2L, 1L, 2L, 2L, 2L, 2L, 
1L, 2L, 2L, 1L, 0L, 0L, 2L, 0L, 1L, 2L, 3L, 2L, 4L, 4L, 3L, 2L, 
3L, 1L, 2L, 1L), mycolor = c("color1", "color1", "color1", 
"color1", "color1", "color1", "color1", "color1", "color1", 
"color1", "color1", "color1", "color1", "color1", "color1", 
"color2", "color2", "color2", "color2", "color2", "color2", 
"color2", "color2", "color2", "color2", "color2", "color2", 
"color2", "color2", "color2", "color2", "color2", "color2", 
"color2", "color2", "color2", "color2", "color7", 
"Turtle", "Turtle", "color2", "color2", "color2", "color2", 
"color2", "color2", "color2", "color2", "color2", "color2", 
"color2", "color2", "color2", "color2", "color2", "color2", 
"color2", "color2", "color2", "color3", "color4", 
"color4", "color4", "color4", "color4", 
"color4", "color4", "color4", "color4", 
"color4", "color4", "color4", "color5", 
"color5", "color5", "color5", "color5", 
"color5", "color5", "color5", "color5", 
"color5", "color5", "color5", "color5", 
"color5", "color5", "color6", "color6", "color6", "color6", 
"color6", "color6", "color6", "color6", "color6", "color6", "color6", "color6", 
"color6", "color6", "color6", "color6", "color6", "color6", "color6", "color6", 
"color6", "color6", "color6", "color6", "color6", "color6", "color6", "color6", 
"color6", "color6", "color6", "color6", "color6", "color6", "color6", "color6", 
"color6", "color6", "color6", "color6", "color6", "color6", "color6", "color6", 
"color6", "color6", "color6", "color6")), class = "data.frame", row.names = c(NA, 
-135L))
df %>%
  ggscatter(., y="my_y", x="my_x",
            color="mycolor",
            add = "reg.line", conf.int = TRUE, 
            cor.coef = TRUE, cor.method = "pearson")


df %>%
  ggscatter(., y="my_y", x="my_x",
            add = "reg.line", conf.int = TRUE, 
            cor.coef = TRUE, cor.method = "pearson")

The two results:

[Image: the two plots side by side, colored version on the left, uncolored version on the right]

Taking the example above, I basically want the plot on the left, but with its per-color regression lines replaced by the single regression line from the plot on the right.

Is there any way to do this with ggscatter, or should I use ggplot2's geom_point and add the regression line myself?

Thanks for any help!

Maxime

>Solution :

IMHO the easiest approach would be to drop add = "reg.line" from the ggscatter call and add the regression line manually using geom_smooth, so that a single line is fitted to all points.

Using mtcars as example data:

library(ggpubr)
#> Loading required package: ggplot2

mtcars %>%
  mutate(cyl = factor(cyl)) %>%
  ggscatter(., y="hp", x="mpg",
            color="cyl",
            cor.coef = TRUE, cor.method = "pearson") +
  geom_smooth(method = "lm", color = "black")
#> `geom_smooth()` using formula 'y ~ x'
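
For the second option raised in the question (plain ggplot2 with geom_point), a minimal sketch could look like the following. It reuses mtcars; the object names dat, r and p are placeholders, and the correlation label is built by hand with cor() instead of by ggpubr, so the styling will differ from what ggscatter draws.

```r
library(ggplot2)

# Example data: make cyl a factor so it maps to discrete colors
dat <- transform(mtcars, cyl = factor(cyl))

# Pearson correlation over all rows, ignoring the color groups
r <- cor(dat$mpg, dat$hp, method = "pearson", use = "complete.obs")

p <- ggplot(dat, aes(x = mpg, y = hp)) +
  # color is mapped inside geom_point only, so it does not split other layers
  geom_point(aes(color = cyl)) +
  # one linear fit (with confidence band) over the whole data set
  geom_smooth(method = "lm", formula = y ~ x, color = "black") +
  labs(subtitle = sprintf("R = %.2f", r))

p
```

The key point is that the color aesthetic lives in geom_point() and not in the top-level ggplot() call, so geom_smooth() sees a single group. For the data in the question, swapping in df, my_x, my_y and mycolor should work the same way; rows containing NA are dropped from the plot with a warning, and use = "complete.obs" excludes them from the correlation.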
