Where does ggplot set the order of the color scheme?

I have a data set that I’m showing in a series of violin plots with one categorical variable and one continuous numeric variable. When R generated the original series of violins, the categorical variable was plotted alphabetically (I rotated the plot, so it appears alphabetically from bottom to top). I thought it would look better if I sorted them using the numeric variable.

When I do this, the colors don't come out the way I intended. It looks as if R assigned the colors to the violins before sorting them, so after the sorting each violin kept its original color. I wanted R to sort them first and then apply the color scheme.

I’m using the viridis color scheme here, but I’ve run into the same thing when I used RColorBrewer.

Here is my code:

```r
library(ggplot2)

# Start plotting
g <- ggplot(NULL)

# Violin plot: x is reordered by numval, fill uses the original catval
g <- g + geom_violin(data = df,
                     aes(x = reorder(catval, -numval, na.rm = TRUE),
                         y = numval, fill = catval),
                     trim = TRUE, scale = "width", adjust = 0.5)

# (snip)

# Specify colors (the violins are mapped to fill, so this needs a fill scale)
g <- g + scale_fill_viridis_d()

# Remove legend
g <- g + theme(legend.position = "none")

# Flip for readability
g <- g + coord_flip()

# Produce plot
g
```

Here is the resulting plot. [Image: violin plot]

If I leave out the reorder() call in the x aesthetic of geom_violin(), the colors come out in the order I want, but then my categorical variable is sorted alphabetically rather than by the numeric variable.
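One way to see why this happens: `x` and `fill` are mapped through two separate factors, and each discrete scale works through its own level order. The minimal sketch below uses a small hypothetical `df` (the category names and values are invented for illustration) and only inspects the two level orders rather than drawing a plot.

```r
# Hypothetical stand-in for the question's df; the values are invented.
df <- data.frame(
  catval = c("alpha", "alpha", "beta", "beta", "gamma", "gamma"),
  numval = c(1, 2, 8, 9, 4, 5)
)

# Levels used on the x axis: categories sorted by mean(-numval),
# i.e. from the largest mean numval to the smallest.
x_levels <- levels(reorder(factor(df$catval), -df$numval, na.rm = TRUE))

# Levels used by the fill scale: the character column becomes a factor
# with alphabetically sorted levels, so the colors follow that order.
fill_levels <- levels(factor(df$catval))

x_levels     # c("beta", "gamma", "alpha")
fill_levels  # c("alpha", "beta", "gamma")
```

Because the two vectors differ, the first color of the palette goes to `alpha` even though `alpha` is no longer the first violin on the axis.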

Is there a way to get what I’m after?

Solution:

I think this is a reproducible example of what you're seeing. In the diamonds dataset, the mean price of "Very Good" diamonds is actually higher than the mean for "Good" diamonds, so sorting by price does not follow the quality order of cut.

```r
library(ggplot2)  # provides the diamonds dataset
library(dplyr)

diamonds %>%
  group_by(cut) %>%
  summarize(mean_price = mean(price))
#> # A tibble: 5 x 2
#>   cut       mean_price
#>   <ord>          <dbl>
#> 1 Fair           4359.
#> 2 Good           3929.
#> 3 Very Good      3982.
#> 4 Premium        4584.
#> 5 Ideal          3458.
```

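By default, reorder() summarizes the sorting variable with mean(), so reorder(cut, -price) orders the cuts from highest to lowest mean price, and after coord_flip() Good is plotted above Very Good. But the fill is still mapped to the un-reordered variable cut, a factor whose levels are in quality order, so the colors keep that quality order.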
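To check this without drawing anything, you can compare the level order that `reorder()` produces for the x axis with the level order of `cut` that the fill scale uses. This is a small sketch that assumes only that ggplot2 is installed (it ships the `diamonds` data); the expected orders in the comments follow from the mean prices in the summary above.

```r
library(ggplot2)  # provides the diamonds dataset

# Order used on the x axis: cuts sorted by mean(-price),
# i.e. from the highest mean price to the lowest.
x_levels <- levels(reorder(diamonds$cut, -diamonds$price))

# Order used by the fill scale: the original quality order of cut.
fill_levels <- levels(diamonds$cut)

x_levels     # c("Premium", "Fair", "Very Good", "Good", "Ideal")
fill_levels  # c("Fair", "Good", "Very Good", "Premium", "Ideal")
```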

```r
ggplot(diamonds, aes(x = reorder(cut, -price),
                     y = price, fill = cut)) +
  geom_violin() +
  coord_flip()
```

If you want the colors to follow the new ordering, you can either reorder in both aesthetics, or reorder the data upstream of ggplot2 so that x and fill share the same factor:

```r
ggplot(diamonds, aes(x = reorder(cut, -price),
                     y = price,
                     fill = reorder(cut, -price))) +
  geom_violin() +
  coord_flip()
```

Or

```r
diamonds %>%
  mutate(cut = reorder(cut, -price)) %>%
  ggplot(aes(x = cut, y = price, fill = cut)) +
  geom_violin() +
  coord_flip()
```
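Applied to the code in the question, the upstream approach could look roughly like the sketch below. It is not the asker's actual data: `df` here is hypothetical, filled with random values from `rnorm()` under a fixed seed. It also uses `scale_fill_viridis_d()` rather than `scale_colour_viridis_d()`, because the violins are mapped to `fill` and a colour scale would leave their fill untouched.

```r
library(ggplot2)

# Hypothetical data standing in for the question's df.
set.seed(1)
df <- data.frame(
  catval = rep(c("alpha", "beta", "gamma"), each = 20),
  numval = c(rnorm(20, mean = 2), rnorm(20, mean = 8), rnorm(20, mean = 5))
)

# Reorder once, before plotting, so x and fill share the same levels
# (highest mean numval first; na.rm = TRUE is passed on to mean()).
df$catval <- reorder(factor(df$catval), -df$numval, na.rm = TRUE)

g <- ggplot(df, aes(x = catval, y = numval, fill = catval)) +
  geom_violin(trim = TRUE, scale = "width", adjust = 0.5) +
  scale_fill_viridis_d() +          # fill scale, matching the fill aesthetic
  theme(legend.position = "none") +
  coord_flip()

# Draw the plot: colors now run through the palette in the sorted order
g
```

Since `catval` itself carries the sorted levels, there is no second `reorder()` call to keep in sync with the first.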
