Let me randomly generate some data with available packages to demonstrate my issue. I am using the
randomForestSRC package to run some survival random forests, and I am plotting the results of the random forest as a ggplot using the
ggRandomForests package. You’ll see the plot I get at the very end.
I want my boxplots in the order "Yes", then "No", then "Maybe".
    library(ggplot2)
    library(ggRandomForests)
    library(randomForestSRC)
    library(survival)

    df <- cancer  # should grab the cancer data set from the survival library

    # Randomly generate some categorical data
    var <- sample(c('Yes', 'No', 'Maybe'), 228, replace=TRUE)
    df$var <- as.factor(var)

    # Attempt to put them in the order I want (first yes, then no, then maybe)
    df$var <- factor(df$var, levels = c("Yes", "No", "Maybe"))
    levels(df$var)  # Verify it is in order of "Yes", "No", "Maybe"

    # Run survival random forests
    rf <- rfsrc(Surv(time, status) ~ var, data = df,
                ntree = 1000, samptype = "swr",
                seed = 12345, membership = TRUE)

    # Create a plot of the outcome, writing the plot object to a variable
    pl <- plot.variable(rf, xvar.names = "var", partial = TRUE,
                        surv.type = "years.lost", time = 365,
                        show.plots = FALSE)

    # Create a ggplot with the plot object with the ggRandomForests package
    # Also tack on some labels to demonstrate how this code works
    plot(gg_partial(pl)) + xlab("Category") + ylab("Outcome")
If you got what I got, you should see the boxplots in alphabetical order (Maybe, No, Yes), which is, of course, NOT the order I wanted.
The only way I know to rearrange the order in a ggplot is to set the factor levels as above; I don’t know of any other method for fixing this. Any ideas?
You could set the order via the limits argument of scale_x_discrete:

    library(ggplot2)
    library(ggRandomForests)
    library(randomForestSRC)
    library(survival)

    plot(gg_partial(pl)) +
      labs(x = "Category", y = "Outcome") +
      scale_x_discrete(limits = levels(df$var))
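To see the same idea in isolation, here is a minimal sketch that needs only ggplot2. The data frame `toy`, its column `yhat`, and the vector `wanted` are made-up names for illustration and are not meant to mirror what gg_partial actually returns; the point is only that a discrete x axis follows `limits` even when the column is a plain character vector that would otherwise sort alphabetically.

```r
library(ggplot2)

set.seed(1)

# Hypothetical toy data: a character column (no factor levels) and a numeric outcome
toy <- data.frame(
  var  = rep(c("Maybe", "No", "Yes"), each = 30),
  yhat = rnorm(90),
  stringsAsFactors = FALSE
)

# The order we want on the x axis
wanted <- c("Yes", "No", "Maybe")

# Without scale_x_discrete() the boxes would appear alphabetically;
# limits fixes both the order and the set of categories shown
p <- ggplot(toy, aes(x = var, y = yhat)) +
  geom_boxplot() +
  scale_x_discrete(limits = wanted) +
  labs(x = "Category", y = "Outcome")

p
```

Keep in mind that limits also acts as a filter: any category missing from the vector is dropped from the plot, so it is safest to pass the full set of levels, as levels(df$var) does above.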