I am working on percentage changes between periods and struggling with the logarithmic transformation of axis labels. Here is an example based on the storms dataset:
library(dplyr)
library(ggplot2)
library(scales)
df <- storms |>
  group_by(year) |>
  summarise(wind = mean(wind)) |>
  mutate(lag = lag(wind, n = 1)) |>
  mutate(perc = (wind / lag) - 1) |>
  tidyr::drop_na()
I want to visualize the distribution of percentages, making the percentage change symmetrical (log difference) with log1p.
ggplot(df, aes(x = log1p(perc))) +
  geom_histogram(bins = 5)
At this point I wanted to transform the x-axis label back to the original percentage value.
I tried to create my own transformation with trans_new, and applied it to the labels in scale_x_continuous, but I can’t make it work.
trans_perc <- trans_new(
  name = "trans_perc",
  transform = log1p_trans(),
  inverse = function(x)
    expm1(x),
  breaks = breaks_log(),
  format = percent_format(),
  domain = c(-Inf, Inf)
)
ggplot(df, aes(x = log1p(perc))) +
  geom_histogram(bins = 5) +
  scale_x_continuous(labels = trans_perc)
Currently, the result is:
Error in `get_labels()`:
! `breaks` and `labels` are different lengths
Run `rlang::last_error()` to see where the error occurred.
Thanks!
>Solution :
If I understand you correctly, you want to keep the shape of the histogram but change the labels so that they reflect the value of the perc column rather than the transformed log1p(perc) value. If that is the case, there is no need for a transformer object. You can simply pass the inverse transformation (plus formatting) as a function to the labels argument of scale_x_continuous:
ggplot(df, aes(x = log1p(perc))) +
  geom_histogram(bins = 5) +
  scale_x_continuous("Percentage Change", labels = ~ percent(expm1(.x)))
Note that although the histogram remains symmetrical in shape, the axis labels represent the back-transformed values of the original axis labels.
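To make the back-transformation concrete, here is a small sketch in base R. The numbers (100 and 125) are made up purely for illustration and are not taken from the storms data. It shows that a rise and the matching fall have equal size on the log1p scale, and that expm1 recovers the original percentage, which is all the labels function relies on.

```r
# Hypothetical values: a rise from 100 to 125, then a fall back to 100
up   <- 125 / 100 - 1   # +25%
down <- 100 / 125 - 1   # -20%

# On the log1p scale the two moves have equal size and opposite sign
log_up   <- log1p(up)    # same as log(125 / 100)
log_down <- log1p(down)  # same as log(100 / 125)
stopifnot(isTRUE(all.equal(log_up, -log_down)))

# expm1() undoes log1p(), so labels computed with it show the original values
stopifnot(isTRUE(all.equal(expm1(log_up), up)))
stopifnot(isTRUE(all.equal(expm1(log_down), down)))
```

So equal distances on either side of zero on the plotted axis correspond to unequal percentages once they are back-transformed.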
If you want the breaks to be at round numbers on the back-transformed scale, then you can do:
ggplot(df, aes(x = log1p(perc))) +
  geom_histogram(bins = 5) +
  scale_x_continuous("Percentage Change",
                     breaks = log1p(pretty(df$perc, 5)),
                     labels = ~ percent(expm1(.x)))
I think this second version is preferable, because it shows the log nature of the x scale more clearly (including the logarithmically spaced breaks and grid lines).
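As for why the original attempt fails: as far as I can tell, the labels argument expects a function or a vector of labels, so a transformation object passed there is treated as a vector whose length does not match the breaks. If you do want a transformation object, it would go into the trans argument instead, with the untransformed perc column mapped to x. The following is only a sketch under that assumption. The definition of trans_perc below is my own illustrative version rather than a fix of yours, and I believe newer releases prefer the names transform (in ggplot2) and new_transform (in scales), so adjust to your installed versions.

```r
library(dplyr)
library(ggplot2)
library(scales)

# Same data preparation as in the question
df <- storms |>
  group_by(year) |>
  summarise(wind = mean(wind)) |>
  mutate(perc = wind / lag(wind) - 1) |>
  tidyr::drop_na()

# Illustrative transformation object: log1p forward, expm1 back,
# with percent formatting for the labels
trans_perc <- trans_new(
  name = "trans_perc",
  transform = log1p,
  inverse = expm1,
  format = percent_format(),
  domain = c(-1, Inf)
)

# Map the raw percentage and let the scale do the transformation
p <- ggplot(df, aes(x = perc)) +
  geom_histogram(bins = 5) +
  scale_x_continuous("Percentage Change", trans = trans_perc)
```

Printing p should give a histogram binned on the log1p scale with percent labels, since the statistic is computed after the scale transformation.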

