I am having issues with ggplot2 geom_histogram when plotting frequencies and using facet_wrap at the same time:
#
library(ggplot2)

myTestDF <- data.frame(
  Sample = as.vector(replicate(n = 6, expr = c('s1', 's2', 's3'))),
  var2 = c(
    replicate(n = 9, expr = 't1'),
    replicate(n = 9, expr = 't2')),
  Val1 = c(
    replicate(n = 3, expr = c(2, 20, 40)),
    replicate(n = 3, expr = c(0.2, 0.4, 0.6))),
  stringsAsFactors = FALSE)
# One extra row for s2/t1, added as an afterthought :)
myTestDF <- rbind(myTestDF, data.frame(Sample = 's2', var2 = 't1', Val1 = 70,
                                       stringsAsFactors = FALSE))
myTestDF$var3 <- paste(myTestDF$Sample, myTestDF$var2, sep = '_')
# Now, this works (single sample, relative frequencies per group):
ggplot(
  data = myTestDF[myTestDF$Sample == 's2', ],
  aes(x = Val1, fill = var2)) +
  geom_histogram(
    aes(y = after_stat(c(
      count[group == 1] / sum(count[group == 1]),
      count[group == 2] / sum(count[group == 2])))),
    position = "identity", alpha = 0.5) +
  labs(title = 'testHist',
       x = "Val1", y = "Frequency") +
  theme_minimal()
#
However, I can’t figure out a way to make it work with facet_wrap by ‘Sample’. The frequencies get all messed up, and after experimenting and reading around, I still can’t find a way to do it. Of course I could use a for loop, but I would like to understand whether it can be done with facet_wrap or another ggplot2 function. Looking forward to your feedback.
>Solution :
The issue is that you do not account for the panels when computing the relative frequencies separately for each group. When you use facet_wrap, the computed data is ordered first by PANEL and second by group, so count[group==1] picks up the bins of group 1 from every panel: the sum runs across all panels, and the concatenated result no longer lines up with the row order. Instead, I would suggest using e.g. ave() to compute the relative frequencies. In the code below I also added PANEL as a second grouping variable.
library(ggplot2)

ggplot(
  myTestDF, aes(x = Val1, fill = var2)
) +
  geom_histogram(
    # Normalize the counts within each group/PANEL combination
    aes(y = after_stat(
      ave(count, group, PANEL, FUN = \(x) x / sum(x))
    )),
    position = "identity", alpha = 0.5
  ) +
  labs(
    title = "testHist",
    x = "Val1", y = "Frequency"
  ) +
  theme_minimal() +
  facet_wrap(~Sample)
#> `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
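
To see what ave() does here without going through ggplot2, the following is a minimal base R sketch. The data frame `binned` and its counts are made up for illustration; it only mimics the PANEL, group and count columns that the histogram stat computes, with rows ordered by PANEL first and group second.

```r
# Hypothetical stand-in for the binned layer data (values are invented):
# rows are ordered by PANEL first, then by group.
binned <- data.frame(
  PANEL = c(1, 1, 1, 1, 2, 2, 2, 2),
  group = c(1, 1, 2, 2, 1, 1, 2, 2),
  count = c(1, 3, 2, 2, 4, 1, 5, 5)
)

# Relative frequency within each group/PANEL combination.
# ave() returns the values in the original row order.
binned$freq <- ave(
  binned$count, binned$group, binned$PANEL,
  FUN = function(x) x / sum(x)
)

# Check: the frequencies sum to 1 for every group/PANEL combination.
totals <- aggregate(freq ~ PANEL + group, data = binned, FUN = sum)
print(binned)
print(totals)
```

Because ave() keeps the original row order, the result can be mapped straight back onto the bars, which is what the c(...) approach loses once more than one panel is present.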
