I’m practicing with this dataset:
https://studysites.uk.sagepub.com/dsur/study/DSUR%20Data%20Files/Chapter%205/DownloadFestival(No%20Outlier).dat
I’m trying to plot a density histogram overlaid with a theoretical normal distribution that has the same mean and standard deviation as the data.
Since the data are skewed, I tried a log(1 + x) transformation:
knitr::opts_chunk$set(echo = TRUE)
# library() loads one package per call
library(tidyverse)
library(here)
library(magrittr)
dlf<-here::here("data/DownloadFestival(No Outlier).dat") %>% readr::read_delim(col_names = TRUE)
day2log<-na.omit(log1p(dlf$day2))
day2log<-as.numeric(day2log)
glimpse(day2log)
hist.day2log<-ggplot2::ggplot(dlf, aes(day2log)) +
geom_histogram(aes(y=..density..), colour = "#76b81b", fill = "blue" ) +
stat_function(fun = dnorm, args = list(mean = mean(day2log), sd = sd(day2log)), colour = "black", size = 2) +
labs( x = "Hygiene score day 2log", y = "Density day2log") +
theme_minimal()
hist.day2log
The error I get is the following:
num [1:264] 0.854 0.88 0.077 0.365 0.182 ...
Error in `check_aesthetics()`:
! Aesthetics must be either length 1 or the same as the data (810): x
Backtrace:
1. base `<fn>`(x)
2. ggplot2:::print.ggplot(x)
4. ggplot2:::ggplot_build.ggplot(x)
5. ggplot2 by_layer(function(l, d) l$compute_aesthetics(d, plot))
6. ggplot2 f(l = layers[[i]], d = data[[i]])
7. l$compute_aesthetics(d, plot)
8. ggplot2 f(..., self = self)
9. ggplot2:::check_aesthetics(evaled, n)
I checked the day2log variable and it looks OK:
glimpse(day2log)
num [1:264] 0.854 0.88 0.077 0.365 0.182 ...
Thanks in advance for the help.
Solution:
You have removed the NA values from day2, so the vector day2log (264 values) is now shorter than dlf (810 rows), the data frame you are passing to ggplot. ggplot expects any external vector (i.e. a vector that is not a column of the data frame passed to ggplot) to have either the same length as that data frame or length 1, so you get an error.
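The length rule can be reproduced in isolation. The sketch below uses a made-up five-row data frame; `toy`, `score`, `ext`, `p_bad`, `p_ok` and `built` are hypothetical names, not part of the original code. Calling `ggplot_build()` forces the same aesthetics check that printing the plot performs, and `tryCatch()` captures the error message instead of stopping.

```r
library(ggplot2)

# Hypothetical toy data: 5 rows, one NA (stands in for dlf$day2)
toy <- data.frame(score = c(0.5, 1.2, NA, 2.0, 0.8))

# External vector with the NA dropped: 4 values vs. 5 rows in `toy`
ext <- as.numeric(na.omit(log1p(toy$score)))

# Length mismatch: building the plot fails in the aesthetics check
p_bad <- ggplot(toy, aes(ext)) + geom_histogram(bins = 3)
res <- tryCatch(ggplot_build(p_bad), error = function(e) conditionMessage(e))
print(res)

# Transforming the column inside aes() keeps the lengths in step;
# ggplot2 only warns about the NA row and drops it before binning
p_ok <- ggplot(toy, aes(log1p(score))) + geom_histogram(bins = 3)
built <- suppressWarnings(ggplot_build(p_ok))
```

In `p_ok` the histogram is built from the four non-missing values.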
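To solve it, do your calculations directly on the day2 column inside aes(). The histogram layer drops the NA rows itself (with a warning about non-finite values), so you only need to remove NA values explicitly where a function would otherwise return NA, such as inside mean() and sd() via na.rm = TRUE: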
ggplot(dlf, aes(log1p(day2))) +
  geom_histogram(aes(y = after_stat(density)), colour = "#76b81b", fill = "blue") +
  stat_function(fun = dnorm,
                args = list(mean = mean(log1p(dlf$day2), na.rm = TRUE),
                            sd = sd(log1p(dlf$day2), na.rm = TRUE)),
                colour = "black", size = 2) +
  labs(x = "Hygiene score day 2log", y = "Density day2log") +
  theme_minimal()
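An alternative, sketched here under the assumption that dplyr is available (it is part of the tidyverse you already load), is to store the transformed variable as a column and drop the NA rows once, before plotting. `dlf_demo` is a hypothetical stand-in for `dlf` filled with random values; on the real data you would start the pipeline from `dlf` instead. `after_stat()` assumes ggplot2 3.3.0 or later.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for dlf: 20 random scores plus two missing values
set.seed(1)
dlf_demo <- data.frame(day2 = c(rexp(20), NA, NA))

# Put the transformed variable inside the data frame and drop the NA rows,
# so the data and every mapped aesthetic have the same length
day2_df <- dlf_demo %>%
  mutate(day2log = log1p(day2)) %>%
  filter(!is.na(day2log))

p <- ggplot(day2_df, aes(day2log)) +
  geom_histogram(aes(y = after_stat(density)), bins = 10,
                 colour = "#76b81b", fill = "blue") +
  # mean and sd come from the same cleaned column, so na.rm is not needed
  stat_function(fun = dnorm,
                args = list(mean = mean(day2_df$day2log),
                            sd = sd(day2_df$day2log)),
                colour = "black") +
  labs(x = "Hygiene score day 2 (log1p)", y = "Density") +
  theme_minimal()

built <- ggplot_build(p)
```

The design choice here is that the NA handling happens in one place, so the histogram and the normal curve are guaranteed to describe exactly the same observations.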
