# How can I "weight" data points before making a density plot in R?

Let’s say I have some data in a tibble `activity` with a column `activity$time` that records the time of day of some events. Suppose this data consists of two different sampling periods, one from 5:00 to 9:00, and one from 7:00 to 11:00. Because these periods overlap, events between 7:00 and 9:00 are over-represented by a factor of 2 compared to the rest. If I were to make a density plot like this:

```r
ggplot(activity) + geom_density(mapping = aes(x = time))
```

then the center would be skewed upwards compared to what would be a true reflection of reality. How can I tell `geom_density()` something like "weight this interval by a factor of 0.5", or better yet, provide an arbitrary weighting function?

Here is some code demonstrating the overlap effect. `runif()` should produce a uniform distribution, but because I have two overlapping sections, there is a higher plateau in the middle:

```r
library(tidyverse)

set.seed(27036459)
activity <- tibble(time = c(runif(10000, 5, 9), runif(10000, 7, 11)))
ggplot(activity) + geom_density(mapping = aes(x = time))
```

What I want is a way to take `activity`, and using my knowledge of the sampling intervals, somehow adjust the graph to represent the actual distribution of the phenomenon, independent of sampling bias (in this case, the uniformity of `runif()`).

### Solution

We can produce a set-up similar to your own by taking 50 samples from the period 5am to 9am and another 50 samples from 7am to 11am like so:

```r
set.seed(1)

# Offsets in seconds after 05:00: 50 events in 05:00-09:00, 50 in 07:00-11:00
activity <- data.frame(time = as.POSIXct("2022-08-05 05:00:00") +
                         c(runif(50, 0, 14400), runif(50, 7200, 21600)))
```

And we can see this produces the unwanted peak between 7am and 9am:

```r
library(tidyverse)

ggplot(activity) +
  geom_density(mapping = aes(x = time))
```

`geom_density()` has no `weights` argument (although it does accept a `weight` aesthetic). Because the area under the curve is normalized to one, it doesn’t matter whether we halve the weight of values between 7 and 9 or double the weight of values outside this period; both give the same result. The latter is easy to do by duplicating rows: we make a copy of the data frame with the values between 7 and 9 removed, then bind it to the original data frame, so that every event outside the overlap is counted twice:

```r
library(lubridate)

activity %>%
  # keep events outside the 07:00-09:00 overlap (hour 9 means 09:00-09:59)
  filter(hour(time) < 7 | hour(time) >= 9) %>%
  bind_rows(activity) %>%
  ggplot() +
  geom_density(mapping = aes(x = time))
```

Created on 2022-08-05 by the reprex package (v2.0.1)
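As an alternative sketch, `geom_density()` also understands a `weight` aesthetic, which allows an arbitrary weighting function instead of row duplication. The example below assumes the two sampling periods from the question. `coverage()` is a hypothetical helper (not part of any package) that counts how many periods cover a given time of day, and each event is weighted by the inverse of that count. The `tz = "UTC"` setting and the explicit normalization of the weights are also assumptions, added so that the hour arithmetic is unambiguous and so that ggplot2 versions that may not rescale weights themselves still draw a proper density.

```r
library(ggplot2)

set.seed(1)

# Same simulated data as above; tz = "UTC" is an assumption made here so that
# the hour arithmetic does not depend on the local time zone.
start <- as.POSIXct("2022-08-05 05:00:00", tz = "UTC")
activity <- data.frame(
  time = start + c(runif(50, 0, 14400), runif(50, 7200, 21600))
)

# Hypothetical helper: number of sampling periods (05:00-09:00 and
# 07:00-11:00) that cover a time of day given in decimal hours.
coverage <- function(h) (h >= 5 & h <= 9) + (h >= 7 & h <= 11)

# Decimal hours since midnight for each event
midnight <- as.POSIXct("2022-08-05 00:00:00", tz = "UTC")
activity$hour_of_day <- as.numeric(
  difftime(activity$time, midnight, units = "hours")
)

# Weight each event by the inverse of its coverage, then normalize to sum to 1
w <- 1 / coverage(activity$hour_of_day)
activity$w <- w / sum(w)

p <- ggplot(activity, aes(x = time, weight = w)) +
  geom_density()
p
```

Any other function of the time of day could take the place of `1 / coverage(...)`, for example when the sampling periods differ in effort as well as in extent.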