I have a dataset that looks similar to this:
> dput(df)
structure(list(Date = c("3/23/21", "4/11/22", "6/30/22"), Banana_wasted = c(4L,
2L, 5L), Apple_wasted = c(6L, 0L, 3L), Orange_wasted = c(1L,
4L, 1L), Banana_ordered = c(5L, 7L, 7L), Apple_Ordered = c(9L,
8L, 9L), Orange_ordered = c(5L, 6L, 6L), Banana_eaten = c(5L,
5L, 6L), Apple_eaten = c(7L, 7L, 4L), Orange_eaten = c(8L, 8L,
8L)), class = "data.frame", row.names = c(NA, -3L))
I want to find the % of fruit wasted per month/year (in relation to how many fruits were ordered).
it should be:
(Banana_wasted+Apple_wasted+Orange_wasted) / (Banana_ordered + Apple_ordered+ Orange_ordered)
So, for 3/21, it should be:
(4+6+1/5+9+5)*100 = 57.9%
I would like to do this for every month of the year.
>Solution :
library(dplyr)
library(lubridate)
df %>%
mutate(Date = as.Date(Date, format = "%m/%d/%y"),
pct_wasted = (Banana_wasted + Apple_wasted + Orange_wasted) / (Banana_ordered + Apple_Ordered + Orange_ordered) * 100) %>%
group_by(year = year(Date), month = month(Date)) %>%
summarize(avg_pct_wasted = mean(pct_wasted))
#> # A tibble: 3 × 3
#> # Groups: year [2]
#> year month avg_pct_wasted
#> <dbl> <dbl> <dbl>
#> 1 2021 3 57.9
#> 2 2022 4 28.6
#> 3 2022 6 40.9
Created on 2023-02-06 with reprex v2.0.2