I want to achieve something similar to the excel sumif. It is quite easy if the condition is equal to something, but not for greater or smaller because then we can’t use the group_by() approach. For different reasons I can’t use arrange() and then using cumsum(). The data in the example is ordered, but it is not always ordered.
I am giving an small example, in reality I have more conditions:
data <- data.frame(a = c(2022, 2021, 2020), b = c(10, 20, 10))
get_sum <- function(x, year){
out <- filter(x, b <= year) %>%
summarise(cum = sum(b)) %>%
pull()
return(out)
}
data %>%
mutate(cum = get_sum(x = ., year = b))
Steps: for the first row mutate, it filters the whole data.frame by values equal or less than 2022, then summarize it by making the sum of the filtered data.frame column b: 10 + 20 + 10.
for the second row do the same and now filter the value of b equal or smaller than 2021 being the mutate output: 20 + 10, and so on.
The output should look something like this:
a b cum
1 2022 10 40
2 2021 20 30
3 2020 10 10
I would like to know why doesn’t work my code.
>Solution :
We could use map from purrr if we want to stay within tidyverse. We need map to input individual values of a into the function. In your small example, i.e.
library(dplyr)
library(purrr)
data |>
mutate(cum = map(a, ~ sum(b[a <= .])))
Update. To make your own function work you could apply a similar logic using map, using cur_data() and fix a small typo within get_sum (a <= year rather than b <= year):
get_sum <- function(df, year) {
df |>
filter(a <= year) |>
summarise(cum = sum(b)) |>
pull()
}
data |>
mutate(cum = map(a, ~ get_sum(df = cur_data(), year = .)))
Output:
a b cum
1 2022 10 40
2 2021 20 30
3 2020 10 10