Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Conditional sum with mutate (dplyr)

I want to achieve something similar to the excel sumif. It is quite easy if the condition is equal to something, but not for greater or smaller because then we can’t use the group_by() approach. For different reasons I can’t use arrange() and then using cumsum(). The data in the example is ordered, but it is not always ordered.

I am giving an small example, in reality I have more conditions:

data <- data.frame(a = c(2022, 2021, 2020), b = c(10, 20, 10))

get_sum  <- function(x, year){
  out <- filter(x, b <= year) %>%
    summarise(cum = sum(b)) %>% 
    pull()
  return(out)
}


data %>% 
  mutate(cum = get_sum(x = ., year = b))

Steps: for the first row mutate, it filters the whole data.frame by values equal or less than 2022, then summarize it by making the sum of the filtered data.frame column b: 10 + 20 + 10.
for the second row do the same and now filter the value of b equal or smaller than 2021 being the mutate output: 20 + 10, and so on.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

The output should look something like this:

     a  b cum
1 2022 10  40
2 2021 20  30
3 2020 10  10

I would like to know why doesn’t work my code.

>Solution :

We could use map from purrr if we want to stay within tidyverse. We need map to input individual values of a into the function. In your small example, i.e.

library(dplyr)
library(purrr)

data |>
  mutate(cum = map(a, ~ sum(b[a <= .])))

Update. To make your own function work you could apply a similar logic using map, using cur_data() and fix a small typo within get_sum (a <= year rather than b <= year):

get_sum <- function(df, year) {

  df |>
    filter(a <= year) |>
    summarise(cum = sum(b)) |>
    pull()

}

data |>
  mutate(cum = map(a, ~ get_sum(df = cur_data(), year = .)))

Output:

     a  b cum
1 2022 10  40
2 2021 20  30
3 2020 10  10
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading