Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Cumulative sum per group in tidyverse R

I have paneldata, i.e. repeated observations per household. A unit (household) is measured through time and exhibits a characteric (e.g. variable). I can calculate a group sum per year with group_by(id, year). How can I have a cummulative sum over time as in the goal column? I need the result to preserve 10 rows in this example, i.e. not collapse data to the years. How can I pick just one group-sum per year per unit to sum up?

set.seed(1234)
data <- data.frame(id = rep(100, 10),
                   year = c(rep(2022, 5), rep(2023, 5)),
                   variable = rbinom(10, 1, 0.5))

library(tidyverse)

data <- data %>% 
  group_by(id, year) %>% 
  mutate(group_sum_per_year = sum(variable)) 

data$goal <- c(4,4,4,4,4,7,7,7,7,7)

data

# A tibble: 10 × 5
# Groups:   id, year [2]
      id  year variable group_sum_per_year  goal
   <dbl> <dbl>    <int>              <int> <dbl>
 1   100  2022        0                  4     4
 2   100  2022        1                  4     4
 3   100  2022        1                  4     4
 4   100  2022        1                  4     4
 5   100  2022        1                  4     4
 6   100  2023        1                  3     7
 7   100  2023        0                  3     7
 8   100  2023        0                  3     7
 9   100  2023        1                  3     7
10   100  2023        1                  3     7

>Solution :

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

You could first create a tempory column hlp which is equal to group_sum_per_year for only the first entry per group.

Then you could group by id and use cumsum on hlp:

data %>% 
  group_by(id, year) %>% 
  mutate(group_sum_per_year = sum(variable)) %>% 
  mutate(hlp = if_else(1:n() == 1, group_sum_per_year, 0)) %>%
  group_by(id) %>%
  mutate(goal = cumsum(hlp))

# A tibble: 10 × 6
# Groups:   id [1]
      id  year variable group_sum_per_year   hlp  goal
   <dbl> <dbl>    <int>              <int> <dbl> <dbl>
 1   100  2022        0                  4     4     4
 2   100  2022        1                  4     0     4
 3   100  2022        1                  4     0     4
 4   100  2022        1                  4     0     4
 5   100  2022        1                  4     0     4
 6   100  2023        1                  3     3     7
 7   100  2023        0                  3     0     7
 8   100  2023        0                  3     0     7
 9   100  2023        1                  3     0     7
10   100  2023        1                  3     0     7
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading