The question arised while playing around with coding options for Advent of Code 2023 day 9. Let’s say we have a list col object with tibbles in one column. Each of these tibbles has a variable aa. The aim is a new variable bb which is the value of aa when all aa values below are zeroes. From then on, bb is to be calculated as: aa + lead(bb); i.e. summing up aa from the current row with bb one row below.
Here is a reproducible example:
dat <- tibble(
V1 = sample(LETTERS, 3),
V2 = rnorm(3, 100, 50),
V3 = list(
tibble(var_not_used = sample(LETTERS, 5), aa = c(15, 3, 0, 0 ,0)),
tibble(var_not_used = sample(LETTERS, 5), aa = c(21, 6, 1, 0, 0)),
tibble(var_not_used = sample(LETTERS, 5), aa = c(45, 15, 6, 2, 0))
)
)
bb in tibble 1 should be 18, 3, 0, 0, 0; bb in tibble 2 should be 28, 7, 1, 0, 0; bb in tibble 3 should be 68, 23, 8, 2, 0. Below is an attempt/illustration in which direction it should go.
How could we code elegant to get only one column bb? I am especially interested in tidyverse solutions; unnesting is an option, but would be nice, if the solution works directly on the nested object.
dat <- tibble(
V1 = sample(LETTERS, 3),
V2 = rnorm(3, 100, 50),
V3 = list(
tibble(var_not_used = sample(LETTERS, 5), aa = c(15, 3, 0, 0 ,0)),
tibble(var_not_used = sample(LETTERS, 5), aa = c(21, 6, 1, 0, 0)),
tibble(var_not_used = sample(LETTERS, 5), aa = c(45, 15, 6, 2, 0))
)
) %>%
## from here on: this returns the data that we are interested in; however,
## its a lot of code repetition and many new columns. Moreover, in reality
## we might face situations in which we had to do way more than 4 of these
## mutate's
mutate(
V3 = map(V3, ~ .x %>%
mutate(bb1 = case_when(
lead(aa) == 0 ~ aa + lead(aa, default = 0)
))
)
) %>%
mutate(
V3 = map(V3, ~ .x %>%
mutate(bb2 = case_when(
is.na(bb1) ~ aa + lead(bb1, default = 0)
))
)
) %>%
mutate(
V3 = map(V3, ~ .x %>%
mutate(bb3 = case_when(
is.na(bb2) ~ aa + lead(bb2, default = 0)
))
)
) %>%
mutate(
V3 = map(V3, ~ .x %>%
mutate(bb4 = case_when(
is.na(bb3) ~ aa + lead(bb3, default = 0)
))
)
)
>Solution :
It looks like bb is just the reverse cumulative sum of aa, so we can do:
dat %>%
mutate(V3 = map(V3, ~ .x %>% mutate(bb = rev(cumsum(rev(aa))))))
We can see what this looks like by plucking out the list column:
dat %>%
mutate(V3 = map(V3, ~ .x %>% mutate(bb = rev(cumsum(rev(aa)))))) %>%
pluck('V3')
#> [[1]]
#> # A tibble: 5 x 3
#> var_not_used aa bb
#> <chr> <dbl> <dbl>
#> 1 W 15 18
#> 2 K 3 3
#> 3 G 0 0
#> 4 Q 0 0
#> 5 F 0 0
#>
#> [[2]]
#> # A tibble: 5 x 3
#> var_not_used aa bb
#> <chr> <dbl> <dbl>
#> 1 P 21 28
#> 2 O 6 7
#> 3 A 1 1
#> 4 W 0 0
#> 5 M 0 0
#>
#> [[3]]
#> # A tibble: 5 x 3
#> var_not_used aa bb
#> <chr> <dbl> <dbl>
#> 1 M 45 68
#> 2 B 15 23
#> 3 Y 6 8
#> 4 A 2 2
#> 5 T 0 0