I have a tibble like so:
library(dplyr)
set.seed(1)
my_tib <- tibble(identifier = rep(letters[1:3], each = 4),
year = rep(seq(2005, 2020, 5), 3),
value = rnorm(12, mean = 1000, 100) %>% round()
)
my_tib
# A tibble: 12 × 3
identifier year value
<chr> <dbl> <dbl>
1 a 2005 937
2 a 2010 1018
3 a 2015 916
4 a 2020 1160
5 b 2005 1033
6 b 2010 918
7 b 2015 1049
8 b 2020 1074
9 c 2005 1058
10 c 2010 969
11 c 2015 1151
12 c 2020 1039
Now I’d like to shrink down my tibble by taking the mean value for two years each, creating a new column for the year bracket. For example, I’d like to take the mean of 937 and 1018 (977.5) for the new year_bracket 2005-2010.
I’d like to repeat this for all years and all identifiers.
So the first new 5 rows of my tibble look like this:
head(my_new_tib, 5)
# A tibble: 9 × 3
identifier year_bracket value
<chr> <chr> <dbl>
1 a 2005-2010 977.5
2 a 2010-2015 967
3 a 2015-2020 1038
4 b 2005-2010 975.5
5 b 2010-2015 983.5
Ideally, I’m looking for a piped dplyr solution but I’m also curious regarding other solutions.
>Solution :
Using dplyr:
library(dplyr)
my_tib |>
group_by(identifier) |>
mutate(value = (value + lag(value))/2,
year_bracket = paste0(lag(year)," - ",year),
.keep = "unused",
.before = 2) |>
filter(!is.na(value)) |>
ungroup()
Output:
# A tibble: 9 x 3
identifier year_bracket value
<chr> <chr> <dbl>
1 a 2005 - 2010 978.
2 a 2010 - 2015 967
3 a 2015 - 2020 1038
4 b 2005 - 2010 976.
5 b 2010 - 2015 984.
6 b 2015 - 2020 1062.
7 c 2005 - 2010 1014.
8 c 2010 - 2015 1060
9 c 2015 - 2020 1095