Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to take the mean of two subsequent rows iteratively thereby reducing the number of rows?

I have a tibble like so:

library(dplyr)
set.seed(1)
my_tib <- tibble(identifier = rep(letters[1:3], each = 4),
       year = rep(seq(2005, 2020, 5), 3),
       value = rnorm(12, mean = 1000, 100) %>% round()
       )

my_tib

# A tibble: 12 × 3
   identifier  year value
   <chr>      <dbl> <dbl>
 1 a           2005   937
 2 a           2010  1018
 3 a           2015   916
 4 a           2020  1160
 5 b           2005  1033
 6 b           2010   918
 7 b           2015  1049
 8 b           2020  1074
 9 c           2005  1058
10 c           2010   969
11 c           2015  1151
12 c           2020  1039

Now I’d like to shrink down my tibble by taking the mean value for two years each, creating a new column for the year bracket. For example, I’d like to take the mean of 937 and 1018 (977.5) for the new year_bracket 2005-2010.

I’d like to repeat this for all years and all identifiers.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

So the first new 5 rows of my tibble look like this:

head(my_new_tib, 5)

# A tibble: 9 × 3
   identifier  year_bracket  value       
   <chr>       <chr>         <dbl>
 1 a           2005-2010     977.5
 2 a           2010-2015     967
 3 a           2015-2020     1038
 4 b           2005-2010     975.5
 5 b           2010-2015     983.5

Ideally, I’m looking for a piped dplyr solution but I’m also curious regarding other solutions.

>Solution :

Using dplyr:

library(dplyr)
my_tib |> 
  group_by(identifier) |> 
  mutate(value = (value + lag(value))/2,
         year_bracket = paste0(lag(year)," - ",year),
         .keep = "unused",
         .before = 2) |> 
  filter(!is.na(value)) |> 
  ungroup()

Output:

# A tibble: 9 x 3
  identifier year_bracket value
  <chr>      <chr>        <dbl>
1 a          2005 - 2010   978.
2 a          2010 - 2015   967 
3 a          2015 - 2020  1038 
4 b          2005 - 2010   976.
5 b          2010 - 2015   984.
6 b          2015 - 2020  1062.
7 c          2005 - 2010  1014.
8 c          2010 - 2015  1060 
9 c          2015 - 2020  1095
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading