Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Create a new column with averages for time intervals

I have a dataset that begins in year 1988 and ends in year 2020. I want to create averages for certain time intervals. For example, 5 years: 1988-1992, 1993-1997 and so on. But I want a new collumn with these averages.

For example, suppose I have this:

anos <- 1988:2020
valores <- c(15, 18, 20, NA, 25, 27, 28, NA, 32, 35, 36, 38, 40, 
             42, 45, 46, NA, 50, 52, 55, 56, 58, 60, NA, 65, 66, 
             68, 70, 72, 75, 76, 78, 80)

dataset <- data.frame(Ano = anos, Valor = valores)

I want to have this:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

anos <- 1988:2020

valores <- c(15, 18, 20, NA, 25, 27, 28, NA, 32, 35, 36, 38, 40, 
             42, 45, 46, NA, 50, 52, 55, 56, 58, 60, NA, 65, 66, 
             68, 70, 72, 75, 76, 78, 80)

medias <- c(19.5, 19.5, 19.5, 19.5, 19.5,
            30.5, 30.5, 30.5, 30.5, 30.5,
            40.2, 40.2, 40.2, 40.2, 40.2,
            50.75, 50.75, 50.75, 50.75, 50.75,
            59.75, 59.75, 59.75, 59.75, 59.75,
            70.2, 70.2, 70.2, 70.2, 70.2,
            78, 78, 78)
            
dataset <- data.frame(Ano = anos, Valor = valores, Medias = medias)

There are five year averages for each time interval and the mean value repeats itself until the next five years. The last mean is an average of 3 values since the overall period is not a multiple of five.

EDIT: also, I will need to use group_by().

>Solution :

Here’s a dplyr version :

library(dplyr)
n <- 5

dataset %>%
  arrange(Ano) %>%
  group_by(Group = floor((Ano - first(Ano))/n)) %>%
  mutate(Medias = mean(Valor, na.rm = TRUE)) %>%
  data.frame()

which returns the following –

#    Ano Valor Group Medias
#1  1988    15     0  19.50
#2  1989    18     0  19.50
#3  1990    20     0  19.50
#4  1991    NA     0  19.50
#5  1992    25     0  19.50
#6  1993    27     1  30.50
#7  1994    28     1  30.50
#8  1995    NA     1  30.50
#9  1996    32     1  30.50
#10 1997    35     1  30.50
#11 1998    36     2  40.20
#12 1999    38     2  40.20
#13 2000    40     2  40.20
#14 2001    42     2  40.20
#15 2002    45     2  40.20
#16 2003    46     3  50.75
#17 2004    NA     3  50.75
#18 2005    50     3  50.75
#19 2006    52     3  50.75
#20 2007    55     3  50.75
#21 2008    56     4  59.75
#22 2009    58     4  59.75
#23 2010    60     4  59.75
#24 2011    NA     4  59.75
#25 2012    65     4  59.75
#26 2013    66     5  70.20
#27 2014    68     5  70.20
#28 2015    70     5  70.20
#29 2016    72     5  70.20
#30 2017    75     5  70.20
#31 2018    76     6  78.00
#32 2019    78     6  78.00
#33 2020    80     6  78.00

In the example that you have shared you have a single entry for each year and data for all the year is present however, this will also work when :

  1. You have more than 1 rows for a year.
  2. You don’t have data for all the years.
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading