Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Weighted average each year – R

I am looking to calculate the average of the interest_rate for each year (2006 to 2023). I have a series of interest rate decision dates for the new rate and the date.

For years with multiple interest_rates during a year I would like to weight the average value by the proportion of year that at that particular rate.

Example data:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

library(data.table)
df = data.table(date = c("03/08/2006", "09/11/2006", "11/01/2007", "10/05/2007", "05/07/2007", "06/12/2007", "07/02/2008", "10/04/2008", "08/10/2008", "06/11/2008", "04/12/2008", "08/01/2009", "05/02/2009", "05/03/2009", "04/08/2016", "02/11/2017", "02/08/2018", "11/03/2020", "19/03/2020", "16/12/2021", "03/02/2022", "17/03/2022", "05/05/2022", "16/06/2022", "04/08/2022", "22/09/2022", "03/11/2022", "15/12/2022", "02/02/2023", "23/03/2023", "11/05/2023"),
                interest_rate = c(4.7500, 5.0000, 5.2500, 5.5000, 5.7500, 5.5000, 5.2500, 5.0000, 4.5000, 3.0000, 2.0000, 1.5000, 1.0000, 0.5000, 0.2500, 0.5000, 0.7500, 0.2500, 0.1000, 0.2500, 0.5000, 0.7500, 1.0000, 1.2500, 1.7500, 2.2500, 3.0000, 3.5000, 4.0000, 4.2500, 4.500))

The resultant data would look like:

year weighted_interest_rate
2006 4.7
2007
2008 etc

>Solution :

One method would be to create a data frame of the interest rates for each day in the entire series, group by year, then average the interest rate for each date in the year.

An important first step is converting the dates-as-strings to actual dates.

library(tidyverse)

df %>%
  mutate(date = as.Date(date, '%d/%m/%Y'),
         days = c(as.numeric(diff(date)), 0)) %>%
  rowwise() %>%
  reframe(date = seq(date, by = 'day', length.out = days),
          interest_rate = interest_rate,
          year = lubridate::year(date)) %>%
  group_by(year) %>%
  summarize(interest_rate = mean(interest_rate))
#> # A tibble: 18 x 2
#>     year interest_rate
#>    <dbl>         <dbl>
#>  1  2006         4.84 
#>  2  2007         5.51 
#>  3  2008         4.67 
#>  4  2009         0.644
#>  5  2010         0.5  
#>  6  2011         0.5  
#>  7  2012         0.5  
#>  8  2013         0.5  
#>  9  2014         0.5  
#> 10  2015         0.5  
#> 11  2016         0.398
#> 12  2017         0.291
#> 13  2018         0.604
#> 14  2019         0.75 
#> 15  2020         0.228
#> 16  2021         0.107
#> 17  2022         1.47 
#> 18  2023         3.97 
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading