Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Using mutate to get an average of the previous weeks temperature

I’m looking for a way to create a new variable for an average of the temperature the week before.

I’ve got a data frame that looks like the following

weather_data

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Rows: 3,664
Columns: 2
$ dt                  <date> 2014-01-01, 2014-01-02, 2014-01-03, 2014-01-04, 2014-01…
$ temp                <dbl> 6.390000, 6.234167, 6.307500, 4.436250, 4.432917, 8.4508…

What I’m hoping to do is get a new column called prev.temp that gives the average temperature for the previous week.

I know how to use mutate to create the new column but am struggling with the referencing. I can’t group_by because the "week prior" is a rolling thing but I’m struggling with trying to reference the date for the observation and also the 7 dates of information I need.

My first thought was something like this: mutate(prev.temp = casewhen(date > date – 7, mean(temp))) but I’m sure its not right.

>Solution :

Two options:

  1. A "rolling window" works well when you are "guaranteed" to have no gaps in your dates. One assumption here is that 1-row is always 1 day, so we rely on that being the case. Another assumption is that the data is pre-ordered by dt.

    We’ll use zoo::rollapplyr here, width of 7. partial= means that during the first week of data, we’ll still average those observations before today.

    zoo::rollapplyr(weather_data$temp, 7, FUN = mean, na.rm = TRUE, partial = TRUE)
    #  [1] 9.233254 9.333467 7.414063 7.679054 7.498385 7.193965 7.256155 6.253119 5.893012 6.431630 5.952437 6.051908 6.586221 5.967587 6.388821 6.752706 7.103914
    # [18] 6.666444 6.352582 5.871288 6.705206 6.289171 6.352013 6.311438 6.266374 6.316793 6.098056 6.100250 6.496584 6.300014 6.031206 6.968000 6.805867 7.185110
    # [35] 6.025666 6.521882 5.456449 4.775102 4.897948 5.185524 4.792596 5.347797 4.325031 5.567295 5.855414 5.920953 6.275780 6.610605 7.298713 8.046237 7.223235
    

    Verifying the data, the first week of data should be in the 7th position of this vector (it is):

    mean(weather_data$temp[1:7])
    # [1] 7.256155
    

    We need to lag this by one back into the data, so we can do

    weather_data %>%
      mutate(prev.temp = lag(zoo::rollapplyr(temp, 7, FUN = mean, na.rm = TRUE, partial = TRUE)))
    # # A tibble: 51 × 3
    #    dt          temp prev.temp
    #    <date>     <dbl>     <dbl>
    #  1 2023-11-29  9.23     NA   
    #  2 2023-11-30  9.43      9.23
    #  3 2023-12-01  3.58      9.33
    #  4 2023-12-02  8.47      7.41
    #  5 2023-12-03  6.78      7.68
    #  6 2023-12-04  5.67      7.50
    #  7 2023-12-05  7.63      7.19
    #  8 2023-12-06  2.21      7.26
    #  9 2023-12-07  6.91      6.25
    # 10 2023-12-08  7.35      5.89
    # # ℹ 41 more rows
    # # ℹ Use `print(n = ...)` to see more rows
    

    If you think you may have "gappy data", then all you need to do is interpolate the missing dates, give them a temp of NA, arrange by date, and then we are back to business with "guaranteed no gaps".

    weather_data_gapped <- weather_data[-c(4, 8),]
    head(weather_data_gapped, 10)
    # # A tibble: 10 × 2
    #    dt          temp
    #    <date>     <dbl>
    #  1 2023-11-29  9.23
    #  2 2023-11-30  9.43
    #  3 2023-12-01  3.58
    #  4 2023-12-03  6.78
    #  5 2023-12-04  5.67
    #  6 2023-12-05  7.63
    #  7 2023-12-07  6.91
    #  8 2023-12-08  7.35
    #  9 2023-12-09  5.12
    # 10 2023-12-10  7.47
    weather_data_gapped %>%
      reframe(dt = seq(min(dt), max(dt), by="day")) %>%
      left_join(weather_data_gapped, by = "dt")
    # # A tibble: 51 × 2
    #    dt          temp
    #    <date>     <dbl>
    #  1 2023-11-29  9.23
    #  2 2023-11-30  9.43
    #  3 2023-12-01  3.58
    #  4 2023-12-02 NA   
    #  5 2023-12-03  6.78
    #  6 2023-12-04  5.67
    #  7 2023-12-05  7.63
    #  8 2023-12-06 NA   
    #  9 2023-12-07  6.91
    # 10 2023-12-08  7.35
    # # ℹ 41 more rows
    # # ℹ Use `print(n = ...)` to see more rows
    

    (… and then do the rollapply thing as above.)

  2. A non-equi join also works, perhaps more akin to your case_when thought.

    weather_data %>%
      mutate(dt0 = dt, dt_from = dt - 8, dt_to = dt - 1, temp) %>%
      left_join(weather_data, join_by(between(y$dt, x$dt_from, x$dt_to)), suffix = c("", ".prev")) %>%
      summarize(prev.temp = mean(temp.prev, na.rm = TRUE), .by = c(dt0, temp)) %>%
      rename(dt = dt0)
    # # A tibble: 51 × 3
    #    dt          temp prev.temp
    #    <date>     <dbl>     <dbl>
    #  1 2023-11-29  9.23    NaN   
    #  2 2023-11-30  9.43      9.23
    #  3 2023-12-01  3.58      9.33
    #  4 2023-12-02  8.47      7.41
    #  5 2023-12-03  6.78      7.68
    #  6 2023-12-04  5.67      7.50
    #  7 2023-12-05  7.63      7.19
    #  8 2023-12-06  2.21      7.26
    #  9 2023-12-07  6.91      6.63
    # 10 2023-12-08  7.35      6.34
    # # ℹ 41 more rows
    # # ℹ Use `print(n = ...)` to see more rows
    

    This method is resilient to gapped data as well as out-of-order data.


Sample data

set.seed(42)
weather_data <- tibble(dt = Sys.Date() - 50:0, temp = runif(51, 1, 10))
head(weather_data, 10)
# # A tibble: 10 × 2
#    dt          temp
#    <date>     <dbl>
#  1 2023-11-29  9.23
#  2 2023-11-30  9.43
#  3 2023-12-01  3.58
#  4 2023-12-02  8.47
#  5 2023-12-03  6.78
#  6 2023-12-04  5.67
#  7 2023-12-05  7.63
#  8 2023-12-06  2.21
#  9 2023-12-07  6.91
# 10 2023-12-08  7.35
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading