Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

R – Fill in missing values, with the previous value in the column, times another column, and iterate

I have a time series where I need to calculate a counterfactual value based based on a column of historical growth rates. The counterfactual column is the exact same as the the observed_value column up until March of 2020, where the values need to be recomputed using historical growth rates from the growth_rate column. The idea is to fill in all the missing values after Feb 2020, by multiplying the value of the previous month with the historical growth rate. Here is the data

df <- data.frame("month" = c("Jan", "Feb", "March", "April"),
                 "year" = c(2020, 2020, 2020, 2020),
                 "observed_value" = c(15,18,22,19),
                 "growth_rate" = c(0.001, -0.549, 0.4788, -0.11),
                 "counterfactual" = c(15,18,NA,NA))

So in this example, we fill in the counterfactual value for March 2020 by multiplying the Feb 2020 value * (1+growth rate) in the row we are filling in. In this example, the calculation for the missing value for March 2020 = 18*(1+0.4788) = 26.62. And now we iterate and using the counterfactual March 2020 value we computed to calculate the counterfactual for April 2020. For example April 2020 = 26.62 * 0.89….and so on until we have a full counterfactual column. Note, all values after Feb 2020 in the counterfactual column are NA.

desired_output <- data.frame("month" = c("Jan", "Feb", "March", "April"),
                          "year" = c(2020, 2020, 2020, 2020),
                          "observed_value" = c(15,18,22,19),
                          "growth_rate" = c(0.001, -0.549, 0.4788, -0.11),
                          "counterfactual" = c(15,18,26.62,23.69))

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

If, as you observe, all values after a date are NA, you can do this:


miss <- which(is.na(df$counterfactual))

df$counterfactual[miss] <- 
  tail(df$observed_value[-miss], 1) * 
  cumprod(1 + df$growth_rate[miss])

df
#>   month year observed_value growth_rate counterfactual
#> 1   Jan 2020             15      0.0010       15.00000
#> 2   Feb 2020             18     -0.5490       18.00000
#> 3 March 2020             22      0.4788       26.61840
#> 4 April 2020             19     -0.1100       23.69038

Note that in your example 0.89 is wrong.

If not all NA’s are contiguous, this solution does not apply and you must iterate or group by NA’s groups.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading