I have time series data with NA values in a few columns. I get an error when I try to replace the NAs in a column with interpolated values. Below is the sample code I am running, followed by the error message. The first row has a value of 17.58 (temperature), and the next value does not appear until row 30 (2-minute time step), with a value of 16.58. This pattern of NAs and available data continues through the rest of the data.
library(zoo)
my_data[, 42] <- na.approx(my_data[, 42])
Error in `[<-.data.frame`(`*tmp*`, , 42, value = c(17.58, 17.5466666666667, :
  replacement has 120726 rows, data has 132373
>Solution :
This behavior is documented in ?na.approx:
An object of similar structure as object with NAs replaced by interpolation. For na.approx only the internal NAs are replaced and leading or trailing NAs are omitted if na.rm = TRUE or not replaced if na.rm = FALSE.
By default, na.approx uses na.rm = TRUE, as the usage line shows:
na.approx(object, x = index(object), xout, …, na.rm = TRUE, maxgap = Inf, along)
Thus, we can change the code to
my_data[, 42] <- na.approx(my_data[, 42], na.rm = FALSE)
A large dataset can easily have leading or trailing NAs. With the default na.rm = TRUE, na.approx drops them, so the OP's code returns a vector with fewer elements than the column has rows (120726 instead of 132373), which triggers the length mismatch error in the replacement.
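To make the length difference concrete, here is a minimal sketch on a toy vector. The values and the column name temp are made up for illustration and are not the OP's data; the only assumption is a column with internal NAs followed by trailing NAs.

```r
library(zoo)

# Hypothetical toy column: two internal NAs, then two trailing NAs
temp <- c(17.58, NA, NA, 16.58, NA, NA)

# Default na.rm = TRUE: internal NAs are interpolated, trailing NAs are dropped
short <- na.approx(temp)
length(short)   # 4, shorter than the input

# na.rm = FALSE: trailing NAs are left in place, so the length is preserved
full <- na.approx(temp, na.rm = FALSE)
length(full)    # 6, same as the input

# Because the lengths match, assigning back into the data frame works
my_data <- data.frame(temp = temp)
my_data[, "temp"] <- na.approx(my_data[, "temp"], na.rm = FALSE)
```

Note that with na.rm = FALSE the leading and trailing NAs are still NA after the assignment; only the gaps between observed values are filled.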