I am trying to create a new set of variables based on observations at 5 different time points. However, not every row has an observation at every time point. Assume the data look something like this:
X1 <- c(NA, NA, 7, 8, 1, 5)
X2 <- c(NA, 0, 0, NA, 3, 7)
X3 <- c(NA, 2, 3, 4, 2, 7)
X4 <- c(1, 1, 5, 2, 1, 7)
X5 <- c(2, NA, NA, 4, 3, NA)
df <- data.frame(X1, X2, X3, X4, X5)
  X1 X2 X3 X4 X5
1 NA NA NA  1  2
2 NA  0  2  1 NA
3  7  0  3  5 NA
4  8 NA  4  2  4
5  1  3  2  1  3
6  5  7  7  7 NA
I want to create 5 new variables, say T1 to T5, so that T1 is populated with the first non-NA value in that row and T2 to T5 hold the values that follow it, in their original order. Any columns left over at the end should be NA:
  X1 X2 X3 X4 X5 T1 T2 T3 T4 T5
1 NA NA NA  1  2  1  2 NA NA NA
2 NA  0  2  1 NA  0  2  1 NA NA
3  7  0  3  5 NA  7  0  3  5 NA
4  8 NA  4  2  4  8 NA  4  2  4
5  1  3  2  1  3  1  3  2  1  3
6  5  7  7  7 NA  5  7  7  7 NA
Any suggestions? Thank you in advance!
>Solution :
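fun <- function(z) {
  # index of the first non-NA value (1 if all values are NA)
  ind <- which.max(!is.na(z))
  # guard: which.max() returns integer(0) for zero-length input
  if (!length(ind)) ind <- 1
  # values from the first non-NA onward, then the skipped leading NAs
  c(z[ind:length(z)], if (ind > 1) z[1:(ind - 1)])
}
# apply fun to each row, transpose back, rename X* to T*, and bind to df
cbind(df, setNames(as.data.frame(t(apply(df, 1, fun))), sub("^X", "T", names(df))))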
#   X1 X2 X3 X4 X5 T1 T2 T3 T4 T5
# 1 NA NA NA  1  2  1  2 NA NA NA
# 2 NA  0  2  1 NA  0  2  1 NA NA
# 3  7  0  3  5 NA  7  0  3  5 NA
# 4  8 NA  4  2  4  8 NA  4  2  4
# 5  1  3  2  1  3  1  3  2  1  3
# 6  5  7  7  7 NA  5  7  7  7 NA
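To see what the helper does to a single row, it can be called directly on plain vectors. This is only a small sketch: the helper is repeated so the snippet runs on its own, and the three input vectors are made-up illustrations (the first two mirror rows 1 and 4 of the example data).

```r
# Same helper as in the answer, repeated so this snippet is self-contained.
fun <- function(z) {
  ind <- which.max(!is.na(z))
  if (!length(ind)) ind <- 1
  c(z[ind:length(z)], if (ind > 1) z[1:(ind - 1)])
}

# Leading NAs are moved to the end; the remaining values keep their order.
shifted <- fun(c(NA, NA, NA, 1, 2))

# A row that already starts with a value is returned as-is;
# the NA in the middle stays where it is.
unchanged <- fun(c(8, NA, 4, 2, 4))

# A row with no observations at all comes back unchanged (all NA).
all_na <- fun(c(NA, NA, NA))

print(shifted)
print(unchanged)
print(all_na)
```

Note that only the leading NAs are moved; gaps after the first observed value are left in place, which matches the desired output in the question.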
Walkthrough:
- within `fun`, `which.max(!is.na(z))` returns the position of the first non-NA value in the vector (which will be a "row" of the frame); if every value is NA it returns 1, so the row comes back unchanged, and the length check only guards against `which.max` returning `integer(0)`, which happens for a zero-length vector;
- `apply(., 1, fun)` converts `df` to a matrix, then applies the function `fun` to each row;
- since `apply(., 1, ..)` returns a transposed matrix, we `t(.)` transpose it;
- since that returns a matrix, we `as.data.frame(.)` it, then change the column names with `setNames` and `sub(.)`;
- finally, `cbind` it with the original data.
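
The one-liner can also be unrolled into separate steps, which makes it easier to inspect each intermediate object. This is just a sketch of the same pipeline; the names `m`, `shifted`, `shifted_df` and `result` are introduced here for illustration and are not part of the original answer.

```r
# Example data from the question.
X1 <- c(NA, NA, 7, 8, 1, 5)
X2 <- c(NA, 0, 0, NA, 3, 7)
X3 <- c(NA, 2, 3, 4, 2, 7)
X4 <- c(1, 1, 5, 2, 1, 7)
X5 <- c(2, NA, NA, 4, 3, NA)
df <- data.frame(X1, X2, X3, X4, X5)

# Helper from the answer: rotate leading NAs to the end of the vector.
fun <- function(z) {
  ind <- which.max(!is.na(z))
  if (!length(ind)) ind <- 1
  c(z[ind:length(z)], if (ind > 1) z[1:(ind - 1)])
}

# Step 1: apply() coerces df to a matrix and calls fun on each row.
# Each result becomes a COLUMN, so m has 5 rows and 6 columns.
m <- apply(df, 1, fun)

# Step 2: transpose back to one row per original row (6 x 5).
shifted <- t(m)

# Step 3: convert to a data frame and rename the columns X1..X5 -> T1..T5.
shifted_df <- setNames(as.data.frame(shifted), sub("^X", "T", names(df)))

# Step 4: bind the new columns onto the original data.
result <- cbind(df, shifted_df)
print(result)
```

Checking `dim(m)` after step 1 is a quick way to confirm why the transpose in step 2 is needed.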