I would like to split a data.frame by sequences of NA values. The original data.frame looks like this example:
data <- data.frame(temp1=c(2,5,8,NA,NA,NA,7,4,1,3,NA,NA,1,5,NA,NA,NA,NA,9),temp2=c(1:19))
and I would like to get a list of data frames with only the sequences of NA values:
result <- list(data.frame(temp1=c(NA,NA,NA),temp2=c(4,5,6)),data.frame(temp1=c(NA,NA),temp2=c(11,12)),data.frame(temp1=c(NA,NA,NA,NA),temp2=c(15,16,17,18)))
so that I can work on each sequence independently (each sequence is a specific case).
(e.g., like subset(data, is.na(data$temp1)), but with each sequence of NA values kept separate)
>Solution :
A one-liner using split() would be:
# library(data.table) for the rleid() function
# interchangeable with dplyr::consecutive_id()
library(data.table)
split(data, ifelse(is.na(data$temp1), rleid(data$temp1), NA))
$`4`
  temp1 temp2
4    NA     4
5    NA     5
6    NA     6

$`9`
   temp1 temp2
11    NA    11
12    NA    12

$`12`
   temp1 temp2
15    NA    15
16    NA    16
17    NA    17
18    NA    18
Wrap the whole thing with unname() if you want to get rid of the list names.
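If you would rather not load a package, the same grouping can probably be built with base R's rle(). The following is only a sketch of that idea, not part of the answer above; the variable names (is_na, runs, run_id, groups, na_runs) are made up for illustration. It relies on split() dropping rows whose grouping value is NA, and uses unname() to remove the list names.

```r
# Example data from the question
data <- data.frame(
  temp1 = c(2, 5, 8, NA, NA, NA, 7, 4, 1, 3, NA, NA, 1, 5, NA, NA, NA, NA, 9),
  temp2 = 1:19
)

# Flag the NA rows, then give every run of identical flags its own id
is_na  <- is.na(data$temp1)
runs   <- rle(is_na)
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# Rows with a non-NA temp1 get group NA, so split() leaves them out
groups <- ifelse(is_na, run_id, NA)

# One data frame per NA run; unname() removes the list names
na_runs <- unname(split(data, groups))
```

Each element of na_runs can then be handled on its own, for example with lapply(na_runs, nrow) to get the length of every NA sequence.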