Splitting a data.frame by sequences of NA values

May 5, 2023

I would like to split a data.frame by sequences of NA values. The original data.frame looks like this example:

data <- data.frame(temp1=c(2,5,8,NA,NA,NA,7,4,1,3,NA,NA,1,5,NA,NA,NA,NA,9),temp2=c(1:19))

and I would like to get a list of data frames with only the sequences of NA values:

result <- list(data.frame(temp1=c(NA,NA,NA),temp2=c(4,5,6)),data.frame(temp1=c(NA,NA),temp2=c(11,12)),data.frame(temp1=c(NA,NA,NA,NA),temp2=c(15,16,17,18)))

so that I can work on each sequence independently (each sequence is a specific case).

(i.e. something like subset(data, is.na(temp1)), but with each run of consecutive NA values kept as a separate data frame)

>Solution :

A one-liner using split() would be:

# rleid() comes from data.table;
# dplyr::consecutive_id() can be used in its place
library(data.table)

split(data, ifelse(is.na(data$temp1), rleid(data$temp1), NA))

$`4`
  temp1 temp2
4    NA     4
5    NA     5
6    NA     6

$`9`
   temp1 temp2
11    NA    11
12    NA    12

$`12`
   temp1 temp2
15    NA    15
16    NA    16
17    NA    17
18    NA    18

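Rows where temp1 is not NA are assigned the group NA, and split() drops them, so only the NA runs remain. The list names (4, 9, 12) are the run ids returned by rleid(). Wrap the whole thing in unname() if you want to get rid of the list names.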
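
If you would rather not load a package, the same idea can be sketched in base R. This is only an illustration, not part of the answer above: the names is_na, run_id and na_runs are made up for the example, and the run id is built by hand with cumsum() instead of rleid().

```r
# Example data from the question
data <- data.frame(
  temp1 = c(2, 5, 8, NA, NA, NA, 7, 4, 1, 3, NA, NA, 1, 5, NA, NA, NA, NA, 9),
  temp2 = 1:19
)

# TRUE where temp1 is missing
is_na <- is.na(data$temp1)

# Hand-rolled run id (assumes at least two rows): starts at 1 and
# increases by one every time is_na flips between TRUE and FALSE
run_id <- cumsum(c(TRUE, is_na[-1] != is_na[-length(is_na)]))

# Keep only the NA rows, split them by run id, and drop the list names
na_runs <- unname(split(data[is_na, ], run_id[is_na]))

length(na_runs)
```

Each element of na_runs is one run of NA rows, so something like lapply(na_runs, nrow) can then be used to handle each sequence independently.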