Extract row that immediately precedes new instance of grouping variable

From this type of data:

df <- data.frame(
  IPU_id = 1:9,
  Sequ = c(NA,NA,1,1,2,2,NA,3,3),
  Q = c(NA,NA,"q_wh","q_wh","q_wh","q_wh",NA,"q_pol","q_pol"),
  N_extension = c(NA,NA,0,NA,1,NA,NA,0,NA)
)

For each new instance of Sequ, I'd like to extract the row that immediately precedes it, whether Sequ in that preceding row is NA or a positive value. The desired result is this:

df
  IPU_id Sequ     Q N_extension
2      2   NA  <NA>          NA
4      4    1  q_wh          NA
7      7   NA  <NA>          NA

Solution:

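Basically, you need to find the rows where the following row has a different value of Sequ (Sequ != lead(Sequ)), including the case where Sequ is NA but the following row's Sequ is not (is.na(Sequ) & !is.na(lead(Sequ))). Where a non-missing Sequ is followed by NA, the condition evaluates to NA rather than TRUE, and filter() drops such rows, so the end of a sequence is not mistaken for the start of a new one: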

library(dplyr)

df <- data.frame(
  IPU_id = 1:9,
  Sequ = c(NA,NA,1,1,2,2,NA,3,3),
  Q = c(NA,NA,"q_wh","q_wh","q_wh","q_wh",NA,"q_pol","q_pol"),
  N_extension = c(NA,NA,0,NA,1,NA,NA,0,NA)
)

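df |> 
  filter(
    # this row and the next hold different, non-missing Sequ values
    (Sequ != lead(Sequ)) | 
      # Sequ is NA here, but the next row starts a new Sequ value
      (is.na(Sequ) & !is.na(lead(Sequ)))
    # rows where the condition is NA (e.g. Sequ followed by NA) are dropped
  )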
#>   IPU_id Sequ    Q N_extension
#> 1      2   NA <NA>          NA
#> 2      4    1 q_wh          NA
#> 3      7   NA <NA>          NA

Created on 2023-05-01 with reprex v2.0.2
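Without dplyr, the same selection can be sketched in base R by turning the logic around: first mark the rows where a new, non-missing Sequ value starts, then step back one row. This is an alternative technique, not the method used above, and the names `s`, `prev`, `is_start` and `prev_rows` are illustrative only. Assuming the rows are already in the intended order, it should select the same rows as the filter() call.

```r
df <- data.frame(
  IPU_id = 1:9,
  Sequ = c(NA,NA,1,1,2,2,NA,3,3),
  Q = c(NA,NA,"q_wh","q_wh","q_wh","q_wh",NA,"q_pol","q_pol"),
  N_extension = c(NA,NA,0,NA,1,NA,NA,0,NA)
)

s <- df$Sequ
# Sequ of the previous row; row 1 has no previous row, so use NA
prev <- c(NA, head(s, -1))

# A row starts a new instance of Sequ when its own Sequ is not NA and
# the previous row's Sequ is either NA or a different value
is_start <- !is.na(s) & (is.na(prev) | s != prev)

# Step back one row from each start; drop row 0 if a run begins on row 1
prev_rows <- which(is_start) - 1
prev_rows <- prev_rows[prev_rows >= 1]

result <- df[prev_rows, ]
print(result)
```

Because it subsets with `df[prev_rows, ]`, the result keeps the original row names (2, 4 and 7), matching the desired output in the question. The `prev_rows >= 1` guard covers data where a Sequ value begins on the very first row, which has no preceding row to extract.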
