From this type of data:
df <- data.frame(
  IPU_id = 1:9,
  Sequ = c(NA, NA, 1, 1, 2, 2, NA, 3, 3),
  Q = c(NA, NA, "q_wh", "q_wh", "q_wh", "q_wh", NA, "q_pol", "q_pol"),
  N_extension = c(NA, NA, 0, NA, 1, NA, NA, 0, NA)
)
I'd like to extract, for each distinct Sequ value, the immediately preceding row, regardless of whether Sequ in that row is NA (is.na(Sequ)) or a positive value. The desired result is this:
df
  IPU_id Sequ    Q N_extension
2      2   NA <NA>          NA
4      4    1 q_wh          NA
7      7   NA <NA>          NA
Solution:
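Basically, you need to find the rows where the following row has a different value of Sequ (Sequ != lead(Sequ)), including the case where Sequ is NA in the current row but not in the following row (is.na(Sequ) & !is.na(lead(Sequ))). The second condition is needed because a comparison involving NA evaluates to NA, and filter() drops rows where the condition is NA: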
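library(dplyr)

df <- data.frame(
  IPU_id = 1:9,
  Sequ = c(NA, NA, 1, 1, 2, 2, NA, 3, 3),
  Q = c(NA, NA, "q_wh", "q_wh", "q_wh", "q_wh", NA, "q_pol", "q_pol"),
  N_extension = c(NA, NA, 0, NA, 1, NA, NA, 0, NA)
)

df |>
  filter(
    # next row has a different, non-NA Sequ (NA comparisons give NA and are dropped)
    (Sequ != lead(Sequ)) |
      # current Sequ is NA, but the next row starts a Sequ
      (is.na(Sequ) & !is.na(lead(Sequ)))
  )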
#>   IPU_id Sequ    Q N_extension
#> 1      2   NA <NA>          NA
#> 2      4    1 q_wh          NA
#> 3      7   NA <NA>          NA
Created on 2023-05-01 with reprex v2.0.2
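For comparison, the same logic can be sketched in base R without dplyr. This is only an illustrative equivalent, not part of the original answer: the names next_sequ, changes, na_before, keep and result are made up for this example, and the shifted vector stands in for lead(). The NA checks are written out explicitly so that the logical index itself never contains NA.

```r
df <- data.frame(
  IPU_id = 1:9,
  Sequ = c(NA, NA, 1, 1, 2, 2, NA, 3, 3),
  Q = c(NA, NA, "q_wh", "q_wh", "q_wh", "q_wh", NA, "q_pol", "q_pol"),
  N_extension = c(NA, NA, 0, NA, 1, NA, NA, 0, NA)
)

# Sequ of the following row, with NA after the last row
# (a base R stand-in for dplyr::lead(); hypothetical name)
next_sequ <- c(df$Sequ[-1], NA)

# Both values are present and they differ
changes <- !is.na(df$Sequ) & !is.na(next_sequ) & df$Sequ != next_sequ

# Current row is NA, but the following row starts a Sequ
na_before <- is.na(df$Sequ) & !is.na(next_sequ)

# Rows that immediately precede the start of each distinct Sequ
keep <- changes | na_before
result <- df[keep, ]
print(result)
```

Unlike filter(), base subsetting with df[keep, ] does not silently drop NA conditions (it would return all-NA rows instead), which is why the is.na() guards are spelled out here.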