Extract row that immediately precedes new instance of grouping variable

From this type of data:

df <- data.frame(
  IPU_id = 1:9,
  Sequ = c(NA,NA,1,1,2,2,NA,3,3),
  Q = c(NA,NA,"q_wh","q_wh","q_wh","q_wh",NA,"q_pol","q_pol"),
  N_extension = c(NA,NA,0,NA,1,NA,NA,0,NA)
)

For each distinct Sequ value, I’d like to extract the row immediately preceding its first occurrence, regardless of whether Sequ in that preceding row is NA or a positive value. The desired result is this:

df
  IPU_id Sequ     Q N_extension
2      2   NA  <NA>          NA
4      4    1  q_wh          NA
7      7   NA  <NA>          NA

Solution:

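You need the rows whose following row has a different value of Sequ (Sequ != lead(Sequ)), plus the rows where Sequ is NA but the following row's Sequ is not (is.na(Sequ) & !is.na(lead(Sequ))). The second condition is needed because a comparison involving NA returns NA rather than TRUE, and filter() drops rows where the condition evaluates to NA: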

library(dplyr)

df <- data.frame(
  IPU_id = 1:9,
  Sequ = c(NA,NA,1,1,2,2,NA,3,3),
  Q = c(NA,NA,"q_wh","q_wh","q_wh","q_wh",NA,"q_pol","q_pol"),
  N_extension = c(NA,NA,0,NA,1,NA,NA,0,NA)
)

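df |> 
  filter(
    # next row holds a different, non-missing Sequ
    (Sequ != lead(Sequ)) | 
      # current Sequ is missing, but the next row's is not
      (is.na(Sequ) & !is.na(lead(Sequ)))
  )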
#>   IPU_id Sequ    Q N_extension
#> 1      2   NA <NA>          NA
#> 2      4    1 q_wh          NA
#> 3      7   NA <NA>          NA

Created on 2023-05-01 with reprex v2.0.2
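
For comparison, the same selection can be sketched in base R without dplyr. This is only an illustrative alternative and not part of the original answer; the names next_sequ, starts_next and result are made up for this example. It assumes, as in the question, that a row qualifies when the following row has a non-missing Sequ that differs from the current one, with a missing current Sequ counting as different.

```r
df <- data.frame(
  IPU_id = 1:9,
  Sequ = c(NA, NA, 1, 1, 2, 2, NA, 3, 3),
  Q = c(NA, NA, "q_wh", "q_wh", "q_wh", "q_wh", NA, "q_pol", "q_pol"),
  N_extension = c(NA, NA, 0, NA, 1, NA, NA, 0, NA)
)

# Sequ shifted up by one position: the value found in the following row
# (NA for the last row, which has no successor). Plays the role of lead().
next_sequ <- c(df$Sequ[-1], NA)

# A row precedes a new sequence when the following row has a non-missing
# Sequ that differs from the current one; a missing current Sequ counts
# as different.
starts_next <- !is.na(next_sequ) & (is.na(df$Sequ) | df$Sequ != next_sequ)

# which() keeps only the TRUE positions, so any NA in the condition is
# dropped, much like filter() does.
result <- df[which(starts_next), ]
print(result)
```

On the example data this keeps the rows with IPU_id 2, 4 and 7, matching the dplyr output above. Unlike filter(), base subsetting preserves the original row names.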
