Filter for non-repeated values based on Unique ID

February 4, 2022

I’ve the following table

S/N	Unique ID	Code
1	111	YES
2	111	YES
3	111	NO
4	111	YES
5	222	YES
6	222	YES
7	222	YES
8	222	YES
9	333	NO
10	333	NO
11	333	YES
12	333	YES

How do I derive the following table based on the following conditions:
For each unique ID, if YES repeats, keep the first YES. If NO Appears, keep the following YES. I tried using mutate and it’s giving me all sort of errors.

S/N	Unique ID	Code
1	111	YES
4	111	YES
5	222	YES
11	333	YES

Thanks!

>Solution :

base R

ind <- ave(dat$Code == "YES", dat$`Unique ID`,
           FUN = function(z) z & c(TRUE, !z[-length(z)]))
dat[ind,]
#    S/N Unique ID Code
# 1    1       111  YES
# 4    4       111  YES
# 5    5       222  YES
# 11  11       333  YES

dplyr

library(dplyr)
dat %>%
  group_by(`Unique ID`) %>%
  filter(Code == "YES" & lag(Code == "NO", default = TRUE)) %>%
  ungroup()
# # A tibble: 4 x 3
#   `S/N` `Unique ID` Code 
#   <int>       <int> <chr>
# 1     1         111 YES  
# 2     4         111 YES  
# 3     5         222 YES  
# 4    11         333 YES

data.table

library(data.table)
as.data.table(dat)[, .SD[Code == "YES" & shift(Code == "NO", fill = TRUE),], by = `Unique ID`]
#    Unique ID   S/N   Code
#        <int> <int> <char>
# 1:       111     1    YES
# 2:       111     4    YES
# 3:       222     5    YES
# 4:       333    11    YES

Data

dat <- structure(list("S/N" = 1:12, "Unique ID" = c(111L, 111L, 111L, 111L, 222L, 222L, 222L, 222L, 333L, 333L, 333L, 333L), Code = c("YES", "YES", "NO", "YES", "YES", "YES", "YES", "YES", "NO", "NO", "YES", "YES")), class = "data.frame", row.names = c(NA, -12L))