How can you convert duplicates across multiple columns to be NA in R?

March 28, 2022

I have a dataset that I want to convert any duplicates across columns to be NA. I’ve found answers to help with just looking for duplicates in one column, and I’ve found ways to remove duplicates entirely (e.g., distinct()). Instead, I have this data:

library(dpylr)
test <- tibble(job = c(1:6), 
               name = c("j", "j", "j", "c", "c", "c"),
               id = c(1, 1, 2, 1, 5, 1))

And want this result:

library(dpylr)

answer <- tibble(job = c(1:6), 
                 id = c("j", NA, "j", "c", NA, "c"),
                  name = c(1, NA, 2, 1, NA, 5))

And I’ve tried a solution like this using duplicated(), but it fails:

#Attempted solution
library(dpylr)
test %>%
  mutate_at(vars(id, name), ~case_when(
    duplicated(id, name) ~ NA,
    TRUE ~ .
  ))

I’d prefer to use tidy solutions, but I can be flexible as long as the answer can be piped.

>Solution :

We could create a helper and then identify duplicates and replace them with NA in an ifelse statement using across:

library(dplyr)
test %>% 
  mutate(helper = paste(id, name)) %>% 
  mutate(across(c(name, id), ~ifelse(duplicated(helper), NA, .)), .keep="unused")

    job name     id
  <int> <chr> <dbl>
1     1 j         1
2     2 NA       NA
3     3 j         2
4     4 c         1
5     5 c         5
6     6 NA       NA