How to remove all duplicated values based on multiple column values?

June 18, 2024

I have a dataframe in R as follows:

ID  Group   Chr
S1  Case    amt5:8
S2  Case    amt5:9
S3  FC      amt5:8
S4  PC      amt5:8
S5  FC      amt5:9
S6  Case    nhtf:56
S7  FC      nhtf:56
S8  Case    klju:78
S9  PC      klju:78
S28 Case    kljik:098
S67 PC      hyjfk
S34 FC      lkoj

I want to remove every row whose Chr value also appears on a row with Group "PC", so the "PC" row and the "Case" and "FC" rows that share its Chr value are all dropped. A "PC" row whose Chr value is unique should be kept.
The expected output is:

ID  Group   Chr
S2  Case    amt5:9
S5  FC      amt5:9
S6  Case    nhtf:56
S7  FC      nhtf:56
S28 Case    kljik:098
S67 PC      hyjfk
S34 FC      lkoj

The input df is:

dput(df)

structure(list(ID = c("S1", "S2", "S3", "S4", "S5", "S6", "S7", 
"S8", "S9", "S28", "S67", "S34"), Group = c("Case", "Case", "FC", 
"PC", "FC", "Case", "FC", "Case", "PC", "Case", "PC", "FC"), 
    Chr = c("amt5:8", "amt5:9", "amt5:8", "amt5:8", "amt5:9", 
    "nhtf:56", "nhtf:56", "klju:78", "klju:78", "kljik:098", 
    "hyjfk", "lkoj")), class = "data.frame", row.names = c(NA, 
-12L))

>Solution :

First collect the Chr values of the rows with Group == "PC". Then filter within each Chr group: drop a group if it has more than one row (duplicated) and its Chr is one of those "PC" values, and keep every group that has only one row (non-duplicated).

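library(dplyr)  # the .by argument requires dplyr >= 1.1.0

# Chr values that occur on at least one "PC" row
pc_chr <- df |> filter(Group == "PC") |> pull(Chr)

# Per Chr group: keep multi-row groups that contain no "PC" Chr value,
# and keep every single-row group
df |> 
  filter(!Chr %in% pc_chr & n() > 1 | n() == 1,
         .by = Chr)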

   ID Group       Chr
1  S2  Case    amt5:9
2  S5    FC    amt5:9
3  S6  Case   nhtf:56
4  S7    FC   nhtf:56
5 S28  Case kljik:098
6 S67    PC     hyjfk
7 S34    FC      lkoj
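
If you would rather avoid the dplyr dependency, the same rule can probably be written in base R. This is only a sketch that assumes the df from the question; the names pc_chr, n_per_chr, drop and result are made up for illustration.

```r
# Input data, rebuilt from the dput() output in the question
df <- data.frame(
  ID = c("S1", "S2", "S3", "S4", "S5", "S6", "S7",
         "S8", "S9", "S28", "S67", "S34"),
  Group = c("Case", "Case", "FC", "PC", "FC", "Case", "FC",
            "Case", "PC", "Case", "PC", "FC"),
  Chr = c("amt5:8", "amt5:9", "amt5:8", "amt5:8", "amt5:9",
          "nhtf:56", "nhtf:56", "klju:78", "klju:78", "kljik:098",
          "hyjfk", "lkoj")
)

# Chr values that occur on at least one "PC" row
pc_chr <- unique(df$Chr[df$Group == "PC"])

# For every row, the number of rows that share its Chr value
n_per_chr <- ave(seq_along(df$Chr), df$Chr, FUN = length)

# Drop rows whose Chr is shared (count > 1) and also occurs on a "PC" row
drop <- df$Chr %in% pc_chr & n_per_chr > 1
result <- df[!drop, ]
rownames(result) <- NULL  # renumber rows 1..n, as in the dplyr output

print(result)
```

Here ave() plays the role of n() with .by = Chr: it returns the group size for each row, so the condition can be evaluated row by row without grouping the data frame.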