I have a dataframe in R as follows:
ID   Group  Chr
S1   Case   amt5:8
S2   Case   amt5:9
S3   FC     amt5:8
S4   PC     amt5:8
S5   FC     amt5:9
S6   Case   nhtf:56
S7   FC     nhtf:56
S8   Case   klju:78
S9   PC     klju:78
S28  Case   kljik:098
S67  PC     hyjfk
S34  FC     lkoj
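I want to remove every row whose df$Chr value also appears in a row where df$Group is "PC", i.e. drop the "PC" row together with the "Case" and "FC" rows that share its Chr value. A "PC" row whose Chr value is unique (such as S67) should be kept.
The expected output is: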
ID   Group  Chr
S2   Case   amt5:9
S5   FC     amt5:9
S6   Case   nhtf:56
S7   FC     nhtf:56
S28  Case   kljik:098
S67  PC     hyjfk
S34  FC     lkoj
The input df is:
dput(df)
structure(list(ID = c("S1", "S2", "S3", "S4", "S5", "S6", "S7",
"S8", "S9", "S28", "S67", "S34"), Group = c("Case", "Case", "FC",
"PC", "FC", "Case", "FC", "Case", "PC", "Case", "PC", "FC"),
Chr = c("amt5:8", "amt5:9", "amt5:8", "amt5:8", "amt5:9",
"nhtf:56", "nhtf:56", "klju:78", "klju:78", "kljik:098",
"hyjfk", "lkoj")), class = "data.frame", row.names = c(NA,
-12L))
>Solution :
You can first collect the Chr values of the rows where Group == "PC". Then, grouping by Chr, keep a group if it has more than one row and its Chr is not among those "PC" values, or if it has exactly one row (so a non-duplicated record such as S67 is kept even though it is "PC").
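library(dplyr)  # .by needs dplyr >= 1.1.0; the native pipe |> needs R >= 4.1

df |>
  filter(
    # multi-row Chr groups survive only if no "PC" row shares that Chr;
    # single-row groups are always kept
    (!Chr %in% (df |> filter(Group == "PC") |> pull(Chr)) & n() > 1) | n() == 1,
    .by = Chr
  )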
   ID Group       Chr
1  S2  Case    amt5:9
2  S5    FC    amt5:9
3  S6  Case   nhtf:56
4  S7    FC   nhtf:56
5 S28  Case kljik:098
6 S67    PC     hyjfk
7 S34    FC      lkoj
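If you would rather not depend on dplyr, the same rule can be sketched in base R. This is only an illustrative alternative, not part of the answer above: the helper names has_pc, group_size and result are made up for this example, and it assumes the rule is "drop every Chr group that contains a "PC" row and has more than one row".

```r
# Rebuild the input from the question so the sketch is self-contained
df <- data.frame(
  ID    = c("S1", "S2", "S3", "S4", "S5", "S6", "S7",
            "S8", "S9", "S28", "S67", "S34"),
  Group = c("Case", "Case", "FC", "PC", "FC", "Case", "FC",
            "Case", "PC", "Case", "PC", "FC"),
  Chr   = c("amt5:8", "amt5:9", "amt5:8", "amt5:8", "amt5:9",
            "nhtf:56", "nhtf:56", "klju:78", "klju:78", "kljik:098",
            "hyjfk", "lkoj")
)

# For each row: does its Chr group contain at least one "PC" row?
has_pc <- ave(df$Group == "PC", df$Chr, FUN = any)

# For each row: how many rows share its Chr value?
group_size <- ave(seq_along(df$Chr), df$Chr, FUN = length)

# Drop rows in multi-row groups that contain a "PC"; keep everything else
result <- df[!(has_pc & group_size > 1), ]
rownames(result) <- NULL  # renumber rows 1..n, as in the dplyr output
result
```

Writing the condition as !(has_pc & group_size > 1) is logically the same as the filter in the dplyr answer, but it states the rule once instead of splitting it into the n() > 1 and n() == 1 cases.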