R: how to identify duplicate rows based on all(multiple columns) and any(multiple columns)?

I want to identify duplicate rows in a data frame based on two types of conditions:

1: all(multiple columns): the values in every one of the listed columns must be the same.

2: any(multiple columns): if the value in at least one of the listed columns is the same, the rows are considered replicates.

getRepplicate <- function(df, allCol = "", anyCol = "") {
  # rows that satisfy both condition 1 and condition 2 are considered replicates
}

For example:

df <- data.frame(
  a = c(1, 1, 2, 3, 4, 1, 1, 3),
  b = c(1, 2, 2, 3, 4, 1, 1, 3), 
  d = c("x", "y", "z", "x", "x", "y", "x", "x"),
  e = c("x", "y", "z", "x", "x", "x", "z", "x")
)
> df
  a b d e
1 1 1 x x
2 1 2 y y
3 2 2 z z
4 3 3 x x
5 4 4 x x
6 1 1 y x
7 1 1 x z
8 3 3 x x

If I call df2 <- getRepplicate(df, allCol = c("a", "b"), anyCol = c("d", "e")), the expected result is:

> df2
  a b d e isReplicate
1 1 1 x x TRUE
2 1 2 y y FALSE
3 2 2 z z FALSE
4 3 3 x x TRUE
5 4 4 x x FALSE
6 1 1 y x TRUE
7 1 1 x z TRUE
8 3 3 x x TRUE

Thanks for your help.

>Solution :

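Is it something like this? This version applies both checks within each row, comparing the columns of that row with each other: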

  • In allCond we check whether the number of unique values is one, so all values are the same
  • In anyCond we check whether the number of unique values is less than the number of values in the row, so at least two values are the same
  • If both conditions are TRUE, the row is flagged as replicated.
library(dplyr)

df <- data.frame(
  a = c(1, 1, 2, 3, 4, 1, 1, 3, 1),
  b = c(1, 2, 2, 3, 4, 1, 1, 3, 2),
  d = c("x", "y", "z", "x", "x", "y", "x", "x", "x"),
  e = c("x", "y", "z", "x", "x", "x", "z", "x", "y")
)

getRepplicate <- function(df, allCol, anyCol) {
  df %>%
    rowwise() %>%
    mutate(
      # TRUE when every allCol value in this row is identical
      allCond = n_distinct(c_across(all_of(allCol))) == 1,
      # TRUE when at least two anyCol values in this row are identical
      anyCond = n_distinct(c_across(all_of(anyCol))) < length(c_across(all_of(anyCol)))
    ) %>%
    ungroup() %>%
    mutate(isReplicated = allCond & anyCond)
}
getRepplicate(df, allCol = c("a", "b"), anyCol = c("d", "e"))
#> # A tibble: 9 × 7
#>       a     b d     e     allCond anyCond isReplicated
#>   <dbl> <dbl> <chr> <chr> <lgl>   <lgl>   <lgl>       
#> 1     1     1 x     x     TRUE    TRUE    TRUE        
#> 2     1     2 y     y     FALSE   TRUE    FALSE
#> 3     2     2 z     z     TRUE    TRUE    TRUE        
#> 4     3     3 x     x     TRUE    TRUE    TRUE        
#> 5     4     4 x     x     TRUE    TRUE    TRUE        
#> 6     1     1 y     x     TRUE    FALSE   FALSE
#> 7     1     1 x     z     TRUE    FALSE   FALSE
#> 8     3     3 x     x     TRUE    TRUE    TRUE        
#> 9     1     2 x     y     FALSE   FALSE   FALSE

Created on 2023-02-10 with reprex v2.0.2
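
The output above differs from the expected result in the question for rows 3, 5, 6 and 7, because the question compares rows with each other rather than the columns inside one row. If that between-row reading is what is wanted, a minimal base R sketch could look like the following. The name flag_replicates is hypothetical (it is not from the original answer), and the sketch assumes that "the same" means equal values in the same column of two different rows, and that the compared columns contain no NA values.

```r
# Hypothetical helper (not from the original answer): flags a row when some
# OTHER row matches it on all of allCol and on at least one of anyCol.
# Assumes the compared columns contain no NA values.
flag_replicates <- function(df, allCol, anyCol) {
  n <- nrow(df)

  # same_all[i, j]: rows i and j agree on every column in allCol
  same_all <- matrix(TRUE, n, n)
  for (col in allCol) {
    same_all <- same_all & outer(df[[col]], df[[col]], "==")
  }

  # same_any[i, j]: rows i and j agree on at least one column in anyCol
  same_any <- matrix(FALSE, n, n)
  for (col in anyCol) {
    same_any <- same_any | outer(df[[col]], df[[col]], "==")
  }

  hit <- same_all & same_any
  diag(hit) <- FALSE  # a row does not count as its own replicate
  df$isReplicate <- rowSums(hit) > 0
  df
}

# The 8-row data frame from the question
df <- data.frame(
  a = c(1, 1, 2, 3, 4, 1, 1, 3),
  b = c(1, 2, 2, 3, 4, 1, 1, 3),
  d = c("x", "y", "z", "x", "x", "y", "x", "x"),
  e = c("x", "y", "z", "x", "x", "x", "z", "x")
)
df2 <- flag_replicates(df, allCol = c("a", "b"), anyCol = c("d", "e"))
```

On the question's data this flags rows 1, 4, 6, 7 and 8, which matches the expected isReplicate column. The pairwise matrices need memory proportional to the square of the number of rows, so for large data frames a grouped approach would probably be a better fit.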
