Remove 'duplicate' rows based on combinations in two columns (R)

I have this example data.frame:

 df1 <- data.frame(v1 = c('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'),
      v2 = c('B', 'A', 'D', 'C', 'F', 'E', 'H', 'G'),
      value = c(1.12, 1.12, 12.52, 12.52, 3.19, 3.19, 12.52, 12.52))
 > df1
   v1 v2 value
 1  A  B  1.12
 2  B  A  1.12
 3  C  D 12.52
 4  D  C 12.52
 5  E  F  3.19
 6  F  E  3.19
 7  G  H 12.52
 8  H  G 12.52

For my purposes, the combination A, B in row 1 is the same as the combination B, A in row 2, given that the entries in the value column are also equal. How can I remove the rows that count as duplicates under this rule?

Expected result:

 df2 <- data.frame(v1 = c('A', 'C', 'E', 'G'),
      v2 = c('B', 'D', 'F', 'H'),
      value = c(1.12, 12.52, 3.19, 12.52))

 > df2
   v1 v2 value
 1  A  B  1.12
 2  C  D 12.52
 3  E  F  3.19
 4  G  H 12.52

Solution:

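The idea is to treat v1 and v2 as interchangeable: sort each pair so that A, B and B, A produce the same key, then drop every row whose key has already appeared.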
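As a minimal illustration of that idea on a single pair (pair_key is a hypothetical helper name used only for this sketch, not part of the answer's code):

```r
# Hypothetical helper: build an order-independent key for one pair of labels
pair_key <- function(a, b) {
  # sort() puts the two labels in a canonical order before pasting them together
  paste(sort(c(a, b)), collapse = ",")
}

key_ab <- pair_key("A", "B")
key_ba <- pair_key("B", "A")

# Both orderings collapse to the same key
identical(key_ab, key_ba)
```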

df1 <- data.frame(v1 = c('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'),
                  v2 = c('B', 'A', 'D', 'C', 'F', 'E', 'H', 'G'),
                  value = c(1.12, 1.12, 12.52, 12.52, 3.19, 3.19, 12.52, 12.52))

### With tidyverse:

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(purrr)

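# Build an order-independent key per row, e.g. "A,B" for both (A, B) and (B, A)
df2 <- df1 %>%
  mutate(combination = pmap_chr(list(v1, v2), ~ paste(sort(c(..1, ..2)), collapse = ","))) %>%
  filter(!duplicated(combination)) %>%  # keep the first row for each key
  select(-combination)                  # drop the helper column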

df2
#>   v1 v2 value
#> 1  A  B  1.12
#> 2  C  D 12.52
#> 3  E  F  3.19
#> 4  G  H 12.52

### Base R:

# Sort v1/v2 within each row, then keep rows whose sorted pair has not appeared before
df2 <- df1[!duplicated(t(apply(df1[, c("v1", "v2")], 1, sort))), ]

df2
#>   v1 v2 value
#> 1  A  B  1.12
#> 3  C  D 12.52
#> 5  E  F  3.19
#> 7  G  H 12.52

Created on 2023-12-24 with reprex v2.0.2
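
Both snippets above build the key from v1 and v2 only, while the question counts rows as duplicates only when value matches as well. A possible variation, assuming all three columns should form the key, uses the vectorised base functions pmin() and pmax() in place of row-wise sorting. The names lo and hi are illustrative, and stringsAsFactors = FALSE is added here as an assumption so the sketch also runs on R versions before 4.0, where strings became factors by default:

```r
df1 <- data.frame(v1 = c('A', 'B', 'C', 'D', 'E', 'F', 'G', 'H'),
                  v2 = c('B', 'A', 'D', 'C', 'F', 'E', 'H', 'G'),
                  value = c(1.12, 1.12, 12.52, 12.52, 3.19, 3.19, 12.52, 12.52),
                  stringsAsFactors = FALSE)

# Element-wise smaller and larger label of each pair (works on character vectors)
lo <- pmin(df1$v1, df1$v2)
hi <- pmax(df1$v1, df1$v2)

# A row is a duplicate only if the unordered pair AND the value were seen before
key <- data.frame(lo = lo, hi = hi, value = df1$value)
df2 <- df1[!duplicated(key), ]

df2
```

On the example data this keeps rows 1, 3, 5 and 7, the same rows as the base R version above. The two approaches would differ only if a swapped pair appeared with a different value, in which case this variation keeps both rows.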
