Find repeated elements in a list and remove those objects

I have a long list in which each element is itself a list containing a header and data. Some of the elements are repeated. I’d like to find the repeated elements and remove them.

Ideally, two elements would count as repeats only if both the name and the contents are identical, in which case the repeat is removed. If the name is the same but the contents are different, the element is renamed instead.

Alternatively I’d settle for finding names that are repeated and removing the objects without checking their content.
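
For reference, this name-only fallback can be sketched in one line. This is a minimal illustration, assuming a small hypothetical list `x` (not part of the original data) and assuming that the first occurrence of each name is the one to keep:

```r
# Hypothetical list in which the name "a" appears twice
x <- list(a = 1, b = 2, a = 3)

# duplicated() on the names flags the second and later occurrences of a name,
# so negating it keeps only the first element under each name
x.by.name <- x[!duplicated(names(x))]

names(x.by.name)
```

Because the contents are never inspected, the second `a` is dropped even though its value differs from the first.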

Here’s a simplified example:

my.list <- list(sample1 = list(header = c("a","b","c","k"),
                               data = c("a","b","c","k")),
                sample2 = list(header = c("d", "k", "x"),
                               data = c("d", "k", "x")),
                sample3 = list(header = c("z", "r", "v"),
                               data = c("z", "r", "v")),
                sample1 = list(header = c("a","b","c","k"),
                               data = c("a","b","c","k")),
                sample2 = list(header = c("h", "j", "l"),
                               data = c("h", "j", "l")))

table(names(my.list))

sample1 sample2 sample3 
      2       2       1 

In the above example, the second sample1 would be removed, but the second sample2 would be renamed, e.g. sample2_2.

I’ve read around, but can’t find an example in which the elements are themselves lists. The existing solutions don’t seem to cover this case, e.g. the question “Remove duplicate in a large list while keeping the named number in R”.

Solution:

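This is relatively simple to do in two steps, but I’m not sure it can be done in one. The first step removes exact duplicates with duplicated(), and the second repairs the remaining repeated names with make.names(..., unique = TRUE). Note that duplicated() on a list compares only the contents of each element, not its name, and that make.names() appends suffixes of the form .1, .2 rather than _2: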

my.list <- list(sample1 = list(header = c("a","b","c","k"),
                               data = c("a","b","c","k")),
                sample2 = list(header = c("d", "k", "x"),
                               data = c("d", "k", "x")),
                sample3 = list(header = c("z", "r", "v"),
                               data = c("z", "r", "v")),
                sample1 = list(header = c("a","b","c","k"),
                               data = c("a","b","c","k")),
                sample2 = list(header = c("h", "j", "l"),
                               data = c("h", "j", "l")))

my.list.dedup <- my.list[!duplicated(my.list)]
names(my.list.dedup) <- make.names(names(my.list.dedup), unique = TRUE)

After these two steps, my.list.dedup is

list(
  sample1 = list(
    header = c("a", "b", "c", "k"),
    data = c("a", "b", "c", "k")
  ),
  sample2 = list(
    header = c("d", "k", "x"),
    data = c("d", "k", "x")
  ),
  sample3 = list(
    header = c("z", "r", "v"),
    data = c("z", "r", "v")
  ),
  sample2.1 = list(
    header = c("h", "j", "l"),
    data = c("h", "j", "l")
  )
)
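
If the comparison should take the name into account as well, and the renamed elements should follow the sample2_2 pattern from the question, one possible variant is sketched below. It is not part of the original answer: the helper name `dedup_by_name_and_content` and its `sep` argument are made up for illustration, and it assumes every element of the list is named.

```r
# Same example list as in the question
my.list <- list(
  sample1 = list(header = c("a", "b", "c", "k"), data = c("a", "b", "c", "k")),
  sample2 = list(header = c("d", "k", "x"), data = c("d", "k", "x")),
  sample3 = list(header = c("z", "r", "v"), data = c("z", "r", "v")),
  sample1 = list(header = c("a", "b", "c", "k"), data = c("a", "b", "c", "k")),
  sample2 = list(header = c("h", "j", "l"), data = c("h", "j", "l"))
)

# Hypothetical helper: drop elements whose name AND contents both repeat,
# then number the remaining repeats of a name as name_2, name_3, ...
dedup_by_name_and_content <- function(x, sep = "_") {
  # Bundle each element with its name so duplicated() compares both
  keyed <- Map(function(n, el) list(name = n, value = el), names(x), x)
  kept <- x[!duplicated(keyed)]

  # Within each name, count occurrences in order of appearance: 1, 2, ...
  nms <- names(kept)
  occurrence <- ave(seq_along(nms), nms, FUN = seq_along)

  # The first occurrence keeps its name; later ones get a numeric suffix
  names(kept) <- ifelse(occurrence > 1, paste(nms, occurrence, sep = sep), nms)
  kept
}

res <- dedup_by_name_and_content(my.list)
names(res)
# "sample1" "sample2" "sample3" "sample2_2"
```

One limitation of this sketch: it does not check whether a generated name such as sample2_2 already exists in the list, so a collision is possible if the original names already use that pattern.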