Find repeated elements in a list and remove those objects

February 10, 2023

I’ve got a long list, each object of which is itself a list containing headers and data. Some of the objects are repeated. I’d like to find the repeated objects and remove them.

Ideally this would find objects that are identical (name and contents). If both the name and contents are identical then the repeat is removed. If the name is the same, but the contents are different, then the object is renamed.

Alternatively I’d settle for finding names that are repeated and removing the objects without checking their content.

Here’s a simplified example:

my.list <- list(sample1 = list(header = c("a","b","c","k"),
                               data = c("a","b","c","k")),
                sample2 = list(header = c("d", "k", "x"),
                               data = c("d", "k", "x")),
                sample3 = list(header = c("z", "r", "v"),
                               data = c("z", "r", "v")),
                sample1 = list(header = c("a","b","c","k"),
                               data = c("a","b","c","k")),
                sample2 = list(header = c("h", "j", "l"),
                               data = c("h", "j", "l")))

table(names(my.list))

sample1 sample2 sample3 
      2       2       1

In the above example, the second sample1 would be removed, but the second sample2 would be renamed, e.g. sample2_2.

I’ve read around, but can’t find an example which uses objects that are themselves lists. The other solutions don’t seem to cover it, e.g. "Remove duplicate in a large list while keeping the named number in R".

Solution:

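This is relatively simple to do in two steps, but I’m not sure it can be done in one. Note that duplicated() on a list compares only the contents of each element and ignores the names, so two objects with different names but identical contents would also count as duplicates. The two steps are: remove exact duplicates with duplicated(), then repair the remaining repeated names with make.names(unique = TRUE), which appends .1, .2, ... to them: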

my.list <- list(sample1 = list(header = c("a","b","c","k"),
                               data = c("a","b","c","k")),
                sample2 = list(header = c("d", "k", "x"),
                               data = c("d", "k", "x")),
                sample3 = list(header = c("z", "r", "v"),
                               data = c("z", "r", "v")),
                sample1 = list(header = c("a","b","c","k"),
                               data = c("a","b","c","k")),
                sample2 = list(header = c("h", "j", "l"),
                               data = c("h", "j", "l")))

my.list.dedup <- my.list[!duplicated(my.list)]
names(my.list.dedup) <- make.names(names(my.list.dedup), unique = TRUE)

This leaves my.list.dedup as follows, with the second sample2 renamed to sample2.1:

list(
  sample1 = list(
    header = c("a", "b", "c", "k"),
    data = c("a", "b", "c", "k")
  ),
  sample2 = list(
    header = c("d", "k", "x"),
    data = c("d", "k", "x")
  ),
  sample3 = list(
    header = c("z", "r", "v"),
    data = c("z", "r", "v")
  ),
  sample2.1 = list(
    header = c("h", "j", "l"),
    data = c("h", "j", "l")
  )
)
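
If the duplicate check should also take the name into account, and the renamed objects should look like sample2_2 as in the question, something along the following lines might work. This is only a sketch: dedup_named_list is a hypothetical helper (not a base R function), it assumes every element of the list is named, and it does not guard against a generated name such as sample2_2 colliding with a name that already exists.

```r
# Same example list as in the question.
my.list <- list(sample1 = list(header = c("a","b","c","k"),
                               data = c("a","b","c","k")),
                sample2 = list(header = c("d", "k", "x"),
                               data = c("d", "k", "x")),
                sample3 = list(header = c("z", "r", "v"),
                               data = c("z", "r", "v")),
                sample1 = list(header = c("a","b","c","k"),
                               data = c("a","b","c","k")),
                sample2 = list(header = c("h", "j", "l"),
                               data = c("h", "j", "l")))

# Hypothetical helper: drop elements whose name AND contents repeat,
# then suffix the remaining repeated names with their occurrence number.
dedup_named_list <- function(x, sep = "_") {
  nms <- names(x)
  # Pair each element with its name so duplicated() compares both together.
  keyed <- Map(function(name, value) list(name = name, value = value), nms, x)
  x <- x[!duplicated(keyed)]
  nms <- names(x)
  # Running count within each name: 1 for the first occurrence, 2 for the second, ...
  occurrence <- ave(seq_along(nms), nms, FUN = seq_along)
  names(x) <- ifelse(occurrence > 1, paste(nms, occurrence, sep = sep), nms)
  x
}

my.list.dedup2 <- dedup_named_list(my.list)
names(my.list.dedup2)
# "sample1" "sample2" "sample3" "sample2_2"

# The fallback from the question: keep only the first object for each name,
# without looking at the contents at all.
by.name <- my.list[!duplicated(names(my.list))]
names(by.name)
# "sample1" "sample2" "sample3"
```

Unlike the two-step version, this keeps objects that share contents but have different names, since the name is part of what duplicated() sees.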