Extract data based on another list

January 25, 2022

I am trying to extract rows of a dataset based on a list of time points nested within individuals. Some time points are repeated within an individual (so the matching rows have exactly the same variable values), and I want to keep those duplicated rows in the result. How can I achieve that in base R?

Here is the original dataset:

xx <- data.frame(id=rep(1:3, each=3), time=1:3, y=rep(1:3, each=3))

Here is the list of time points: one numeric vector per individual, named by id

lst <- list(`1` = c(1, 1, 2), `2` = c(1, 3, 3), `3` = c(2, 2, 3))

Desired outcome:

id time y
 1    1 1
 1    1 1  #this is the duplicated row
 1    2 1
 2    1 2
 2    3 2
 2    3 2 #this is the duplicated row
 3    2 3
 3    2 3 #this is the duplicated row
 3    3 3

The code do.call(rbind, Map(function(p, q) subset(xx, id == q & time %in% p), lst, names(lst))) did not work for me: the duplicated rows are missing from the result, apparently because subset removes them.

Solution:

The issue is not subset itself: time %in% p returns one TRUE/FALSE per row of xx, so a row is selected at most once no matter how many times its time appears in p. To repeat rows, we also need to iterate (lapply) over the elements of p. I’ll wrap your inner subset in another do.call(rbind, lapply(p, ...)) to get what you expect:

do.call(rbind, Map(function(p, q) {
  do.call(rbind, lapply(p, function(p0) subset(xx, id == q & time %in% p0)))
}, lst, names(lst)))
#      id time y
# 1.1   1    1 1
# 1.2   1    1 1
# 1.21  1    2 1
# 2.4   2    1 2
# 2.6   2    3 2
# 2.61  2    3 2
# 3.8   3    2 3
# 3.81  3    2 3
# 3.9   3    3 3

(Row names are a distraction here …)
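
To see why the original call returns each row only once, here is a minimal sketch for id 1 alone. The names keep, once, pieces and repeated are made up for this illustration; everything else comes from the question.

```r
xx <- data.frame(id = rep(1:3, each = 3), time = 1:3, y = rep(1:3, each = 3))
p  <- c(1, 1, 2)   # requested time points for id 1 (time 1 is requested twice)

# %in% asks "is this row's time somewhere in p?" and gives one answer per row
# of xx, so repeating a value in p cannot select a row a second time
keep <- xx$id == 1 & xx$time %in% p
once <- xx[keep, ]

# Looping over the elements of p selects the matching row once per element
pieces   <- lapply(p, function(p0) xx[xx$id == 1 & xx$time == p0, ])
repeated <- do.call(rbind, pieces)
rownames(repeated) <- NULL   # drop the row names produced by rbind

nrow(once)      # rows selected by the %in% version
nrow(repeated)  # rows selected by the per-element loop
```

The first version selects two rows and the loop selects three, which is the difference between the original attempt and the nested lapply above.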

An alternative would be to convert your lst into a data frame of id and time, and then left-join xx onto it:

frm <- do.call(rbind, Map(function(x, nm) data.frame(id = nm, time = x), lst, names(lst)))
frm
#     id time
# 1.1  1    1
# 1.2  1    1
# 1.3  1    2
# 2.1  2    1
# 2.2  2    3
# 2.3  2    3
# 3.1  3    2
# 3.2  3    2
# 3.3  3    3

merge(frm, xx, by = c("id", "time"), all.x = TRUE)
#   id time y
# 1  1    1 1
# 2  1    1 1
# 3  1    2 1
# 4  2    1 2
# 5  2    3 2
# 6  2    3 2
# 7  3    2 3
# 8  3    2 3
# 9  3    3 3
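
If the order of the requested time points matters, note that merge sorts its result on the key columns by default, so the output order only matches lst here because each vector in lst is already sorted. As a rough sketch of another base R option (not part of the approaches above), each requested (id, time) pair can be looked up with match(), which returns one position per pair and so keeps both the duplicates and the order. The names idx and res are made up for this example, and it assumes every (id, time) pair occurs at most once in xx.

```r
xx  <- data.frame(id = rep(1:3, each = 3), time = 1:3, y = rep(1:3, each = 3))
lst <- list(`1` = c(1, 1, 2), `2` = c(1, 3, 3), `3` = c(2, 2, 3))

# One row per requested (id, time) pair; id is converted back to integer
# because names(lst) are character strings
frm <- data.frame(
  id   = rep(as.integer(names(lst)), lengths(lst)),
  time = unlist(lst, use.names = FALSE)
)

# Build a combined key on both sides and find the row of xx for each request
idx <- match(paste(frm$id, frm$time), paste(xx$id, xx$time))

res <- xx[idx, ]
rownames(res) <- NULL   # row names are a distraction here too
res
```

A requested pair that does not exist in xx gives NA in idx and therefore a row of NAs in res, which plays the same role as all.x = TRUE in the merge call.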

Two good resources for learning about merges/joins: