Extract data based on another list

I am trying to extract rows of a dataset based on a list of time points nested within individuals. Some time points are repeated, so the matching rows have exactly the same variable values, but I still want to keep those duplicated rows. How can I achieve that in base R?

Here is the original dataset:

xx <- data.frame(id=rep(1:3, each=3), time=1:3, y=rep(1:3, each=3))

Here is the list of time points, one vector per id (the list names are the ids):

lst <- list(`1` = c(1, 1, 2), `2` = c(1, 3, 3), `3` = c(2, 2, 3))

Desirable outcome:

id time y
 1    1 1
 1    1 1  #this is the duplicated row
 1    2 1
 2    1 2
 2    3 2
 2    3 2 #this is the duplicated row
 3    2 3
 3    2 3 #this is the duplicated row
 3    3 3

I tried do.call(rbind, Map(function(p, q) subset(xx, id == q & time %in% p), lst, names(lst))), but it did not work for me because subset seems to remove the duplicated rows.

Solution:

The issue is not subset itself: time %in% p returns one TRUE/FALSE per row of xx, so a row is selected at most once no matter how many times its time value appears in p. To get one row per entry of p, we need to also iterate (lapply) over p internally. I’ll wrap your inner subset in another do.call(rbind, lapply(p, ...)) to get what you expect:

do.call(rbind, Map(function(p, q) {
  do.call(rbind, lapply(p, function(p0) subset(xx, id == q & time %in% p0)))
}, lst, names(lst)))
#      id time y
# 1.1   1    1 1
# 1.2   1    1 1
# 1.21  1    2 1
# 2.4   2    1 2
# 2.6   2    3 2
# 2.61  2    3 2
# 3.8   3    2 3
# 3.81  3    2 3
# 3.9   3    3 3

(Row names are a distraction here …)
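
To see why the original attempt loses the repeats, it may help to look at what %in% returns for a single id. The following is a minimal sketch using xx from the question; the names p, hits and per_value are introduced here only for illustration.

```r
xx <- data.frame(id = rep(1:3, each = 3), time = 1:3, y = rep(1:3, each = 3))
p <- c(1, 1, 2)  # requested time points for id 1, with time 1 repeated

# %in% gives one TRUE/FALSE per row of xx, so each row is picked at most once
hits <- xx$id == 1 & xx$time %in% p
sum(hits)  # 2 rows selected, although p has 3 entries

# Looping over p one value at a time selects a row for every entry of p
per_value <- do.call(rbind, lapply(p, function(p0) subset(xx, id == 1 & time == p0)))
nrow(per_value)  # 3 rows

# Optional: drop the distracting row names
rownames(per_value) <- NULL
per_value
```

Resetting the row names with rownames(...) <- NULL is only cosmetic; it does not change which rows are returned.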

An alternative would be to convert your lst into a frame of id and time, and then left-join on it:

frm <- do.call(rbind, Map(function(x, nm) data.frame(id = nm, time = x), lst, names(lst)))
frm
#     id time
# 1.1  1    1
# 1.2  1    1
# 1.3  1    2
# 2.1  2    1
# 2.2  2    3
# 2.3  2    3
# 3.1  3    2
# 3.2  3    2
# 3.3  3    3

merge(frm, xx, by = c("id", "time"), all.x = TRUE)
#   id time y
# 1  1    1 1
# 2  1    1 1
# 3  1    2 1
# 4  2    1 2
# 5  2    3 2
# 6  2    3 2
# 7  3    2 3
# 8  3    2 3
# 9  3    3 3
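
If the order of lst should be preserved exactly (merge sorts its result by the key columns by default), another base R option is to look up row positions with match and index xx directly. This is only a sketch, not part of the answer above: it assumes every (id, time) pair occurs exactly once in xx, and the names want_id, want_time, idx and res are made up for this example.

```r
xx <- data.frame(id = rep(1:3, each = 3), time = 1:3, y = rep(1:3, each = 3))
lst <- list(`1` = c(1, 1, 2), `2` = c(1, 3, 3), `3` = c(2, 2, 3))

# One requested (id, time) pair per element of lst, repeats included
want_id <- rep(as.integer(names(lst)), lengths(lst))
want_time <- unlist(lst, use.names = FALSE)

# For each requested pair, find the position of its row in xx.
# Assumption: each (id, time) pair is unique in xx; a pair that is
# missing from xx would yield NA here and an all-NA row in res.
idx <- match(paste(want_id, want_time), paste(xx$id, xx$time))

# Indexing with repeated positions repeats the rows
res <- xx[idx, ]
rownames(res) <- NULL
res
```

Because indexing with a repeated position returns the row repeatedly, no inner loop over the time points is needed.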

Two good resources for learning about merges/joins:
