Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

R: How to subset dataframe over columns from 2 different dataframe?

How to subset dataframe over columns from 2 different dataframe using R code?

Here is the dummy code:

library(dplyr)
data <- data.frame(b = rep(LETTERS[1:4],2), c = c("B", "A", "A", "E", "G", "H", "K", "L"))

#   b c
# 1 A B
# 2 B A
# 3 C A
# 4 D E
# 5 A G
# 6 B H
# 7 C K
# 8 D L


data2 <- data.frame(d = c("A", "B", ""), e = c("E", "", "C"))
#    d e
#1   A E
#2   B  
#3     C


subset <- subset(data, data$b %in%  c(data2$d, data2$e))

#   b c
# 1 A B
# 2 B A
# 3 C A
# 5 A G
# 6 B H
# 7 C K

As you can see, i can use subset() function to overlap "data" to "data2". But what if i have large number of columns in "data2"? is there a way to simplify this code? If possible tidyverse approach is preferred.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

I tried to use below code, but its not working.

subset_try <- subset(data, data$b %in%  data2[,c(1:2)])
#[1] b c
#<0 rows> (or 0-length row.names)

Thank you.

>Solution :

If there are lots of columns, unlist into a vector and subset

subset(data, b %in% unlist(data2))

If we want only a subset of columns, then select the subset of columns and unlist – Note that data2[, c(1, 2)] is still a data.frame with two columns and not a vector, thus when we do

> data$b %in% data2
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

It has to do with the table argument with %in%. According to ?"%in%"

x %in% table

table – vector or NULL: the values to be matched against. Long vectors are not supported.

Therefore, we may want to convert to vector

> data$b %in% unlist(data2)
[1]  TRUE  TRUE  TRUE FALSE  TRUE  TRUE  TRUE FALSE
> subset(data, b %in% unlist(data2[1:2]))

For the tidyverse, it is just replacing the subset with filter

library(dplyr)
filter(data, b %in% unlist(data2[1:2]))
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading