Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to match elements between 2 data frames subject to multiple conditions using base R or dplyr?

I have two data frames, myDF and index, and I efficiently match between the two as shown below where I create a new column in data frame myDF called alloc_1 which is based on row matches it finds in data frame Index (where row number in data frame index matches eleCnt in data frame myDF):

enter image description here

Preferably using my existing mutate() as a starting point where I especially like index$index[eleCnt] for identifying the index row number, how can I further restrict matching from data frame index to only those rows where cumGrp = 1? Resulting in the below pretend output:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

  eleCnt concat_1 alloc_1  explanation
1      1      1.0    10.0
2      2      2.0      NA  since index row 2 cumGrp <> 1, don't match it
3      1      1.0    10.0 
4      2      2.1      NA  since index row 2 cumGrp <> 1, don't match it
5      3      2.2      NA

Below is code for myDF, index, and the dplyr mutate():

myDF <- data.frame(
  eleCnt = c(1,2,1,2,3),
  concat_1 = c(1,2,1,2.1,2.2)
  )

index <- data.frame(
  index = c(10,2.1),
  cumGrp = c(1,2)
  )

library(dplyr)
myDF %>% mutate(alloc_1 = index$index[eleCnt])

>Solution :

myDF %>%
  mutate(alloc_1 = index$index[index$cumGrp == 1][eleCnt])
#   eleCnt concat_1 alloc_1
# 1      1      1.0      10
# 2      2      2.0      NA
# 3      1      1.0      10
# 4      2      2.1      NA
# 5      3      2.2      NA

This is close to a "join" operation, with just a little nudge (adding eleCnt to index). A join operation can be faster, perhaps easier to understand/debug (once you really "grok" join/merge, see How to join (merge) data frames (inner, outer, left, right), What's the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN and FULL JOIN?), and allows for other methods/join efficiencies as well.

index$eleCnt <- seq_len(nrow(index)) # or whatever they should be
### simple join, not filtered on cumGrp
myDF %>%
  left_join(index, by = "eleCnt")
#   eleCnt concat_1 index cumGrp
# 1      1      1.0  10.0      1
# 2      2      2.0   2.1      2
# 3      1      1.0  10.0      1
# 4      2      2.1   2.1      2
# 5      3      2.2    NA     NA
### improved
index %>%
  filter(cumGrp == 1) %>%
  left_join(myDF, ., by = "eleCnt")
#   eleCnt concat_1 index cumGrp
# 1      1      1.0    10      1
# 2      2      2.0    NA     NA
# 3      1      1.0    10      1
# 4      2      2.1    NA     NA
# 5      3      2.2    NA     NA

In the join/merge case, note that all other columns are brought in as well. This can be good (so that you don’t have to repeat index$index[eleCnt] for each column you want to import), and can easily be controlled by filtering before or de-selecting after the join.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading