Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

R: Merge "Between" using Base R

I am working with the R programming language. Suppose I have the following tables (note: the dates appear as factors in my problem):

table_1 = data.frame(id = c("123", "123", "125", "125"), 
date_1 = c("2010-01-31","2010-01-31", "2015-01-31", "2018-01-31" ))

table_1$id = as.factor(table_1$id)
table_1$date_1 = as.factor(table_1$date_1)

table_2 = data.frame(id = c("123", "121", "125", "126"), 
date_2 = c("2009-01-31","2010-01-31", "2010-01-31", "2010-01-31" ),
date_3 = c("2011-01-31","2010-01-31", "2020-01-31", "2020-01-31" ))


table_2$id = as.factor(table_2$id)
table_2$date_2 = as.factor(table_2$date_2)
table_2$date_3 = as.factor(table_2$date_3)

I would like to perform (some type of) "join" on these two tables (for now it doesn’t matter, e.g. right join, inner join, etc.) using the following conditions:

1) if table_1$id = table_2$id

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

AND

2) if table_1$date BETWEEN(table_2$date_2,table_2$date_3)

I found a previous question on Stackoverflow which demonstrates how to do this using the "SQLDF" library: r merge by id and date between two dates

library(sqldf)

final = sqldf("select a.*, b.*
           
       from table_1 a left join table_2 b
       on a.id = b.id and 
          a.date_1 between 
              b.date_2 and
              b.date_3")
head(final)
#for some reason, this produces duplicate rows, I don't know why



id     date_1  id     date_2     date_3
1 123 2010-01-31 123 2009-01-31 2011-01-31
2 123 2010-01-31 123 2009-01-31 2011-01-31
3 125 2015-01-31 125 2010-01-31 2020-01-31
4 125 2018-01-31 125 2010-01-31 2020-01-31

#optional: remove duplicates
final_no_dup <- final[!duplicated(final$id),]

My Question: Is there a way to perform the above "join" using Base R? If this is not possible in Base R, can this be done in "dplyr"?

Thanks!

>Solution :

You may try this way in dplyr

table_1 %>%
  left_join(table_2, by = "id") %>%
  mutate(across(2:4, ~as.Date(.x))) %>%
  filter(date_1 <= max(date_3, date_2), date_1 >= min(date_2, date_3)) %>%
  distinct()

   id     date_1     date_2     date_3
1 123 2010-01-31 2009-01-31 2011-01-31
2 125 2015-01-31 2010-01-31 2020-01-31
3 125 2018-01-31 2010-01-31 2020-01-31

Base R

table_3 <- merge(x = table_1, y = table_2, by = "id", all.x = TRUE)
table_3 <- table_3[table_3$date_1 <= max(table_3$date_2, table_3$date_3) && table_3$date_1 >= min(table_3$date_2,table_3$date_3)]
table_3[!duplicated(table_3),]
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading