Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to "grepl" conditional on a range of dates in R

Let’say I have two dataframes df1 and df2:


df1 = structure(list(surname = c("Duisenberg", "Trichet", "Draghi"), 
    `start term` = structure(c(896659200, 1067644800, 1320105600
    ), class = c("POSIXct", "POSIXt"), tzone = "UTC"), `end term` = structure(c(1067558400, 
    1320019200, 1572480000), class = c("POSIXct", "POSIXt"), tzone = "UTC")), row.names = c(1L, 
9L, 15L), class = "data.frame") %>% data.frame(stringsAsFactors = F)

   surname start.term   end.term
1  Duisenberg 1998-06-01 2003-10-31
9     Trichet 2003-11-01 2011-10-31
15     Draghi 2011-11-01 2019-10-31


df2= data.frame(Date = c("2010-01-01","1997-01-01","2020-01-01","2004-01-01","2012-01-01","1999-01-01","2000-01-01","2020-01-01","2022-01-01","1996-01-01"), speaker = c("Mario Draghi","W.L. Duisenberg","Ciao","Jean-Claude Trichet","M. Draghi","W.L. Duisenberg","Jean-Claude Trichet","Bye","Ciao","Mario Draghi"), stringsAsFactors = F)

         Date             speaker
1  2010-01-01        Mario Draghi
2  1997-01-01     W.L. Duisenberg
3  2020-01-01                Ciao
4  2004-01-01 Jean-Claude Trichet
5  2012-01-01           M. Draghi
6  1999-01-01     W.L. Duisenberg
7  2000-01-01 Jean-Claude Trichet
8  2020-01-01                 Bye
9  2022-01-01                Ciao
10 1996-01-01        Mario Draghi

I can easily find when the names in df1 appears in df2 as such:

which(grepl(paste0(df1$surname, collapse = "|"), df2$speaker, ignore.case = TRUE))

[1]  1  2  4  5  6  7 10

Instead, it is trickier to say: the names in df1 appears in df2 only if the date in df2 falls outside the boundaries of df1 (start.term and end.term).

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

The outcome should be:

[1] 1 2 10

how can I do it? Can someone help me with this?

Thanks!

>Solution :

I think essentially you want to do a join operation here on the names which match. So the first step is to find out what those are:

library(dplyr)

surnames_regex <- paste0(df1$surname, collapse = "|")

df2$matching_name <- strsplit(df2$speaker, split = "\\s") |>
    lapply(
        \(name) {
            matching_name <- grep(surnames_regex, name, v = T)
            matching_name <- ifelse(
                length(matching_name) > 0,
                matching_name[1],
                NA_character_
            )
            matching_name
        }
    ) |>
    unlist()


df2
#          Date             speaker matching_name
# 1  2010-01-01        Mario Draghi        Draghi
# 2  1997-01-01     W.L. Duisenberg    Duisenberg
# 3  2020-01-01                Ciao          <NA>
# 4  2004-01-01 Jean-Claude Trichet       Trichet
# 5  2012-01-01           M. Draghi        Draghi
# 6  1999-01-01     W.L. Duisenberg    Duisenberg
# 7  2000-01-01 Jean-Claude Trichet       Trichet
# 8  2020-01-01                 Bye          <NA>
# 9  2022-01-01                Ciao          <NA>
# 10 1996-01-01        Mario Draghi        Draghi

Then it’s simply a case of joining on those names and filtering according to the conditions you defined:

df2 |>
    inner_join(
        df1,
        by = c("matching_name" = "surname")
    ) |>
    filter(
        Date < start.term |
            Date > end.term
    )
#         Date             speaker matching_name start.term   end.term
# 1 2010-01-01        Mario Draghi        Draghi 2011-11-01 2019-10-31
# 2 1997-01-01     W.L. Duisenberg    Duisenberg 1998-06-01 2003-10-31
# 3 2000-01-01 Jean-Claude Trichet       Trichet 2003-11-01 2011-10-31
# 4 1996-01-01        Mario Draghi        Draghi 2011-11-01 2019-10-31
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading