Search multiple keywords over a column and create columns for each

I have the following data.

stringstosearch <- c("to", "and", "at", "from", "is", "of")

set.seed(199)
id <- c(rnorm(5))
x  <- c("Contrary to popular belief, Lorem Ipsum is not simply random text.",
       "A Latin professor at Hampden-Sydney College in Virginia",
       "It has roots in a piece of classical Latin ", 
       "literature from 45 BC, making it over 2000 years old.", 
       "The standard chunk of Lorem Ipsum used since")
datatxt <- data.frame(id, x)

I want to search for each keyword listed in stringstosearch and create a column for each one holding the result.

I can do this:

library(stringr)

datatxt$result <- str_detect(datatxt$x, paste0(stringstosearch, collapse = '|'))

datatxt$result
# [1] TRUE TRUE TRUE TRUE TRUE

However, that gives a single result for all keywords combined. I want a separate result column for each of the strings in stringstosearch.

The result should look like this or similar:

          id                                                                  x    to   and    at  from    is    of
1 -1.9091427 Contrary to popular belief, Lorem Ipsum is not simply random text.  TRUE FALSE FALSE FALSE  TRUE FALSE
2  0.5551667            A Latin professor at Hampden-Sydney College in Virginia FALSE FALSE  TRUE FALSE FALSE FALSE
3 -2.2163365                        It has roots in a piece of classical Latin  FALSE FALSE FALSE FALSE FALSE  TRUE
4  0.4941455              literature from 45 BC, making it over 2000 years old. FALSE FALSE FALSE  TRUE FALSE FALSE
5 -0.5805710                       The standard chunk of Lorem Ipsum used since FALSE FALSE FALSE FALSE FALSE  TRUE

Any idea how to achieve this?

Solution:

Here is a base R one-liner. Use sprintf() to wrap each keyword in \\b word-boundary anchors, so that, for example, "and" will not match "random". Then iterate over these patterns with lapply(), using grepl() to match each pattern against datatxt$x. This returns a list of logical vectors, which we can assign back to the data frame as new columns named after the keywords. Note that the \(x) lambda shorthand needs R 4.1 or later; on older versions, write function(x) instead.

datatxt[stringstosearch] <- lapply(
    sprintf("\\b%s\\b", stringstosearch), \(x) grepl(x, datatxt$x)
)
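To see what the word-boundary anchors change, here is a minimal sketch using one sentence from the question. The variable names txt and patterns are only illustrative and do not appear in the answer above.

```r
# One sentence from the question's data (txt is an illustrative name)
txt <- "Contrary to popular belief, Lorem Ipsum is not simply random text."

# Plain substring match: "and" is found inside "random"
grepl("and", txt)
# [1] TRUE

# With \\b on both sides, only the standalone word "and" would match
grepl("\\band\\b", txt)
# [1] FALSE

# sprintf() is vectorised, so it wraps every keyword in one call
patterns <- sprintf("\\b%s\\b", c("to", "and"))
```

Because sprintf() returns one pattern per keyword, the lapply() call produces exactly one logical vector per new column.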

Now datatxt is as desired:

          id                                                                  x    to   and    at  from    is    of
1 -1.9091427 Contrary to popular belief, Lorem Ipsum is not simply random text.  TRUE FALSE FALSE FALSE  TRUE FALSE
2  0.5551667            A Latin professor at Hampden-Sydney College in Virginia FALSE FALSE  TRUE FALSE FALSE FALSE
3 -2.2163365                        It has roots in a piece of classical Latin  FALSE FALSE FALSE FALSE FALSE  TRUE
4  0.4941455              literature from 45 BC, making it over 2000 years old. FALSE FALSE FALSE  TRUE FALSE FALSE
5 -0.5805710                       The standard chunk of Lorem Ipsum used since FALSE FALSE FALSE FALSE FALSE  TRUE
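If a stricter variant is wanted, vapply() could stand in for lapply(), since it checks that every result is a logical vector with one value per row. This is only a sketch and not part of the original answer: it uses a cut-down two-row version of the data, and the names patterns and flags are illustrative.

```r
stringstosearch <- c("to", "and", "at", "from", "is", "of")

# Cut-down version of the question's data, for illustration only
datatxt <- data.frame(
    x = c("Contrary to popular belief, Lorem Ipsum is not simply random text.",
          "The standard chunk of Lorem Ipsum used since")
)

# Name each pattern after its keyword so the matrix columns are labelled
patterns <- setNames(sprintf("\\b%s\\b", stringstosearch), stringstosearch)

# vapply() errors unless each grepl() call returns nrow(datatxt) logicals;
# the result is a logical matrix with one column per keyword
flags <- vapply(patterns, grepl, logical(nrow(datatxt)), x = datatxt$x)

# Bind the keyword columns onto the data frame
datatxt <- cbind(datatxt, flags)
```

The trade-off is an extra line for naming the patterns in exchange for an early error if a match ever returns something unexpected.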

tidyverse approach

As you tagged tidyverse, here is an alternative method. It builds the same list as the base R approach using tidyverse functions, except that the list is named. We can then use the splice operator (!!!) to pass it to dplyr::mutate() as new columns:

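datatxt |>
    dplyr::mutate(
        !!!purrr::map(
            # name each pattern after its keyword; the names become column names
            purrr::set_names(
                stringr::str_glue("\\b{stringstosearch}\\b"),
                stringstosearch
            ),
            # !!! evaluates this map() call before mutate() sets up its data
            # mask, so refer to the column explicitly rather than as a bare x
            \(str) stringr::str_detect(datatxt$x, str)
        )
    )
# ^^ same output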

I think the base R approach is much cleaner.
