Replace words that partially match over a list

I have a list of data frames holding variable names from the mammalsleep dataset, and I want to replace some of those names with versions that have additional characters around them.

For example:

pr_replace <- paste(c('log(brain)','I(body^2)'), collapse="|")
extract_replace <- paste(c('brain','body'), collapse="|")

I want to replace each name in extract_replace with the corresponding entry in pr_replace.

I have tried two ways of doing this:

  lapply(per, function(dat)
    sapply(dat, function(x)
      str_replace(x, extract_replace, pr_replace)) %>% data.frame())

This instead replaces every match with the whole collapsed string:

                X9
1             exposure               danger log(brain)|I(body^2)
2               danger log(brain)|I(body^2) log(brain)|I(body^2)
3 log(brain)|I(body^2) log(brain)|I(body^2)             nondream
4 log(brain)|I(body^2)             nondream                dream
5             nondream                dream                sleep
6                dream                sleep            gestation
7                sleep            gestation            predation
8            gestation            predation             exposure
9            predation             exposure               danger
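A minimal base-R reproduction of what goes wrong here (an illustration added for clarity, not part of the original attempt): in the pattern, | is regex alternation, but the collapsed replacement is inserted as one literal string, so every match becomes the whole thing.

```r
# Pattern: regex alternation, matches either "brain" or "body"
pattern <- paste(c("brain", "body"), collapse = "|")
# Replacement: one literal string; "|" has no special meaning in a replacement
replacement <- paste(c("log(brain)", "I(body^2)"), collapse = "|")

res <- sub(pattern, replacement, c("brain", "body", "sleep"))
# res is c("log(brain)|I(body^2)", "log(brain)|I(body^2)", "sleep")
```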

I have also tried:

pr_r<-c('log(brain)','I(body^2)')
  mapply(function(x, y)
    lapply(x, function(dat)
      sapply(dat, function(z)
        str_replace(z, extract_replace, y)) %>% data.frame()), per, pr_r, SIMPLIFY = FALSE)
  

However, this does not produce the results I am after.

Expected behaviour: wherever brain is found it should become log(brain), and wherever body is found it should become I(body^2).

Expected output:

[[1]]
          X1        X2        X3        X4        X5         X6         X7         X8         X9
1 log(brain)  nondream     dream     sleep gestation  predation   exposure     danger  I(body^2)
2   nondream     dream     sleep gestation predation   exposure     danger  I(body^2) log(brain)
3      dream     sleep gestation predation  exposure     danger  I(body^2) log(brain)   nondream
4      sleep gestation predation  exposure    danger  I(body^2) log(brain)   nondream      dream
5  gestation predation  exposure    danger I(body^2) log(brain)   nondream      dream      sleep

[[2]]
          X1        X2        X3        X4         X5         X6         X7         X8         X9
1 log(brain)  nondream     dream     sleep  gestation  predation   exposure     danger  I(body^2)
2   nondream     dream     sleep gestation  predation   exposure     danger  I(body^2) log(brain)
3      dream     sleep gestation predation   exposure     danger  I(body^2) log(brain)   nondream
4      sleep gestation predation  exposure     danger  I(body^2) log(brain)   nondream      dream
5  gestation predation  exposure    danger  I(body^2) log(brain)   nondream      dream      sleep
6  predation  exposure    danger I(body^2) log(brain)   nondream      dream      sleep  gestation

Reproducible data:

per <- list(structure(list(X1 = c("brain", "nondream", "dream", "sleep", 
"gestation"), X2 = c("nondream", "dream", "sleep", "gestation", 
"predation"), X3 = c("dream", "sleep", "gestation", "predation", 
"exposure"), X4 = c("sleep", "gestation", "predation", "exposure", 
"danger"), X5 = c("gestation", "predation", "exposure", "danger", 
"body"), X6 = c("predation", "exposure", "danger", "body", "brain"
), X7 = c("exposure", "danger", "body", "brain", "nondream"), 
    X8 = c("danger", "body", "brain", "nondream", "dream"), X9 = c("body", 
    "brain", "nondream", "dream", "sleep")), row.names = c(NA, 
5L), class = "data.frame"), structure(list(X1 = c("brain", "nondream", 
"dream", "sleep", "gestation", "predation"), X2 = c("nondream", 
"dream", "sleep", "gestation", "predation", "exposure"), X3 = c("dream", 
"sleep", "gestation", "predation", "exposure", "danger"), X4 = c("sleep", 
"gestation", "predation", "exposure", "danger", "body"), X5 = c("gestation", 
"predation", "exposure", "danger", "body", "brain"), X6 = c("predation", 
"exposure", "danger", "body", "brain", "nondream"), X7 = c("exposure", 
"danger", "body", "brain", "nondream", "dream"), X8 = c("danger", 
"body", "brain", "nondream", "dream", "sleep"), X9 = c("body", 
"brain", "nondream", "dream", "sleep", "gestation")), row.names = c(NA, 
6L), class = "data.frame"))

Solution:

Instead of pasting the elements together for the replacement (the replacement string is inserted literally, whereas | in the pattern is interpreted as regex alternation), we can keep two separate vectors, or a single named vector whose names are the substrings to match in the original data and whose values are the replacements:

pr_replace <- c('log(brain)','I(body^2)')
extract_replace <- c('brain','body')
named_vec <- setNames(pr_replace, extract_replace)
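As an aside, if every cell holds exactly one name (as in per), the same named vector can also serve as a plain lookup table. The sketch below uses only base R, and replace_exact is a hypothetical helper name. Because it matches whole cells rather than substrings, a value that has already been converted (such as log(brain)) is left alone instead of becoming log(log(brain)).

```r
# Hypothetical helper: replace cells that exactly equal a name in `lookup`
replace_exact <- function(x, lookup) {
  hit <- x %in% names(lookup)        # TRUE where the whole cell is a key
  x[hit] <- unname(lookup[x[hit]])   # index the named vector by name
  x
}

named_vec <- setNames(c("log(brain)", "I(body^2)"), c("brain", "body"))
res <- replace_exact(c("brain", "nondream", "body", "log(brain)"), named_vec)
# res is c("log(brain)", "nondream", "I(body^2)", "log(brain)")
```

To use it on a data frame, it can be applied to each column with lapply(), much as str_replace_all() is applied with across().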

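Now, loop over the list with map(), use across() to visit every column of each data frame, and apply str_replace_all() with the named vector; when given a named vector, str_replace_all() treats each name as a pattern and its value as the replacement.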

library(purrr)
library(stringr)
library(dplyr)
per <- map(per, ~ .x %>%
   mutate(across(everything(), ~ str_replace_all(.x, 
        named_vec))))

Output:

per
[[1]]
          X1        X2        X3        X4        X5         X6         X7         X8         X9
1 log(brain)  nondream     dream     sleep gestation  predation   exposure     danger  I(body^2)
2   nondream     dream     sleep gestation predation   exposure     danger  I(body^2) log(brain)
3      dream     sleep gestation predation  exposure     danger  I(body^2) log(brain)   nondream
4      sleep gestation predation  exposure    danger  I(body^2) log(brain)   nondream      dream
5  gestation predation  exposure    danger I(body^2) log(brain)   nondream      dream      sleep

[[2]]
          X1        X2        X3        X4         X5         X6         X7         X8         X9
1 log(brain)  nondream     dream     sleep  gestation  predation   exposure     danger  I(body^2)
2   nondream     dream     sleep gestation  predation   exposure     danger  I(body^2) log(brain)
3      dream     sleep gestation predation   exposure     danger  I(body^2) log(brain)   nondream
4      sleep gestation predation  exposure     danger  I(body^2) log(brain)   nondream      dream
5  gestation predation  exposure    danger  I(body^2) log(brain)   nondream      dream      sleep
6  predation  exposure    danger I(body^2) log(brain)   nondream      dream      sleep  gestation
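For readers who would rather not load purrr, dplyr and stringr, here is a rough base-R sketch of the same idea. It is an alternative, not the answer's method: replace_many is a hypothetical helper that applies each name = replacement pair in turn with gsub(), and it assumes the names should be matched as literal text (fixed = TRUE) rather than as regular expressions. per_demo is a small made-up stand-in for per.

```r
# Hypothetical helper: apply each pattern -> replacement pair in turn,
# matching patterns as literal text (fixed = TRUE)
replace_many <- function(x, lookup) {
  for (pat in names(lookup)) {
    x <- gsub(pat, lookup[[pat]], x, fixed = TRUE)
  }
  x
}

named_vec <- c(brain = "log(brain)", body = "I(body^2)")

# Toy stand-in for `per`: a list of small data frames of character columns
per_demo <- list(
  data.frame(X1 = c("brain", "sleep"), X2 = c("body", "danger"),
             stringsAsFactors = FALSE),
  data.frame(X1 = c("dream", "body"), X2 = c("brain", "exposure"),
             stringsAsFactors = FALSE)
)

per_demo <- lapply(per_demo, function(df) {
  df[] <- lapply(df, replace_many, lookup = named_vec)  # keeps the data.frame class
  df
})
```

As with str_replace_all(), the replacements are applied one after another, so running either version twice on the same data would wrap values again (log(brain) still contains brain).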