Replace substring based on the position in the string via regex

Assume a string contains a certain pattern a known number of times (n), and I do not want to make any assumptions about the rest of the string (in particular about the text between the occurrences).

Furthermore, I have a vector of length n (sf, say), and I want to append the corresponding element to each occurrence of the pattern. Thus, for each match I need to know how many matches have already occurred.

I could think of the following solution:

library(stringr)
sf <- letters[4:1]
ss <- "fdskjhf xx sd ss xx wwwe xx ss  xx sdsd"
#              ^^ 1st   ^^ 2nd  ^^ 3rd ^^ 4th
# add:         _sf[1]   _sf[2]  _sf[3] _sf[4]
# that is:     xx_d     xx_c    xx_b   xx_a


## add _sf[i] to the ith occurrence of "xx" in ss
goal <- "fdskjhf xx_d sd ss xx_c wwwe xx_b ss  xx_a sdsd"

my_replacer_factory <- function(sf, start = 0) {
  cnt <- start
  function(el) {
    cnt <<- cnt + 1
    paste0(el, "_", rev(sf)[cnt])
  }
}

my_replacer <- my_replacer_factory(sf)
(res <- str_replace_all(ss, fixed("xx"), my_replacer))
# [1] "fdskjhf xx_d sd ss xx_c wwwe xx_b ss  xx_a sdsd"

all.equal(res, goal)
# [1] TRUE

This appears to work, but it feels error-prone because I rely on str_replace_all replacing from right to left. What if this behaviour changes in a future implementation, or the replacement gets parallelized?

Any idea how to achieve this differently? Ideally with stringr functions?


A similar idea, consuming the suffixes one at a time:

my_replacer_factory <- function(sf) {
  suffixes <- rev(sf)
  function(el) {
    on.exit(suffixes <<- suffixes[-1L], add = TRUE)
    paste0(el, "_", suffixes[1L])
  }
}

Solution:

One way is to use regmatches<-, which assigns the replacement values to the matches in the order in which they occur in the string.

sf <- letters[4:1]
ss <- "fdskjhf xx sd ss xx wwwe xx ss  xx sdsd"

regmatches(ss, gregexpr("xx", ss)) <- list(paste0("xx_", sf))
ss
#[1] "fdskjhf xx_d sd ss xx_c wwwe xx_b ss  xx_a sdsd"

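# Alternative with a lookbehind: insert the suffix at the zero-length match right after each "xx"
ss <- "fdskjhf xx sd ss xx wwwe xx ss  xx sdsd"  # start again from the original string
regmatches(ss, gregexpr("(?<=xx)", ss, perl=TRUE)) <- list(paste0("_", sf))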
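Since the question asks for stringr functions, here is a minimal sketch of one possible approach that does not depend on the order in which a replacement function is called. It splits the string on the pattern and glues the pieces back together with the suffixed pattern in between. The helper name append_suffixes and its sep argument are hypothetical, and the sketch assumes a single input string and a fixed (non-regex) pattern.

```r
library(stringr)

# Hypothetical helper (not part of stringr): append suffixes[i] to the
# i-th occurrence of a fixed pattern in a single string.
append_suffixes <- function(string, pattern, suffixes, sep = "_") {
  # Splitting on n matches yields n + 1 pieces (possibly empty at the ends).
  pieces <- str_split(string, fixed(pattern))[[1]]
  # Guard: the number of matches must equal the number of suffixes.
  stopifnot(length(pieces) - 1L == length(suffixes))
  # Put the suffixed pattern after every piece except the last one.
  str_c(pieces, c(str_c(pattern, sep, suffixes), ""), collapse = "")
}

sf <- letters[4:1]
ss <- "fdskjhf xx sd ss xx wwwe xx ss  xx sdsd"
res <- append_suffixes(ss, "xx", sf)
res
```

Because the pieces and suffixes are combined in one vectorised call, the result does not depend on how often, or in which order, a replacement callback would be invoked.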