Dealing with long regex patterns in R

I have to apply a long regex pattern to a long string. The pattern is built something like this:

set.seed(1234)
# Generate n random codes: 5 letters, 4 digits, 1 letter
myFun <- function(n = 5000) {
  a <- do.call(paste0, replicate(5, sample(LETTERS, n, TRUE), FALSE))
  paste0(a, sprintf("%04d", sample(9999, n, TRUE)), sample(LETTERS, n, TRUE))
}

# Join 1000 codes into a single capturing alternation
long_regex <- paste0(myFun(1000), collapse = "|")
long_regex <- paste0("(", long_regex, ")")

However, gsub can't deal with such long patterns:

text <- "HPPIZ9166O BHVOF0473O LCVDO3833Z"
gsub(long_regex, "marker \\1;", text)
Error in gsub(long_regex, "marker \\1;", text) : 
  assertion 'tree->num_tags == num_tags' failed in executing regexp: file 'tre-compile.c', 
  line 634 

How do I overcome this issue? Thank you.

Solution:

If your regexes are okay as perl regexes, the perl-compatible regex engine seems to cope:

> gsub(long_regex, "marker \\1;", text)
Error in gsub(long_regex, "marker \\1;", text) : 
  assertion 'tree->num_tags == num_tags' failed in executing regexp: file 'tre-compile.c', line 634

but…

> gsub(long_regex, "marker \\1;", text, perl=TRUE)
[1] "HPPIZ9166O BHVOF0473O LCVDO3833Z"

The text comes back unchanged there because none of its strings occur in the pattern. If I pick out one of the strings from the regex, you can see that gsub does substitute in this case:

> substr(long_regex,10000,10100)
[1] "|PZIFO9919X|VBICZ3063E|HZTGZ8881V|PUURO8525W|QLYMN6531U|KTUQZ7171V|GULUD6556Z|UMHSA7400F|DAYHH0017F|Q"
> text = "HZTGZ8881V nope "
> gsub(long_regex, "marker \\1;", text, perl=TRUE)
[1] "marker HZTGZ8881V; nope "
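
The steps above can be put together as one script. This is only a sketch: `make_codes` and `codes` are names made up for illustration, and the generated codes will not be the same as the ones shown in the question. The text is built from one of the generated codes so that a match is guaranteed.

```r
# Hypothetical helper mirroring the question's generator:
# 5 random letters, 4 digits, 1 letter per code.
set.seed(1234)
make_codes <- function(n) {
  prefix <- do.call(paste0, replicate(5, sample(LETTERS, n, TRUE), FALSE))
  paste0(prefix, sprintf("%04d", sample(9999, n, TRUE)), sample(LETTERS, n, TRUE))
}

codes <- make_codes(1000)

# One capturing group containing all 1000 alternatives.
long_regex <- paste0("(", paste0(codes, collapse = "|"), ")")

# Use a code that is known to be in the pattern.
text <- paste(codes[10], "nope")

# perl = TRUE makes gsub use PCRE instead of the default TRE engine.
result <- gsub(long_regex, "marker \\1;", text, perl = TRUE)
print(result)
```

The result should be the chosen code wrapped as `marker <code>;` followed by ` nope`. Whether PCRE copes with patterns much longer than this one is not something the example checks.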