backreference into strcapture regex

I have the following vector:

cpf <- "12345678910"

The following function works:

strcapture("(\\b.{3})(.{3})(.{3})(.{2}\\b)", cpf, 
       proto = list(n1 = character(), n2 = character(), n3 = character(), n4 = character()))

   n1  n2  n3 n4
1 123 456 789 10

But when I add the backreference \\1, it does not work:

strcapture("(\\b.{3})\\1\\1(.{2}\\b)", cpf, 
       proto = list(n1 = character(), n2 = character(), n3 = character(), n4 = character()))

Solution:

There might be a misunderstanding of what a \1 backreference means. A \1 in a regex pattern matches the exact text that the first capture group captured; it does not reapply that group's pattern. So in the following input:

12345678910

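The first capture group would capture 123. Because 123 is not repeated immediately afterwards in the input, \1 cannot match, and so the whole pattern fails. Consider the following example, where the input does repeat its first three digits and each backreference is wrapped in its own capture group so that the pattern still has four groups, one per column in proto: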

cpf <- "12312312310"
strcapture("\\b(.{3})(\\1)(\\1)(.{2})\\b", cpf, 
    proto = list(n1 = character(), n2 = character(), n3 = character(), n4 = character()))

In this case, the text captured by the first group, 123, occurs twice more right after it, so both backreferences match.
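
As a minimal sketch of the point above (not part of the original answer), the snippet below uses grepl() to check that \\1 only matches a literal repeat of the captured text. It also shows one possible way to get the 3-3-3-2 split of the original value without typing the group three times, by assembling the pattern with strrep(). This assumes the goal of the question was to avoid repeating (.{3}); the names pattern, split_pattern, and parts are illustrative only.

```r
# \\1 matches the exact text captured by group 1,
# not "any three characters" again.
pattern <- "^(.{3})\\1"

repeats    <- grepl(pattern, "12312312310")  # "123" is followed by "123"
no_repeats <- grepl(pattern, "12345678910")  # "123" is followed by "456"
print(c(repeats = repeats, no_repeats = no_repeats))

# Assumption: the intent was to reuse the sub-pattern, not the captured text.
# strrep() (base R >= 3.3.0) repeats the group's source text three times.
cpf <- "12345678910"
split_pattern <- paste0("\\b", strrep("(.{3})", 3), "(.{2})\\b")

parts <- strcapture(split_pattern, cpf,
                    proto = list(n1 = character(), n2 = character(),
                                 n3 = character(), n4 = character()))
print(parts)
```

Building the pattern as a string keeps four real capture groups, which is what strcapture() needs to fill the four columns of proto.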
