I have the following vector:
cpf <- "12345678910"
The following call works:
strcapture("(\\b.{3})(.{3})(.{3})(.{2}\\b)", cpf,
proto = list(n1 = character(), n2 = character(), n3 = character(), n4 = character()))
   n1  n2  n3 n4
1 123 456 789 10
But when I replace the repeated groups with the backreference \\1, it does not work:
strcapture("(\\b.{3})\\1\\1(.{2}\\b)", cpf,
proto = list(n1 = character(), n2 = character(), n3 = character(), n4 = character()))
>Solution :
There might be a misunderstanding of what a \1 backreference means. When you include \1 in your regex pattern, it refers to the exact text that was captured by the first capture group, not to that group's pattern. So in the following input:
12345678910
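The first capture group would match 123. Because 123 does not occur again immediately afterwards in the input, \1 has nothing to match and the overall pattern fails. Consider the following input, which the pattern below does match. Note that each \1 is wrapped in its own parentheses, so the pattern still has four capture groups, one per element of proto: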
cpf <- "12312312310"
strcapture("\\b(.{3})(\\1)(\\1)(.{2})\\b", cpf,
proto = list(n1 = character(), n2 = character(), n3 = character(), n4 = character()))
In this case, the text matched by the first capture group, 123, is repeated twice more, so both \1 backreferences match.
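If the goal was to reuse the sub-pattern .{3} rather than the text it matched, one option is to build the pattern string programmatically instead of using a backreference. This is a minimal sketch, assuming that is what the question intended; the variable names three, pattern and parts are illustrative and do not come from the question.

```r
cpf <- "12345678910"

# A backreference matches the same text again, not the same pattern:
# "123123" repeats its first three characters, "123456" does not.
same_text <- grepl("^(.{3})\\1$", c("123123", "123456"), perl = TRUE)

# To repeat the sub-pattern instead, paste it into the regex three times.
# (three, pattern and parts are illustrative names.)
three <- "(.{3})"
pattern <- paste0("\\b", strrep(three, 3), "(.{2})\\b")

# pattern now has four capture groups, matching the four elements of proto.
parts <- strcapture(pattern, cpf,
  proto = list(n1 = character(), n2 = character(),
               n3 = character(), n4 = character()))
parts
```

Because the groups are generated from a single string, the repeated part is written only once, while each repetition still matches independently of what the others captured.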