I am facing a (naive) problem with regular expression.
I need to find any substrings composed of a fixed number (n) of different characters.
So, for "aaabcddd", if n=3 the substrings that I expect to find are: "abc" and "bcd".
My idea is to use n-1 capture groups and ‘[^’ to exclude characters already matched. Thus, I wrote the following Perl regex (in Julia):
r"(([[:alpha:]])[^\2])[^\1]"
But is not working.
Do you have any tips?
>Solution :
You can not use a backreference to a capture group using a negated character class [^\1]
What you can do is use a negative lookahead to assert what is directly to the right of the current position is not what you have already captured in a previous group.
If that is the case, capture a single alpha in a new group.
The matches abc and bcd are in capture group 1
(?=(([[:alpha:]])(?!\2)([[:alpha:]])(?!\3|\2)[[:alpha:]]))
(?=Positive lookahead(Capture group 1([[:alpha:]])Capture the first char in group 2(?!\1)([[:alpha:]])If not looking at what is captured by group 2 to the right, capture the second char in group 3(?!\2|\1)If not looking to the right at what is captured by group 2 or 3[[:alpha:]]Mach the 3rd char
)Close group 1
)Close the lookahead
Or a bit shorter using a case insensitive match:
(?=(([a-z])(?!\2)([a-z])(?!\3|\2)[a-z]))