I want to match a string that begins with ‘c’ or ends with ‘c’,
so I wrote a regular expression like this:
import re
reg = re.compile(r'^(c)|\1$')
reg.search('ca') # match
reg.search('ac') # no match
In practice, the ‘c’ part of the regexp is a complex sub-pattern like ‘x|y|z|[0-9_]|…’,
and I don’t want to write it twice in one regexp.
I thought I could reuse it through the group with '\1',
but I don’t understand why that doesn’t work.
I also tried a named group:
reg = re.compile(r'^(?P<name>c)|(?P=name)$')
and that doesn’t work either.
>Solution :
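This is not how regex backreferences work. A backreference matches the exact text that its group captured earlier in the same match; it does not re-apply the group's pattern.
For example, the regex ([ab])\1 matches 'aa' and 'bb' but neither 'ab' nor 'ba'.
In your pattern, group 1 sits in the first branch of the alternation, so when the second branch is tried the group has captured nothing and \1 cannot match.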
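To make this concrete, here is a small sketch using only the patterns already mentioned above. It checks ([ab])\1 against the four two-letter strings and then re-runs the pattern from the question; the variable names `pair` and `results` are just illustrative.

```python
import re

# A backreference matches the text the group captured, not the group's pattern.
pair = re.compile(r'([ab])\1')
results = {s: bool(pair.fullmatch(s)) for s in ('aa', 'bb', 'ab', 'ba')}
print(results)

# The pattern from the question: group 1 exists only in the first branch.
# When the second branch is tried, group 1 has captured nothing, so \1 fails.
reg = re.compile(r'^(c)|\1$')
print(reg.search('ca'))  # a match object: the first branch matched 'c'
print(reg.search('ac'))  # None: \1 has nothing to refer to
```

The same reasoning applies to the named form (?P=name), which is only another way of spelling the same backreference.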
If you want to avoid repeating yourself while writing the regex, I suggest composing it from smaller sub-regexes:
sub_reg = r'c'
# (?:...) keeps each anchor bound to the whole sub-pattern, even if it contains '|'
reg = re.compile(rf'^(?:{sub_reg})|(?:{sub_reg})$')
This has an additional advantage of improving readability if you give descriptive names to your variables.
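One caveat matters once the sub-pattern contains '|' itself, as the one in the question does: without grouping, the anchors attach only to the first and last alternatives. The sketch below assumes a stand-in sub-pattern built from the visible part of the question's example (the elided remainder is left out), and the helper name `starts_or_ends` is hypothetical.

```python
import re

# Stand-in for the real sub-pattern; the elided part of the original is omitted.
sub_reg = r'x|y|z|[0-9_]'

# Ungrouped: expands to ^x|y|z|[0-9_]|x|y|z|[0-9_]$, so a bare 'y' matches anywhere.
naive = re.compile(rf'^{sub_reg}|{sub_reg}$')

# Grouped: each anchor applies to the whole alternation.
reg = re.compile(rf'^(?:{sub_reg})|(?:{sub_reg})$')

def starts_or_ends(text):
    """Return True if text begins or ends with something sub_reg matches."""
    return reg.search(text) is not None

for sample in ('xa', 'ax', 'a5', 'aya', 'abc'):
    print(sample, starts_or_ends(sample), naive.search(sample) is not None)
```

For 'aya' the grouped pattern reports no match while the ungrouped one does, which is why wrapping the interpolated piece in (?:...) is the safer default.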