I want to match a single-line string against either of two patterns.
Pattern1: fname lname
Pattern2: lname,fname
Example String(s):
Frank Delo
Delo,Frank
groupdict() should return the same output for both strings:
{"fname":"Frank",
"lname":"Delo"
}
Here’s what I tried:
import re
r1 = "^(?P<fname>[a-zA-Z]+)(?: (?P<lname>[a-zA-Z]+))?$"
r2 = "^(?P=lname),(?P=fname)$"
print(re.match("|".join([r1,r2]), "Frank Delo").groupdict()) # Works fine
print(re.match("|".join([r1,r2]), "Delo,Frank").groupdict()) # Doesn't match
Can we not use named group references after the ‘|’ operator?
Also, please note that I don’t want to compile the patterns separately.
>Solution :
There are two issues:
- (?P=lname) is a backreference, which means it matches whatever (?P<lname>) matched. That is not what you want here, because r2 is intended to cover the case where r1 did not match at all.
- To fix the above, you’d want to use (?P<lname>) again, so that whichever alternative regex applies (either r1 or r2), you’d define that named group. However, re does not support that. The good news is that the richer regex package does support it.
So then we get:
import regex as re
r1 = "^(?P<fname>[a-zA-Z]+) (?P<lname>[a-zA-Z]+)$"
r2 = "^(?P<lname>[a-zA-Z]+),(?P<fname>[a-zA-Z]+)$"
r = "|".join([r1,r2])
print(re.match(r, "Frank Delo").groupdict()) # Works fine
print(re.match(r, "Delo,Frank").groupdict()) # Works fine too
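If installing the third-party regex package is not an option, a similar result can probably be had with the standard re module alone. The following is a minimal sketch, not part of the original answer: it gives each alternative its own group names and merges them after matching. The suffixed names (fname1, lname1, fname2, lname2) and the parse_name helper are hypothetical, chosen only for illustration.

```python
import re

# Hypothetical group names: the trailing digit records which alternative
# defined the group, since re does not allow one name to be defined twice.
PATTERN = re.compile(
    r"^(?:(?P<fname1>[a-zA-Z]+) (?P<lname1>[a-zA-Z]+)"
    r"|(?P<lname2>[a-zA-Z]+),(?P<fname2>[a-zA-Z]+))$"
)

def parse_name(text):
    """Return a dict with 'fname' and 'lname', or None if neither form matches."""
    m = PATTERN.match(text)
    if m is None:
        return None
    # Keep only the groups that took part in the match and drop the digit suffix.
    return {
        key[:-1]: value
        for key, value in m.groupdict().items()
        if value is not None
    }

print(parse_name("Frank Delo"))
print(parse_name("Delo,Frank"))
```

This still uses a single combined pattern, at the cost of a small post-processing step on the groupdict() result.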