For some reason, when I re.compile a list of unioned regex patterns, some patterns seem to match and others don't. I can't figure out the issue, though. Any guidance appreciated.
import re
creditcard_pattern = re.compile(r'''(
(CREDIT\s?CA?RD)|
((CARD|\bCC\b).*PA?YME?N?T?)|
(APPLECARD)|
WELLS FARGO.*(CARD|CC)|
(CITI.*(C?R?E?D?I?T CA?R?D))|
(CAPITAL ONE)|
AMERICAN EXPRESS|
(DISCOVER.*(?!.*1BANK))|
AMER?I?C?A?N?\s?E?XP?R?E?S?S?|
CHASE.*CARD|
(BA?N?K.*AME?RI?C?A?.*PMT)|
AMEX|
CITICORP CHOICE|
CITI (CARD|AUTO|PAYMENT)|
VISA PLATINUM|
BARCLAY.*CARD|
USAA FSB.*ONLINE PMT|
CITIBANK.*ONLINE PMT
)
''', flags=re.I | re.X )
Testing:
if creditcard_pattern.search('CARD PYMT'):
    print('found')
#>> found
if creditcard_pattern.search('BARCLAY CARD'):
    print('found')
#>> found
if creditcard_pattern.search('WELLS FARGO CARD'):
    print('found')
#>> not found (nothing printed)
if creditcard_pattern.search('CAPITAL ONE'):
    print('found')
#>> not found (nothing printed)
When I test the patterns on https://regexr.com/, they seem to work as expected…
Solution:
The documentation for re.X states:
Whitespace within the pattern is ignored, except when in a character class, or when preceded by an unescaped backslash, or within tokens like *?, (?: or (?P<…>.
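A minimal sketch of the rule quoted above, assuming Python 3 and only the standard re module: under re.X an unescaped space simply disappears from the pattern, while an escaped space, a space inside a character class, or \s still matches the space in the input.

```python
import re

# Under re.X the unescaped space is discarded, so this pattern is effectively 'CAPITALONE'.
verbose = re.compile(r'CAPITAL ONE', re.I | re.X)
print(bool(verbose.search('CAPITAL ONE')))  # False
print(bool(verbose.search('CAPITALONE')))   # True

# Three ways to keep a literal space when re.X is on:
escaped = re.compile(r'CAPITAL\ ONE', re.I | re.X)    # backslash-escaped space
in_class = re.compile(r'CAPITAL[ ]ONE', re.I | re.X)  # space inside a character class
any_ws = re.compile(r'CAPITAL\s+ONE', re.I | re.X)    # one or more whitespace characters

for p in (escaped, in_class, any_ws):
    print(bool(p.search('Capital One')))  # True for each
```

Note that \s+ is looser than the other two: it also accepts tabs, newlines, or several spaces in a row, whereas \  and [ ] match exactly one space.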
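So the space in ‘CAPITAL ONE’ is silently dropped: under re.X that alternative actually means CAPITALONE, which is why ‘CAPITAL ONE’ is not found. The same happens to every other alternative that contains a literal space (WELLS FARGO, AMERICAN EXPRESS, CITICORP CHOICE, CITI (CARD|AUTO|PAYMENT), VISA PLATINUM, USAA FSB, ONLINE PMT, and the space in C?R?E?D?I?T CA?R?D). The patterns that do work, such as (CARD|\bCC\b).*PA?YME?N?T? and BARCLAY.*CARD, contain no literal space; the .* absorbs the space in the input. Escape each literal space with a backslash; for ‘CAPITAL ONE’, the corresponding line in your regex becomes: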
(CAPITAL\ ONE)|
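For completeness, here is a sketch of the question's whole pattern with every literal space escaped. It assumes the other alternatives should stay exactly as written: only the spaces were changed, so any other quirks in the original alternatives are left untouched. Indenting the alternatives is harmless here because re.X ignores that whitespace too.

```python
import re

# The question's pattern with each literal space written as '\ ' so re.X keeps it.
creditcard_pattern = re.compile(r'''(
    (CREDIT\s?CA?RD)|
    ((CARD|\bCC\b).*PA?YME?N?T?)|
    (APPLECARD)|
    WELLS\ FARGO.*(CARD|CC)|
    (CITI.*(C?R?E?D?I?T\ CA?R?D))|
    (CAPITAL\ ONE)|
    AMERICAN\ EXPRESS|
    (DISCOVER.*(?!.*1BANK))|
    AMER?I?C?A?N?\s?E?XP?R?E?S?S?|
    CHASE.*CARD|
    (BA?N?K.*AME?RI?C?A?.*PMT)|
    AMEX|
    CITICORP\ CHOICE|
    CITI\ (CARD|AUTO|PAYMENT)|
    VISA\ PLATINUM|
    BARCLAY.*CARD|
    USAA\ FSB.*ONLINE\ PMT|
    CITIBANK.*ONLINE\ PMT
)''', flags=re.I | re.X)

# Re-run the four checks from the question.
for text in ('CARD PYMT', 'BARCLAY CARD', 'WELLS FARGO CARD', 'CAPITAL ONE'):
    print(text, '->', 'found' if creditcard_pattern.search(text) else 'not found')
```

With the spaces escaped, all four test strings from the question report "found".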