I’m new to regex and I’m having trouble with the logical "or" operator (|). I want to match the following pattern: two numbers connected by either "and" or "to".
I tried the following:
import re
string = 'We need Number 3 and 41'
pattern = r'number \d+ (and|to) \d+'
print(re.findall(pattern, string.lower()))
But what I get is not the entire string but only ['and']. Any ideas where the error is?
>Solution :
The (and|to) is a capturing group. When a pattern contains exactly one capturing group, re.findall returns the text captured by that group rather than the full match:
>>> import re
>>> string = 'We need Number 3 and 41'
>>> pattern = r'number \d+ (and|to) \d+'
>>> re.findall(pattern, string.lower())
['and']
re.findall returns the full match only when the pattern has no capturing groups. Change to a non-capturing group, (?:...), and the entire match is returned:
>>> pattern = r'number \d+ (?:and|to) \d+'
>>> re.findall(pattern, string.lower())
['number 3 and 41']
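If you would rather keep the capturing group (for example, to know which connector was used), one option is re.finditer, which yields match objects instead of strings. This is a minimal sketch, not the only way to do it; the variable names text and results are just for illustration:

```python
import re

text = 'We need Number 3 and 41'
pattern = r'number \d+ (and|to) \d+'  # capturing group kept on purpose

# group(0) is always the whole match; group(1) is the captured connector.
results = [(m.group(0), m.group(1)) for m in re.finditer(pattern, text.lower())]
print(results)  # [('number 3 and 41', 'and')]
```

Each match object gives you both the full match and the group, so you do not have to choose between them.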
If you want the entire string, add anchors and .* to extend the match to the start and end of the string:
>>> pattern = r'^.*number \d+ (?:and|to) \d+.*$'
>>> re.findall(pattern, string.lower())
['we need number 3 and 41']
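As a possible alternative to writing the anchors yourself, re.fullmatch only succeeds when the pattern covers the whole string. A small sketch under that assumption (the names text, match and no_match are illustrative):

```python
import re

text = 'We need Number 3 and 41'.lower()
pattern = r'.*number \d+ (?:and|to) \d+.*'  # no ^ or $ needed with fullmatch

# fullmatch returns a match object, or None if the whole string does not fit.
match = re.fullmatch(pattern, text)
print(match.group(0))  # we need number 3 and 41

no_match = re.fullmatch(pattern, 'we need number 3')
print(no_match)  # None
```

Unlike re.findall, this returns a single match object, so check for None before calling group().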
Note: The grouping in (?:and|to) is required; otherwise the | would apply to the entire left-hand and right-hand sides of the pattern. So you were right to use parentheses; you just need the non-capturing (?:...) form for re.findall to return the full match.
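To see why the grouping matters, here is a short comparison of the pattern with and without the parentheses. The sample sentence is made up for illustration:

```python
import re

text = 'we need number 3 and 41, then number 5 to 7'

# Without grouping, | splits the whole pattern into two alternatives:
#   'number \d+ and'   OR   'to \d+'
ungrouped = r'number \d+ and|to \d+'
# With a non-capturing group, | only chooses between 'and' and 'to'.
grouped = r'number \d+ (?:and|to) \d+'

ungrouped_result = re.findall(ungrouped, text)
grouped_result = re.findall(grouped, text)

print(ungrouped_result)  # ['number 3 and', 'to 7']
print(grouped_result)    # ['number 3 and 41', 'number 5 to 7']
```

The ungrouped pattern returns two unrelated fragments, while the grouped one returns both complete phrases.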