I am learning regex using Python and am a little confused by this tutorial I am following. Here is the example:
```python
import re

rand_str_2 = "doctor doctors doctor's"
# Match doctor, doctors or doctor's
regex = re.compile("[doctor]+['s]*")
matches = re.findall(regex, rand_str_2)
print("Matches:", len(matches))
```
This prints 3 matches.
When I do the same thing but replace the `*` with a `?`, I still get 3 matches:
```python
regex = re.compile("[doctor]+['s]?")
```
When I look at the documentation, I see that `*` matches 0 or more repetitions and `?` matches 0 or 1.
My understanding was that the `?` version would not return 3 matches, because it only looks for 0 or 1.
Can someone help me understand what I should expect from these two quantifiers?
Thank you
>Solution :
This happens because of the character class in that expression (it is not a group). Square brackets tell the regex engine to "match any single character in this list", so `['s]` is satisfied by either a `'` or an `s`, one character at a time.
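A quick sketch using only Python's standard `re` module to show what the square brackets do: each class matches exactly one character, and `[doctor]` accepts any of those five letters in any order. The sample string `"doctor rod cot"` is made up purely for illustration.

```python
import re

# ['s] is a character class: each match is ONE character, either ' or s.
chars = re.findall(r"['s]", "doctor's")
print(chars)  # ["'", 's']

# [doctor] is also a character class (d, o, c, t or r). With +, it matches
# any run of those letters, not only the literal word "doctor".
runs = re.findall(r"[doctor]+", "doctor rod cot")
print(runs)  # ['doctor', 'rod', 'cot']
```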
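Now you can see how the quantifier affects this. `['s]?` tells the pattern to "match `'` or `s` zero or one time, as many times as possible", so in `doctor's` it matches the `'` and stops right before the `s`, giving `doctor'`. The leftover `s` cannot start a new match because `s` is not in `[doctor]`, so you still get 3 matches.
`['s]*`, on the other hand, tells it to "match `'` or `s` zero or more times, as many times as possible". Here it matches both the `'` and the `s`, because both are in the character class, giving `doctor's`.
In both cases the quantifier only controls how many characters each individual match consumes; it does not limit how many matches `findall` returns. That is why the count stays at 3 and only the text of the third match changes.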
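A minimal sketch comparing both patterns on the question's string with `re.findall` from the standard library. The variable names `star` and `optional` are only illustrative.

```python
import re

rand_str_2 = "doctor doctors doctor's"

# Same patterns as in the question; only the trailing quantifier differs.
star = re.findall(r"[doctor]+['s]*", rand_str_2)
optional = re.findall(r"[doctor]+['s]?", rand_str_2)

print(star)      # ['doctor', 'doctors', "doctor's"]
print(optional)  # ['doctor', 'doctors', "doctor'"]

# With ?, the third match stops at "doctor'". The leftover "s" cannot start
# a new match because s is not in [doctor], so there is no fourth match.
print(re.match(r"[doctor]+", "s"))  # None
print(len(star), len(optional))    # 3 3
```

The match count is identical; what differs is how much text the third match covers.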
I hope this makes sense. If not, feel free to leave a comment and I’ll try my best to clarify it.