I'm writing a regex to select the text between two tags, but the result looks wrong. This is my code:
pattern = re.compile(r'<div class="c-listing-fight__class">(\b[A-Za-z]+\s?){2,3}</div>')
sag = '<div class="c-listing-fight__class">I Hate Regex</div>'
The corresponding match is:
match='<div class="c-listing-fight__class">I Hate Regex<>
At least it reaches the end of the text inside the div, which is what I really care about. The string I actually need to match is:
sag = '<div class="c-listing-fight__class">Lightweight Bout</div>'
However, the iterator in this case only finds up to
<div class="c-listing-fight__class">Lightweight B>
I don't understand why it stops before finishing the word 'Bout'. Can anyone help? Thank you. I also tried putting the / symbol in square brackets in my pattern, but the result is the same.
Here is my complete output for reproducibility:
>>> import re
>>> sag = '<div class="c-listing-fight__class">I Hate Regex</div>'
>>> pattern = re.compile(r'<div class="c-listing-fight__class">(\b[A-Za-z]+\s?){2,3}</div>')
>>> match = pattern.finditer(sag)
>>> for i in match:
...     print(i)
...
<re.Match object; span=(0, 54), match='<div class="c-listing-fight__class">I Hate Regex<>
>>> sag = '<div class="c-listing-fight__class">Lightweight Bout</div>'
>>> match = pattern.finditer(sag)
>>> for i in match:
...     print(i)
...
<re.Match object; span=(0, 58), match='<div class="c-listing-fight__class">Lightweight B>
Solution:
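You are being misled by the repr of the Match object, which is meant only for debugging. It truncates the matched text to a fixed width (50 characters of the string's repr in CPython) instead of showing all of it, which is why both of your outputs break off at the same length. The span in your output, (0, 58), already covers the entire string. If you print the matched text itself, you'll find the pattern works fine: pattern.match(sag).group(0) returns the whole string.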
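As a minimal sketch of the point above, the snippet below reads the matched text with group(0) instead of relying on the repr. The second pattern is not from the question: it assumes you want only the text between the tags and wraps the repeated part in an extra capturing group (with the inner group made non-capturing) to get it. The names full_match, inner_pattern and inner_text are illustrative.

```python
import re

sag = '<div class="c-listing-fight__class">Lightweight Bout</div>'

# The pattern from the question, unchanged.
pattern = re.compile(
    r'<div class="c-listing-fight__class">(\b[A-Za-z]+\s?){2,3}</div>'
)

m = pattern.match(sag)
# group(0) is the complete matched text; the repr only shows a truncated preview.
full_match = m.group(0)
print(full_match)

# Assumed variant (not in the question): capture everything between the tags.
# The repeated word group is non-capturing so group(1) holds the whole text.
inner_pattern = re.compile(
    r'<div class="c-listing-fight__class">((?:\b[A-Za-z]+\s?){2,3})</div>'
)
inner_text = inner_pattern.match(sag).group(1)
print(inner_text)
```

With the original pattern, group(1) would hold only the last repetition of the repeated group, which is why the variant adds an outer group when the inner text is what you need.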