Regex HTML not capturing

I’m writing a regex to select the text between two tags, but I’m running into a problem. This is my code:

import re

pattern = re.compile(r'<div class="c-listing-fight__class">(\b[A-Za-z]+\s?){2,3}</div>')
sag = '<div class="c-listing-fight__class">I Hate Regex</div>'

The corresponding match is:

match='<div class="c-listing-fight__class">I Hate Regex<>

At least it gets the whole text inside the div, which is what I really care about. The string I actually need to match, though, is:

sag = '<div class="c-listing-fight__class">Lightweight Bout</div>'

However, in this case the match only seems to extend up to:

<div class="c-listing-fight__class">Lightweight B>

It makes no sense to me why it stops before the end of ‘Bout’. Can anyone help? Thank you. I’ve also tried putting the / character in square brackets in my pattern, but the result is the same.
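On the square-bracket attempt: / is not a special character in Python’s re syntax, so [/] and a bare / match the same single character, and the result cannot change. A minimal check, reusing the pattern and string from the question (the names plain and bracketed are just for illustration):

```python
import re

sag = '<div class="c-listing-fight__class">Lightweight Bout</div>'

# '/' has no special meaning in Python's re syntax, so a plain '/'
# and the one-character class '[/]' match exactly the same text.
plain = re.compile(r'<div class="c-listing-fight__class">(\b[A-Za-z]+\s?){2,3}</div>')
bracketed = re.compile(r'<div class="c-listing-fight__class">(\b[A-Za-z]+\s?){2,3}<[/]div>')

print(plain.match(sag).group(0) == bracketed.match(sag).group(0))  # True
```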

Here is my complete output for reproducibility:

>>> import re
>>> sag = '<div class="c-listing-fight__class">I Hate Regex</div>'
>>> pattern = re.compile(r'<div class="c-listing-fight__class">(\b[A-Za-z]+\s?){2,3}</div>')
>>> match = pattern.finditer(sag)
>>> for i in match:
    print(i)

    
<re.Match object; span=(0, 54), match='<div class="c-listing-fight__class">I Hate Regex<>
>>> sag = '<div class="c-listing-fight__class">Lightweight Bout</div>'
>>> match = pattern.finditer(sag)
>>> for i in match:
    print(i)

    
<re.Match object; span=(0, 58), match='<div class="c-listing-fight__class">Lightweight B>
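To see what these matches actually contain, print the matched text rather than the Match object. A minimal sketch, assuming CPython, reusing the pattern and the second string from the transcript; the last check shows that the repr’s match= field simply stops before </div>, even though the match itself includes it:

```python
import re

# Pattern and input copied from the transcript above.
pattern = re.compile(r'<div class="c-listing-fight__class">(\b[A-Za-z]+\s?){2,3}</div>')
sag = '<div class="c-listing-fight__class">Lightweight Bout</div>'

m = pattern.search(sag)
full = m.group(0)  # the complete matched text

print(full)       # <div class="c-listing-fight__class">Lightweight Bout</div>
print(m.span())   # (0, 58)
print(len(full))  # 58

# The match= field of the repr is only a clipped prefix of repr(full).
preview = repr(full)[:50]
print(preview in repr(m))     # True
print("</div>" in repr(m))    # False: the closing tag is cut from the repr
```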

Solution:

You are being misled by the representation (repr) of the Match object, which is meant only for debugging: it shows only the first 50 characters of the matched text’s repr, opening quote included. If you print the matched groups themselves, you’ll find that the pattern works just fine. For example, pattern.match(sag).group(0) contains the whole string.
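Printing the groups also reveals a second detail worth knowing: in the question’s pattern, group 1 sits inside {2,3}, and Python’s re returns only the last repetition of a repeated group. A minimal sketch, assuming the goal is just the text between the tags, wraps the whole repetition in one capturing group and makes the inner group non-capturing with (?:...). The helper inner_text is hypothetical:

```python
import re

# Question's pattern: group 1 is repeated, so it keeps only the last word.
original = re.compile(r'<div class="c-listing-fight__class">(\b[A-Za-z]+\s?){2,3}</div>')

# Variant: one capturing group around the whole repetition; the inner
# (?:...) group is non-capturing, so group 1 holds every word.
wrapped = re.compile(r'<div class="c-listing-fight__class">((?:\b[A-Za-z]+\s?){2,3})</div>')


def inner_text(html):
    """Hypothetical helper: return the text between the tags, or None."""
    m = wrapped.search(html)
    return m.group(1) if m else None


for sag in (
    '<div class="c-listing-fight__class">I Hate Regex</div>',
    '<div class="c-listing-fight__class">Lightweight Bout</div>',
):
    print(original.search(sag).group(1))  # last repetition only: Regex, then Bout
    print(inner_text(sag))                # I Hate Regex, then Lightweight Bout
```

The group(0) of both patterns is identical; only what group 1 captures differs.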
