Return string of following sentence after re.search on the first one

source:

html="""<!-- End Ezoic - under page title - under_page_title -->
</div> <!-- LEFTCOL END -->
<p> The first sentence is sentence A.
Sentence B has information pertaining about sentence A.
<br><br>Below here is irrelevant..."""

When I input

import re
match_results = re.search("The first sentence.*", html, re.IGNORECASE)
print(match_results.group(0))

It returns

 The first sentence is sentence A.

How do I also get the following sentence so that the expected output is

The first sentence is sentence A. Sentence B has information pertaining about sentence A.

>Solution :

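Since there isn't any clear pattern to the data, you can capture the second line by matching up to the first < with "The first sentence[^<]*". By default, . does not match a newline, which is why .* stops at the end of the first line. The negated character class [^<] matches any character except <, newlines included.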

import re

html="""<!-- End Ezoic - under page title - under_page_title -->
</div> <!-- LEFTCOL END -->
<p> The first sentence is sentence A.
Sentence B has information pertaining about sentence A.
<br><br>Below here is irrelevant..."""

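# [^<]* matches any character except "<", including newlines,
# so the match runs up to (but not including) the first "<".
match_results = re.search("The first sentence[^<]*", html, re.IGNORECASE)
print(match_results.group(0))

# Output:
# The first sentence is sentence A.
# Sentence B has information pertaining about sentence A.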
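
The match above still contains the newline between the two sentences (and a trailing one), whereas the expected output in the question is a single line. A minimal sketch of one way to get there, assuming it is acceptable to collapse every run of whitespace into a single space; the variable name `result` is only illustrative:

```python
import re

html = """<!-- End Ezoic - under page title - under_page_title -->
</div> <!-- LEFTCOL END -->
<p> The first sentence is sentence A.
Sentence B has information pertaining about sentence A.
<br><br>Below here is irrelevant..."""

match = re.search(r"The first sentence[^<]*", html, re.IGNORECASE)

# re.search returns None when nothing matches, so guard before calling .group().
if match is None:
    result = ""
else:
    # str.split() with no arguments splits on any run of whitespace
    # (newlines included) and drops leading/trailing whitespace;
    # joining with a single space yields one line.
    result = " ".join(match.group(0).split())

print(result)
# The first sentence is sentence A. Sentence B has information pertaining about sentence A.
```

If the line breaks inside the matched text matter, `match.group(0).strip()` would remove only the trailing newline instead.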