source:
html="""<!-- End Ezoic - under page title - under_page_title -->
</div> <!-- LEFTCOL END -->
<p> The first sentence is sentence A.
Sentence B has information pertaining about sentence A.
<br><br>Below here is irrelevant..."""
When I input
import re
match_results = re.search("The first sentence.*", html, re.IGNORECASE)
print(match_results.group(0))
It returns
The first sentence is sentence A.
How do I also get the following sentence so that the expected output is
The first sentence is sentence A. Sentence B has information pertaining about sentence A.
>Solution :
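Since there isn't any clear pattern to the data, you can capture the second line by matching up to the first < with "The first sentence[^<]*". By default, . does not match a newline, so .* stops at the end of the first line. The negated character class [^<] matches any character except <, newlines included, so the match continues until the next tag begins.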
import re
html="""<!-- End Ezoic - under page title - under_page_title -->
</div> <!-- LEFTCOL END -->
<p> The first sentence is sentence A.
Sentence B has information pertaining about sentence A.
<br><br>Below here is irrelevant..."""
match_results = re.search("The first sentence[^<]*", html, re.IGNORECASE)
print(match_results.group(0))
# Output (the match keeps the newline between the two sentences):
# The first sentence is sentence A.
# Sentence B has information pertaining about sentence A.
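The match above still contains the newline between the two sentences, while the expected output in the question is a single line. A minimal sketch of one way to close that gap, assuming it is acceptable to collapse every run of whitespace into a single space. The helper name extract_text and its start parameter are hypothetical and not part of the original answer:

```python
import re

html = """<!-- End Ezoic - under page title - under_page_title -->
</div> <!-- LEFTCOL END -->
<p> The first sentence is sentence A.
Sentence B has information pertaining about sentence A.
<br><br>Below here is irrelevant..."""


def extract_text(source, start="The first sentence"):
    """Hypothetical helper: return the text from `start` up to the next '<'."""
    # [^<]* matches any character except '<', including newlines.
    match = re.search(re.escape(start) + r"[^<]*", source, re.IGNORECASE)
    if match is None:
        # re.search returns None when nothing matches; avoid calling .group() on it.
        return None
    # str.split() with no arguments splits on any whitespace run (newlines included)
    # and drops leading/trailing whitespace; joining with " " yields a single line.
    return " ".join(match.group(0).split())


result = extract_text(html)
print(result)
# The first sentence is sentence A. Sentence B has information pertaining about sentence A.
```

The None check matters because re.search returns None when the pattern is not found, and calling .group(0) on None raises an AttributeError.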