How to get required text from the selected html content

from bs4 import BeautifulSoup
import re
html_content = """<div class='ui very padded vertical segment'>
<div class='ui basic clearing segment' style='margin: 0; padding: 1em 0'>
<h4 class='ui header'>
Description
</h4>
<p>Please bring the failure blade to cabin.</p>
</div>
<div class='column'>
<h4 class='ui header'>
Owner Information
</h4>
<div class='ui list'>
<div class='item'>
<i class='grey user icon'></i>
<div class='content'>No Owner Specified</div>
</div>
</div>
</div>"""

work_order_soup = BeautifulSoup(html_content,"html.parser")
find_description = work_order_soup.find(re.compile("^h[1-6]$"), string=re.compile("Description", re.IGNORECASE))

parent_div_description = find_description.find_parent("div")
print(parent_div_description.text)

I need the text of the parent div without looking up the p tag directly, and with the "Description" heading text removed from it. I have already found the heading using find_description.
Required output: Please bring the failure blade to cabin.

>Solution :

Remove the <h*> tag from the parent and get the text:

import re

from bs4 import BeautifulSoup

html_content = """<div class='ui very padded vertical segment'>
<div class='ui basic clearing segment' style='margin: 0; padding: 1em 0'>
<h4 class='ui header'>
Description
</h4>
<p>Please bring the failure blade to cabin.</p>
</div>
<div class='column'>
<h4 class='ui header'>
Owner Information
</h4>
<div class='ui list'>
<div class='item'>
<i class='grey user icon'></i>
<div class='content'>No Owner Specified</div>
</div>
</div>
</div>"""

work_order_soup = BeautifulSoup(html_content, "html.parser")
find_description = work_order_soup.find(
    re.compile("^h[1-6]$"), string=re.compile("Description", re.IGNORECASE)
)

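# Keep a reference to the parent before detaching the heading from the tree
parent = find_description.parent
# extract() removes the <h4> from the tree, so its text no longer appears in the parent
find_description.extract()

# strip=True trims the whitespace around each remaining text fragment
print(parent.get_text(strip=True))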

Prints:

Please bring the failure blade to cabin.
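
Note that extract() modifies the parsed tree, so the heading is gone afterwards. If the soup is still needed intact, one possible alternative is to collect the text of everything that follows the heading inside the same parent. This is only a sketch: the helper name text_after is made up for illustration, the HTML is a trimmed copy of the snippet from the question, and it assumes the wanted text always comes after the heading.

```python
import re

from bs4 import BeautifulSoup, NavigableString

# Trimmed copy of the question's HTML (assumption: the heading precedes the text)
html_content = """<div class='ui basic clearing segment'>
<h4 class='ui header'>
Description
</h4>
<p>Please bring the failure blade to cabin.</p>
</div>"""

soup = BeautifulSoup(html_content, "html.parser")
heading = soup.find(
    re.compile("^h[1-6]$"), string=re.compile("Description", re.IGNORECASE)
)


def text_after(tag):
    """Hypothetical helper: join the text of all siblings that follow `tag`."""
    parts = []
    for sibling in tag.next_siblings:
        # Siblings are either bare strings (often just newlines) or tags
        if isinstance(sibling, NavigableString):
            text = str(sibling)
        else:
            text = sibling.get_text()
        parts.append(text.strip())
    # Drop the empty fragments left by whitespace-only strings
    return " ".join(part for part in parts if part)


result = text_after(heading)
print(result)
```

Here the tree is left untouched, so the heading can still be found in soup after the call.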