Extract Parent Text without Children Text; Parsing HTML

January 20, 2022

I have this small HTML fragment, a BeautifulSoup tag that I pulled from a page using Selenium and BeautifulSoup.

<footer>
    <p class="tags environment-tags">Environment:
      <span class="tag environment-tag">Desert</span>
    </p>
    <p class="source monster-source">Basic Rules
      <span class="page-number">, pg. 334</span>
    </p>
</footer>

I am trying to grab the text from just the p elements, but every time I try, it grabs the text of the nested span as well. So far this is what I have tried:

# .text concatenates the text of every descendant, so the <span> text is included
for p in Environment.findAll('p'):
    print(p.text)

I have also tried removing the span with .extract(), but that doesn't seem to work for me.

Solution:

You can use .contents, which lists a tag's direct children, and take the first element. Here that is the text node that precedes the <span>:

for tag in soup.find_all("p"):
    print(tag.contents[0].strip())

Output:

Environment:
Basic Rules
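
For reference, here is a minimal self-contained sketch of the same idea, run against the footer from the question. The helper name leading_text and the choice of the "html.parser" backend are assumptions made for illustration; the guard for an empty or non-text first child is an addition the original snippet does not have.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

html = """
<footer>
    <p class="tags environment-tags">Environment:
      <span class="tag environment-tag">Desert</span>
    </p>
    <p class="source monster-source">Basic Rules
      <span class="page-number">, pg. 334</span>
    </p>
</footer>
"""

soup = BeautifulSoup(html, "html.parser")


def leading_text(tag):
    """Return the tag's first child as stripped text, or '' if it is not a text node.

    Hypothetical helper: it only looks at contents[0], exactly like the
    snippet above, so text that follows a child element is ignored.
    """
    first = tag.contents[0] if tag.contents else None
    return first.strip() if isinstance(first, NavigableString) else ""


parent_texts = [leading_text(p) for p in soup.find_all("p")]
print(parent_texts)  # ['Environment:', 'Basic Rules']
```

Note that this relies on the parent's own text coming before the <span>. If a <p> starts with a child element, contents[0] is that element rather than a string, which is why the sketch checks the type first.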

Alternatively, building on your attempt, you can remove the <span> elements with .extract():

for tag in soup.select("p span"):
    tag.extract()

print(soup.prettify())

Output:

<footer>
 <p class="tags environment-tags">
  Environment:
 </p>
 <p class="source monster-source">
  Basic Rules
 </p>
</footer>
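
Keep in mind that .extract() modifies the tree, so the spans are gone from soup afterwards. If they are still needed, one non-destructive option is to collect only the text nodes that are direct children of each <p>. The sketch below is an assumption-laden illustration rather than part of the original answer: own_text is a hypothetical helper name, and the sample markup with trailing text after the <span> is made up to show that text on both sides of a child is kept.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString


def own_text(tag):
    """Join the text nodes that are direct children of tag, skipping child elements.

    Hypothetical helper. Comments are also NavigableString subclasses in
    BeautifulSoup, so they would be included if present.
    """
    parts = [
        child.strip()
        for child in tag.children
        if isinstance(child, NavigableString)
    ]
    return " ".join(part for part in parts if part)


# Made-up sample: the parent has text both before and after the <span>.
html = '<p class="source">Basic Rules <span class="page-number">, pg. 334</span> (SRD)</p>'
soup = BeautifulSoup(html, "html.parser")
p = soup.find("p")

result = own_text(p)
print(result)  # Basic Rules (SRD)

# The <span> is untouched and can still be read afterwards.
print(p.find("span").get_text())  # , pg. 334
```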

beautifulsoup

by MR

Published January 20, 2022
