I have this small snippet of HTML that I pulled using Selenium and BeautifulSoup:
<footer>
<p class="tags environment-tags">Environment:
<span class="tag environment-tag">Desert</span>
</p>
<p class="source monster-source">Basic Rules
<span class="page-number">, pg. 334</span>
</p>
</footer>
I am trying to get only the text that belongs directly to the <p> elements, but every time I try, the <span> text is included as well. So far this is what I have tried:
for p in Environment.findAll('p'):
    print(p.text)
I have also tried removing the extra information with .extract(), but that doesn't seem to work for me either.
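To see why the <span> text shows up: .text is shorthand for get_text(), which joins every string in the tag's whole subtree, including strings nested inside child tags. The sketch below assumes the footer HTML above is parsed with Python's built-in "html.parser"; the variable names are illustrative only.

```python
from bs4 import BeautifulSoup

html = """<footer>
<p class="tags environment-tags">Environment:
<span class="tag environment-tag">Desert</span>
</p>
<p class="source monster-source">Basic Rules
<span class="page-number">, pg. 334</span>
</p>
</footer>"""

soup = BeautifulSoup(html, "html.parser")

for p in soup.find_all("p"):
    # .text walks all descendant strings, so the <span> text is included
    print(repr(p.text))
    # Listing the descendant strings individually makes that visible
    print(list(p.stripped_strings))
```

For the first <p>, stripped_strings yields both "Environment:" and "Desert", which is exactly why printing p.text mixes the two.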
Solution:
You can use .contents, which is a list of a tag's direct children, and take its first element (the text node that comes before the <span>):
for tag in soup.find_all("p"):
    print(tag.contents[0].strip())
Output:
Environment:
Basic Rules
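A caveat with contents[0]: it assumes the <p> starts with text. If a <p> began with a child tag, contents[0] would be a Tag rather than a string, and an empty <p> would raise IndexError. A slightly more defensive sketch collects only the text nodes that are direct children of each <p> via find_all(string=True, recursive=False). The helper name own_text is hypothetical, and the HTML is assumed to be the footer snippet above.

```python
from bs4 import BeautifulSoup

html = """<footer>
<p class="tags environment-tags">Environment:
<span class="tag environment-tag">Desert</span>
</p>
<p class="source monster-source">Basic Rules
<span class="page-number">, pg. 334</span>
</p>
</footer>"""

soup = BeautifulSoup(html, "html.parser")


def own_text(tag):
    """Hypothetical helper: return only the text nodes that are direct
    children of `tag`, skipping text nested inside child elements."""
    direct_strings = tag.find_all(string=True, recursive=False)
    return "".join(direct_strings).strip()


own_texts = [own_text(p) for p in soup.find_all("p")]
print(own_texts)  # ['Environment:', 'Basic Rules']
```

Because recursive=False limits the search to the tag's immediate children, the strings inside the <span> elements are never visited.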
Alternatively, building on your .extract() attempt, you can remove the <span> tags from the tree and then print the result:
for tag in soup.select("p span"):
    tag.extract()

print(soup.prettify())
Output:
<footer>
<p class="tags environment-tags">
Environment:
</p>
<p class="source monster-source">
Basic Rules
</p>
</footer>
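Note that .extract() modifies the soup in place, so the <span> text is gone for any later code that needs it. If you only want the cleaned text, one option (a sketch, assuming the same footer HTML) is to strip the spans from a copy of each <p>. This uses decompose(), which removes and destroys a tag, whereas extract() removes it and returns it so it can be reused.

```python
import copy

from bs4 import BeautifulSoup

html = """<footer>
<p class="tags environment-tags">Environment:
<span class="tag environment-tag">Desert</span>
</p>
<p class="source monster-source">Basic Rules
<span class="page-number">, pg. 334</span>
</p>
</footer>"""

soup = BeautifulSoup(html, "html.parser")

texts = []
for p in soup.find_all("p"):
    # Work on a copy so the original soup keeps its <span> tags
    p_copy = copy.copy(p)
    for span in p_copy.find_all("span"):
        # decompose() removes the tag and discards it
        span.decompose()
    texts.append(p_copy.get_text(strip=True))

print(texts)  # ['Environment:', 'Basic Rules']
print(len(soup.find_all("span")))  # 2, the original tree is untouched
```

Copying is only worthwhile if you still need the <span> values (for example "Desert") afterwards; otherwise removing them in place, as above, is simpler.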