Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Remove all text from a html node using regex

Is it possible to remove all text from HTML nodes with a regex? This very simple case seems to work just fine:

import htmlmin

html = """
<li class="menu-item">
  <p class="menu-item__heading">Totopos</p>
  <p>Chips and molcajete salsa</p>
  <p class="menu-item__details menu-item__details--price">
    <strong>
      <span class="menu-item__currency"> $ </span>
      4
    </strong>
  </p>
</li>
"""

print(re.sub(">(.*?)<", ">\1<", htmlmin.minify(html)))

I tried to use BeautifulSoup but I cannot figure out how to make it work. Using the following code example is not quite correct since it is leaving "4" in as text.

soup = BeautifulSoup(html, "html.parser")
for n in soup.find_all(recursive=True):
    print(n.name, n.string)
    if n.string:
        n.string = ""
print(minify(str(soup)))

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

try to use text=True when you call find_all and call extract() on element to remove it:

from bs4 import BeautifulSoup

html = '''
<li class="menu-item">
  <p class="menu-item__heading">Totopos</p>
  <p>Chips and molcajete salsa</p>
  <p class="menu-item__details menu-item__details--price">
    <strong>
      <span class="menu-item__currency"> $ </span>
      4
    </strong>
  </p>
</li>
'''

soup = BeautifulSoup(html, 'html.parser')
for element in soup.find_all(text=True):
    element.extract()

print(soup.prettify())

the output will be in this case:

<li class="menu-item">
 <p class="menu-item__heading">
 </p>
 <p>
 </p>
 <p class="menu-item__details menu-item__details--price">
  <strong>
   <span class="menu-item__currency">
   </span>
  </strong>
 </p>
</li>
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading