BS4 – Replacing text content, preserving tags

I have an HTML document that uses the text-transform style property to change case. When I see that style, I’d like to uppercase all text inside the element it applies to, while retaining the HTML tags.

I have a partial solution that replaces the element entirely, so its child tags are lost. The approach that seems like it ought to be correct raises AttributeError: 'NoneType' object has no attribute 'next_element'

Example:

from bs4 import BeautifulSoup, NavigableString, Tag
import re

html = '''
<div style="text-transform: uppercase;">
    Foo0
    <font>Foo0</font>
    <div>Foo1
        <div>Foo2</div>
    </div>
</div>
'''
upper_patt = re.compile(r'(?i)text-transform:\s*uppercase')  # raw string so \s reaches the regex engine intact

# works, but replaces all text, removing the HTML tags
soup = BeautifulSoup(html, "html.parser")
for node in soup.find_all(attrs={'style': upper_patt}):
    node.replace_with(node.text.upper())

# does not work, raises AttributeError
soup = BeautifulSoup(html, "html.parser")
for node in soup.find_all(attrs={'style': upper_patt}):
    for txt in node.strings:
        txt.replace_with(txt.upper())

Solution:

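It seems you want to uppercase the text of every descendant of an element styled with text-transform: uppercase.

Replacing strings while iterating over node.strings modifies the tree that the generator is still walking, which is the likely cause of the error. Instead, collect the descendant text nodes up front with node.findChildren(text=True), which returns the complete list before anything is modified, and call replace_with() on each one. (findChildren is an older alias of find_all, and recent Beautiful Soup releases prefer string=True over text=True.)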

from bs4 import BeautifulSoup, NavigableString, Tag
import re

html = '''
<div style="text-transform: uppercase;">
    Foo0
    <font>Foo0</font>
    <div>Foo1
        <div>Foo2</div>
    </div>
</div>
'''
upper_patt = re.compile(r'(?i)text-transform:\s*uppercase')  # raw string so \s reaches the regex engine intact
soup = BeautifulSoup(html, "html.parser")

for node in soup.find_all(attrs={'style': upper_patt}):
    for child in node.findChildren(recursive=True, text=True):
        child.replace_with(child.text.upper())

print(soup)

Prints:

<div style="text-transform: uppercase;">
    FOO0
    <font>FOO0</font>
    <div>FOO1
        <div>FOO2</div>
    </div>
</div>
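
As a variation on the answer above, the original node.strings loop can probably be kept if the generator is copied into a list before any replacement happens. The sketch below assumes the same sample HTML and pattern as the question; the function name uppercase_styled_text is hypothetical and only there for illustration.

```python
import re
from bs4 import BeautifulSoup

html = '''
<div style="text-transform: uppercase;">
    Foo0
    <font>Foo0</font>
    <div>Foo1
        <div>Foo2</div>
    </div>
</div>
'''
upper_patt = re.compile(r'(?i)text-transform:\s*uppercase')


def uppercase_styled_text(markup):
    """Uppercase text nodes under elements styled text-transform: uppercase.

    Hypothetical helper name, not part of Beautiful Soup.
    """
    soup = BeautifulSoup(markup, "html.parser")
    for node in soup.find_all(attrs={'style': upper_patt}):
        # list() snapshots the text nodes before any of them is replaced,
        # so the loop never walks a tree that is changing underneath it.
        for txt in list(node.strings):
            # NavigableString is a str subclass, so upper() works directly.
            txt.replace_with(txt.upper())
    return soup


result = uppercase_styled_text(html)
print(result)
```

Only the text nodes are swapped out, so the font and nested div tags, along with the style attribute, stay where they were.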