Replacing a bs4 element with a string

I have an HTML document to which I want to add anchor tags, so that I can link directly to a certain part of the webpage.

The first step is to find all divs that need to be replaced. Then an anchor tag needs to be added, based on the text inside the div. My code looks as follows:

from bs4 import BeautifulSoup
path= "/text.html"

with open(path) as fp:
    soup = BeautifulSoup(fp, 'html.parser')

mydivs = soup.find_all("p", {"class": "tussenkop"})
    
for div in mydivs:
    if "Artikel" in div.getText():
        string = div.getText().split()[1]
        div_id = f"""<a id="{string}"></a>{div}"""
        full =f"{div_id}{div}"
        html_soup = BeautifulSoup(full, 'html.parser')
        div = html_soup

A div looks as follows:

<p class="tussenkop"><strong class="tussenkop_vet">Artikel 7.37 text text text</strong></p>

After adding the anchor tag it becomes:

<a id="7.37"></a><p class="tussenkop"><strong class="tussenkop_vet">Artikel 10.6 Inwerkingtreding</strong></p><p class="tussenkop"><strong class="tussenkop_vet">Artikel 7.37 text text text</strong></p>

But the problem is, div is not replaced by the new div. How should I correct this? Or is there another way to insert an anchor tag?

Solution:

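I’m not quite sure what you expect the output to look like, but the reason nothing changes is that assigning to div inside the loop only rebinds the loop variable; it does not modify the tree held by the soup object. BeautifulSoup has methods to create new tags and attributes, and to insert them into the soup object.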

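from bs4 import BeautifulSoup

# Sample markup standing in for the contents of the HTML file
fp = '<p class="tussenkop"><strong class="tussenkop_vet">Artikel 7.37 text text text</strong></p>'

soup = BeautifulSoup(fp, 'html.parser')
print('soup before: ', soup)

mydivs = soup.find_all("p", {"class": "tussenkop"})

for div in mydivs:
    if "Artikel" in div.getText():
        # The second word of the heading is the article number, e.g. "7.37"
        a_string = div.getText().split()[1]
        # Create an empty <a> tag and give it the article number as its id
        new_tag = soup.new_tag("a")
        new_tag['id'] = a_string
        # Insert the anchor into the tree directly before the paragraph
        div.insert_before(new_tag)

print('soup after: ', soup)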

Output:

soup before:  <p class="tussenkop"><strong class="tussenkop_vet">Artikel 7.37 text text text</strong></p>
soup after:  <a id="7.37"></a><p class="tussenkop"><strong class="tussenkop_vet">Artikel 7.37 text text text</strong></p>
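
As for whether there is another way: browsers jump to any element whose id matches the URL fragment, so a separate a tag may not be needed at all. The following is only a sketch of that alternative, not the asker's original approach. The helper name add_article_ids is hypothetical, and it assumes every heading to be linked starts with the word "Artikel" followed by the article number.

```python
from bs4 import BeautifulSoup


def add_article_ids(html: str) -> str:
    """Hypothetical helper: set an id on each 'Artikel <number>' heading."""
    soup = BeautifulSoup(html, "html.parser")
    for p in soup.find_all("p", class_="tussenkop"):
        words = p.get_text().split()
        # Assumption: only headings of the form "Artikel <number> ..." get an id
        if len(words) >= 2 and words[0] == "Artikel":
            # Assigning to an attribute mutates the tag inside the tree,
            # unlike rebinding the loop variable
            p["id"] = words[1]
    return str(soup)


sample = (
    '<p class="tussenkop"><strong class="tussenkop_vet">'
    "Artikel 7.37 text text text</strong></p>"
)
result = add_article_ids(sample)
print(result)
```

A link such as href="#7.37" would then scroll to that paragraph. To persist either variant, write str(soup) back to the file, for example with open(path, "w").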