Scrape Text and save File with Bold Text Intact?

February 12, 2022

I am very new to Python and web scraping. I have tried to search for an answer but cannot find one, possibly because I don't know the right terminology to ask the question.

I am trying to use Python with Beautiful Soup to scrape the English transliterations from the verb tables on a website that conjugates modern Hebrew verbs (https://www.pealim.com/dict/28-lavo/), and then save the text to a .txt file. The sticking point is that I want the bold tags to stay intact through the scraping and saving, because they mark where the stress falls in each word.

Here is an example of what I am getting:

ba’im

And here is what I would like, with the stressed vowel still wrapped in bold tags:

ba’<b>i</b>m

By looking around the forums, I have come up with code that gets me close to what I need, but I cannot figure out how to keep the bold tags as well.

import requests
from bs4 import BeautifulSoup as bs

# Load webpage content
r = requests.get("https://www.pealim.com/dict/28-lavo/")

# Convert to a soup object (parser named explicitly to avoid a warning)
soup = bs(r.content, "html.parser")

# Find the transliterations from the verb tables
# (.text returns only the text, so the <b> stress markers are lost here)
mine = [element.text for element in soup.find_all("div", "transcription")]

# Save to file as UTF-8 so characters such as ’ are written on every OS
with open("lavo.txt", "w", encoding="utf-8") as output:
    for i in mine:
        output.write('%s\n' % i)
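A minimal offline reproduction of the problem, assuming each transcription cell is marked up like <div class="transcription">ba’<b>i</b>m</div> (inferred from the output shown in the answer below, not checked against the live page). It parses a hard-coded string instead of fetching the site:

```python
from bs4 import BeautifulSoup

# Assumed markup for one transcription cell; inferred, not fetched.
html = '<div class="transcription">ba’<b>i</b>m</div>'

soup = BeautifulSoup(html, "html.parser")
cell = soup.find("div", "transcription")

# .text concatenates only the text nodes, so the <b> tag is dropped.
plain = cell.text

# The <b> tag is still in the parse tree, as one of the div's children.
children = [str(child) for child in cell.contents]

print(plain)     # ba’im
print(children)  # ['ba’', '<b>i</b>', 'm']
```

The bold element survives parsing; .text simply flattens it away, which is why the file ends up containing ba’im.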

Solution:

You can use the .contents property, convert each child node to a string, and join the results. For example:

import requests
from bs4 import BeautifulSoup as bs

# load webpage content
r = requests.get("https://www.pealim.com/dict/28-lavo/")

# Convert to a soup object
soup = bs(r.content, "html.parser")

# Find the transliterations from the verb tables with the stress bolded:
# str() on each child keeps Tag children as markup, e.g. "<b>i</b>"
mine = [
    "".join(map(str, element.contents))
    for element in soup.find_all("div", "transcription")
]

# Save to file (UTF-8 so non-ASCII characters are written safely)
with open("lavo.txt", "w", encoding="utf-8") as output:
    for i in mine:
        output.write("%s\n" % i)
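As an alternative to joining str() of each child by hand, Beautiful Soup's Tag.decode_contents() method returns a tag's inner HTML as a single string. A sketch using assumed sample markup modeled on the output below, parsed from a string rather than fetched from the site:

```python
from bs4 import BeautifulSoup

# Assumed sample markup modeled on the saved output; not fetched.
html = """
<div class="transcription">ba’<b>i</b>m</div>
<div class="transcription">bat<b>e</b>m</div>
"""

soup = BeautifulSoup(html, "html.parser")

# decode_contents() renders a tag's children back to HTML,
# i.e. the inner markup without the surrounding <div> itself.
mine = [
    element.decode_contents()
    for element in soup.find_all("div", "transcription")
]

print(mine)  # ['ba’<b>i</b>m', 'bat<b>e</b>m']
```

For cells like these, containing only plain text and <b> tags, both approaches should produce the same strings; decode_contents() just states the intent ("give me the inner HTML") directly.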

This saves the following to lavo.txt:

b<b>a</b>
ba'<b>a</b>
ba'<b>i</b>m
ba'<b>o</b>t
b<b>a</b>ti
b<b>a</b>nu
b<b>a</b>ta
b<b>a</b>t
bat<b>e</b>m

...


By MR, published February 12, 2022
