Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to select two td and output as single line with bs4?

I want to fetch some data and I have a hard time selecting two td and having the output on the same line where they belong.

Sample of the HTML:

<tr>
<td class ='verseNumCell'>
፳
</td>
<td class ='verseConentCell'>
ወትቤሎን ኢትስምያኒ ኖሔሚን ስምያኒ መራር እስመ መረርኩ ፈድፋደ ወብዙኀ ።
</td>
</tr>
<tr>
<td class ='verseNumCell'>
፳፩
</td>
<td class ='verseConentCell'>
አንሰ ምልእትየ ሖርኩ ወዕራቅየ አግብአኒ <span class='divineWord'>እግዚአብሔር</span> ለምንት ትብላኒ ኖሔሚን እንዘ <span class='divineWord'>እግዚአብሔር</span> አኅሰረኒ ወፈድፋደ አሕመመኒ ።
</td>
</tr>
<tr>

What I did:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

import bs4
import requests
import re

url = "https://www.ethiopicbible.com/books/%E1%8A%A6%E1%88%AA%E1%89%B5-%E1%8B%98%E1%8D%8D%E1%8C%A5%E1%88%A8%E1%89%B5-1"
parameters = {}
response = requests.get(url, params=parameters)
soup = bs4.BeautifulSoup(response.text, "html.parser")
element_list = soup.find("div", class_="geezBibleChapterContainer").find_all("td")

for element in element_list:
    text = element.get_text()
    text = os.linesep.join([s for s in text.splitlines() if s])
    if not re.match(r'^\s*$', text):
        print(text)

My output:

፳
ወትቤሎን ኢትስምያኒ ኖሔሚን ስምያኒ መራር እስመ መረርኩ ፈድፋደ ወብዙኀ ።
፳፩
አንሰ ምልእትየ ሖርኩ ወዕራቅየ አግብአኒ እግዚአብሔር</span> ለምንት ትብላኒ ኖሔሚን እንዘ

What I try to get:

፳ ወትቤሎን ኢትስምያኒ ኖሔሚን ስምያኒ መራር እስመ መረርኩ ፈድፋደ ወብዙኀ ።
፳፩ አንሰ ምልእትየ ሖርኩ ወዕራቅየ አግብአኒ እግዚአብሔር</span> ለምንት ትብላኒ ኖሔሚን እንዘ

Should I select the td’s in separate "soups"?

>Solution :

Instead of selecting the cells simply select each row and use get_text(separator=' ',strip=True):

for row in soup.select('div.geezBibleChapterContainer tr'):
    print(row.get_text(' ',strip=True))

What leads to:

፩ በቀዳሚ ገብረ እግዚአብሔር ሰማየ ወምድረ ።
፪ ወምድርሰ ኢታስተርኢ ወኢኮነት ድሉተ ወጽልመት መልዕልተ ቀላይ ወመንፈሰ እግዚአብሔር ይጼልል መልዕልተ ማይ ።
፫ ወይቤ እግዚአብሔር ለይኩን ብርሃን ወኮነ ብርሃን ።
፬ ወርእዮ እግዚአብሔር ለብርሃን ከመ ሠናይ ወፈለጠ እግዚአብሔር ማእከለ ብርሃን ወማእከለ ጽልመት ።
፭ ወሰመዮ እግዚአብሔር ለብርሃን ዕለተ ወለጽልመት ሌሊተ ወኮነ ሌሊተ ወጸብሐ ወኮነ መዓልተ ፩ ።
፮ ወይቤ እግዚአብሔር ለይኩን ጠፈር ማእከለ ማይ ከመ ይፍልጥ ማእከለ ማይ ወኮነ ከማሁ ።
፯ ወገብረ እግዚአብሔር ጠፈረ ወፈለጠ እግዚአብሔር ማእከለ ማይ ዘታሕተ ጠፈር ወማእከለ ማይ ዘመልዕልተ ጠፈር ።
፰ ወሰመዮ እግዚአብሔር ለውእቱ ጠፈር ሰማየ ወርእየ እግዚአብሔር ከመ ሠናይ ወኮነ ሌሊተ ወጸብሐ ወኮነ ካልእተ ዕለተ ።
Example
import requests
import bs4

url = "https://www.ethiopicbible.com/books/%E1%8A%A6%E1%88%AA%E1%89%B5-%E1%8B%98%E1%8D%8D%E1%8C%A5%E1%88%A8%E1%89%B5-1"
parameters = {}
response = requests.get(url, params=parameters)
soup = bs4.BeautifulSoup(response.text, "html.parser")

for row in soup.select('div.geezBibleChapterContainer tr'):
    print(row.get_text(separator=' ',strip=True))
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading