Finding text from html using BeautifulSoup

May 25, 2022

I have the following HTML:

<li class="print text">
                            <span><em class="time">
                                    <div class="time">1.29 s</div>
                                </em><em class="status">passed</em>This is the text I want to get</span>

I need to get only the text that is outside all of the other tags (text is: This is the text I want to get).

I was trying to use this piece of code:

for el in doc.find_all('li', attrs={'class': 'print text'}):
    print(el.get_text())

But unfortunately it prints everything, including the text inside the em tags.

Is there any way to do this?

Thank you!!

>Solution :

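You could use find(text=True, recursive=False) on the span. text=True matches text nodes instead of tags, and recursive=False restricts the search to the span's direct children, so text nested inside the em tags is skipped. In Beautiful Soup 4.4.0 and later the same argument is also available under the name string.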

Example

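from bs4 import BeautifulSoup

html = '''<li class="print text">
        <span><em class="time">
                <div class="time">1.29 s</div>
            </em><em class="status">passed</em>This is the text I want to get</span>'''

# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(html, 'html.parser')

# First direct child text node of the <span>; text inside <em> is skipped
print(soup.find('li', class_='print text').span.find(text=True, recursive=False))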

Output

This is the text I want to get
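
Note that find() returns only the first direct text node. If the span's own text is split around a child tag, a variation along these lines may help. It is a minimal sketch on hypothetical markup (the "before"/"after" strings are not from the question), using find_all() with the newer string argument to collect every direct text node:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the span's own text is split around a child tag.
html = '<span>before <em class="status">passed</em> after</span>'

span = BeautifulSoup(html, "html.parser").span

# Every direct child text node of the span; text inside <em> is not a
# direct child, so it is left out.
direct_text = span.find_all(string=True, recursive=False)

# Drop whitespace-only nodes and join the rest.
own_text = " ".join(s.strip() for s in direct_text if s.strip())
print(own_text)  # before after
```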

If there are multiple span elements in your li, you could loop over them:

from bs4 import BeautifulSoup

html = '''<li class="print text">
        <span><em class="time">
                <div class="time">1.29 s</div>
            </em><em class="status">passed</em>This is the text I want to get</span>
            <span><em class="time">
                <div class="time">1.50 s</div>
            </em><em class="status">passed</em>This is the text I want to get too</span>'''

# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(html, 'html.parser')

# Select every <span> inside <li class="print text"> and print its own text
for e in soup.select('li.print.text span'):
    print(e.find(text=True, recursive=False))

Output

This is the text I want to get
This is the text I want to get too
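
One thing to watch for: find() returns None when a span has no direct text, and it can return a whitespace-only node if the markup has line breaks between tags. A small helper can guard against both. This is only a sketch; the name own_text and the second, text-less span are made up for illustration:

```python
from bs4 import BeautifulSoup


def own_text(tag):
    """Return the first non-blank direct text node of tag, stripped, or None."""
    for node in tag.find_all(string=True, recursive=False):
        if node.strip():
            return node.strip()
    return None


# Hypothetical markup: the second span has no text of its own.
html = '''<li class="print text">
<span><em class="status">passed</em>This is the text I want to get</span>
<span><em class="status">failed</em></span>
</li>'''

soup = BeautifulSoup(html, "html.parser")
results = [own_text(span) for span in soup.select("li.print.text span")]
print(results)  # ['This is the text I want to get', None]
```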

by MR

Published May 25, 2022
