BeautifulSoup: extracting "td" elements without a class (class_=None) together with elements of another class

I am trying to scrape a website that contains td elements. Some of them have no class, and I can extract those with

object.find_all("td", class_=None)

Others have a class called sem_dados, which I can extract with

object.find_all("td", class_="sem_dados")

The main issue is that I can't do both at the same time. For instance,

object.find_all("td", class_=[None, "sem_dados"])

does not return the td elements that have no class. This seems to be caused by how None (or False) behaves inside a list, since

object.find_all("td", class_=[None])

also returns an empty list.

Does anyone know how to change the syntax so that I can match both in one call? The order of the extracted elements matters. I could reorder them manually, but I believe there must be a syntax for what I am trying to do.

I have tried many different syntaxes, but I still couldn't get anything to work.

Solution:

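You can pass a custom function (here a lambda) as the class_ filter. The function receives None for a tag that has no class attribute, so the "not c" check covers that case: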

from bs4 import BeautifulSoup

html_doc = '''\
<td class="sem_dados">I want this 1</td>
<td class="other">I don't want this</td>
<td>I want this 2</td>'''

soup = BeautifulSoup(html_doc, 'html.parser')

print(soup.find_all('td', class_=lambda c: not c or 'sem_dados' == c))

Prints:

[<td class="sem_dados">I want this 1</td>, <td>I want this 2</td>]
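Since the order of the results matters in the question, two alternative ways of expressing the same filter may be worth sketching. Both return matches in document order. This is a minimal sketch rather than the answer's own method: the helper name wanted_td and the surrounding table markup are made up for illustration, and the CSS selector variant assumes a Beautiful Soup version that ships with select() support for :not().

```python
from bs4 import BeautifulSoup

# Same sample cells as in the answer, wrapped in a table for illustration.
html_doc = """\
<table><tr>
<td class="sem_dados">I want this 1</td>
<td class="other">I don't want this</td>
<td>I want this 2</td>
</tr></table>"""

soup = BeautifulSoup(html_doc, "html.parser")


def wanted_td(tag):
    """Hypothetical helper: True for <td> tags with no class or with sem_dados."""
    if tag.name != "td":
        return False
    classes = tag.get("class")  # list of class names, or None if absent
    return not classes or "sem_dados" in classes


# Option 1: a function that inspects the whole tag.
by_function = soup.find_all(wanted_td)

# Option 2: a CSS selector; the comma means "either of these".
by_selector = soup.select("td:not([class]), td.sem_dados")

for td in by_function:
    print(td.get_text())
```

The tag-level function is easier to extend if more conditions are added later, while the selector is shorter when the rule can be written as plain CSS.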