BeautifulSoup — extracting both "td" objects without class (_class = None or False) and other class types

byMR

February 27, 2023

I am trying to scrape a website that has td objects. Some of them have no class, and I can extract those with

object.find_all("td", class_=None)

Others have a class called sem_dados, which I can extract with

object.find_all("td", class_="sem_dados")

The main issue is that I can't do both at the same time. For instance,

object.find_all("td", class_=[None, "sem_dados"])

does not return the td objects that have no class. This seems to be a problem with how None (or False) behaves inside a list, since

object.find_all("td", class_=[None])

also returns an empty list.

Does anyone know how to change the syntax so I can match both in one call? The order of the extracted elements is important. I could reorder them manually, but I believe there must be a syntax for what I am trying to do.

I have tried many different syntaxes but still couldn't get anything to work.

Solution:

One option is to pass a custom function, such as a lambda, as the class_ filter:

from bs4 import BeautifulSoup

html_doc = '''\
<td class="sem_dados">I want this 1</td>
<td class="other">I don't want this</td>
<td>I want this 2</td>'''

soup = BeautifulSoup(html_doc, 'html.parser')

# The function passed as class_ gets the class value, or None when the tag has no class attribute
print(soup.find_all('td', class_=lambda c: not c or c == 'sem_dados'))

Prints:

[<td class="sem_dados">I want this 1</td>, <td>I want this 2</td>]
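Since the question stresses ordering, here is a small sketch that checks the results come back in document order. It uses a named function instead of the lambda; the name wanted_class and the sample table are made up for illustration. It also shows a CSS selector through soup.select as an alternative that is not part of the answer above, assuming a BeautifulSoup version that ships with the soupsieve selector engine.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: classless and sem_dados cells are interleaved.
html_doc = """\
<table><tr>
<td>no class 1</td>
<td class="other">skip</td>
<td class="sem_dados">sem_dados 1</td>
<td>no class 2</td>
</tr></table>"""

soup = BeautifulSoup(html_doc, "html.parser")


def wanted_class(css_class):
    """Return True for a missing/empty class or the class 'sem_dados'."""
    return not css_class or css_class == "sem_dados"


# find_all walks the tree in document order, so no manual reordering is needed.
cells = soup.find_all("td", class_=wanted_class)
texts = [td.get_text() for td in cells]
print(texts)  # ['no class 1', 'sem_dados 1', 'no class 2']

# Alternative (an assumption, not from the answer above): a CSS selector list.
css_cells = soup.select("td:not([class]), td.sem_dados")
css_texts = [td.get_text() for td in css_cells]
print(css_texts)
```

Note that for a td with several classes, the function is tried against each class individually, so a cell such as class="sem_dados other" would also match.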
