Find only certain elements in a table with Beautiful Soup

I’m trying to get the href attributes from a table on this website. I have this code to get all of the links, but I want to filter them so that I only get the hrefs for ‘Automaticas’, not ‘Manuales’.

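import re
import urllib2  # Python 2 only; on Python 3 use urllib.request instead

from bs4 import BeautifulSoup

# Fetch URL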
url = 'http://meteo.navarra.es/estaciones/descargardatos.cfm'

request = urllib2.Request(url)
request.add_header('Accept-Encoding', 'utf-8')

# Response has UTF-8 charset header, and HTML body which is UTF-8 encoded
response = urllib2.urlopen(request)

# Parse with BeautifulSoup
soup = BeautifulSoup(response,'html.parser')

for a in soup.find_all('a',{'href': re.compile(r'descargardatos_estacion.*')}):
    estacion = 'http://meteo.navarra.es/estaciones/' + a.attrs.get('href')
    print(estacion)
    # descarga_csvs(estacion)

The src values for ‘Automaticas’ and ‘Manuales’ are different, but I don’t know how to filter on them.

Solution:

You can use

for img in soup.find_all(lambda x: x.name == 'img' and 'automatica.gif' in x.get('src', '')):
    print(img.next_sibling.next_sibling['href'])

Notes:

soup.find_all(lambda x: x.name == 'img' and 'automatica.gif' in x.get('src', '')) – fetches all img nodes whose src attribute contains automatica.gif; x.get('src', '') avoids a KeyError on img tags that have no src
img.next_sibling.next_sibling['href'] – gets the href value of the second sibling of each img tag found.
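
To try the idea without hitting the live site, here is a minimal Python 3 sketch that runs against an inline HTML snippet. The snippet's markup, the image paths, the `?id=` query strings, and the helper names `is_automatic_icon` and `automatic_station_urls` are all assumptions made for illustration; the real page may be laid out differently. It also swaps `next_sibling.next_sibling` for `find_next_sibling('a')`, which skips any text nodes between the icon and the link, and builds the absolute URL with `urljoin` rather than string concatenation.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = 'http://meteo.navarra.es/estaciones/'

# Hypothetical markup for illustration only; the real table may differ.
SAMPLE_HTML = """
<table>
  <tr><td>
    <img src="imagenes/automatica.gif">
    <a href="descargardatos_estacion.cfm?id=1">Station A</a>
  </td></tr>
  <tr><td>
    <img src="imagenes/manual.gif">
    <a href="descargardatos_estacion.cfm?id=2">Station B</a>
  </td></tr>
  <tr><td>
    <img src="imagenes/automatica.gif">
    <a href="descargardatos_estacion.cfm?id=3">Station C</a>
  </td></tr>
</table>
"""


def is_automatic_icon(tag):
    """Return True for <img> tags whose src mentions automatica.gif."""
    return tag.name == 'img' and 'automatica.gif' in tag.get('src', '')


def automatic_station_urls(html, base_url=BASE_URL):
    """Collect absolute URLs of links that follow an 'automatica' icon."""
    soup = BeautifulSoup(html, 'html.parser')
    urls = []
    for img in soup.find_all(is_automatic_icon):
        # First <a> sibling after the icon; text nodes in between are skipped.
        link = img.find_next_sibling('a')
        if link is not None and link.get('href'):
            urls.append(urljoin(base_url, link['href']))
    return urls


if __name__ == '__main__':
    for url in automatic_station_urls(SAMPLE_HTML):
        print(url)
```

If the icon and the link turn out not to be siblings on the real page (for example, if they sit in separate table cells), the sibling lookup would need to be adapted to that structure.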
