I’m trying to get the href attributes from a table on this web page. I have this code that gets all of the links, but I want to filter it so that I only get the hrefs for ‘Automaticas’, not the ‘Manuales’ ones.
import re
import urllib2
from bs4 import BeautifulSoup

# Fetch URL
url = 'http://meteo.navarra.es/estaciones/descargardatos.cfm'
request = urllib2.Request(url)
# 'Accept-Charset' (not 'Accept-Encoding', which is for gzip/deflate) names a character set
request.add_header('Accept-Charset', 'utf-8')
# Response has UTF-8 charset header, and HTML body which is UTF-8 encoded
response = urllib2.urlopen(request)
# Parse with BeautifulSoup
soup = BeautifulSoup(response, 'html.parser')
# Collect every link whose href points to a station download page
for a in soup.find_all('a', {'href': re.compile(r'descargardatos_estacion.*')}):
    estacion = 'http://meteo.navarra.es/estaciones/' + a.attrs.get('href')
    print(estacion)
    # descarga_csvs(estacion)
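For reference, urllib2 only exists in Python 2. Below is a minimal Python 3 sketch of the same link-collecting step, assuming the page markup is as described here: urllib.request replaces urllib2, and the parsing is split into a hypothetical helper, station_links(), so it can be tried on any HTML string without hitting the network. The helper names are illustrative, not part of the original code.

```python
import re
import urllib.request

from bs4 import BeautifulSoup

BASE = 'http://meteo.navarra.es/estaciones/'


def station_links(html):
    """Return absolute URLs for every <a> whose href mentions descargardatos_estacion."""
    soup = BeautifulSoup(html, 'html.parser')
    pattern = re.compile(r'descargardatos_estacion')
    # A regex attribute filter only matches <a> tags that actually have an href
    return [BASE + a['href'] for a in soup.find_all('a', href=pattern)]


def fetch_page(url=BASE + 'descargardatos.cfm'):
    """Download the page as bytes; BeautifulSoup detects the encoding itself."""
    with urllib.request.urlopen(url) as response:
        return response.read()


# Usage (needs network access):
# for estacion in station_links(fetch_page()):
#     print(estacion)
```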
The img src attributes for ‘Automaticas’ and ‘Manuales’ are different, but I don’t know how to filter on them.
Solution:
You can use:
for img in soup.find_all(lambda x: x.name == 'img' and 'automatica.gif' in x['src']):
    print(img.next_sibling.next_sibling['href'])
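To see why two next_sibling hops are needed, here is a small sketch on hypothetical markup (the real table is not reproduced here, and 'manual.gif' and the href values are made up): when a newline separates the icon from the link, the first sibling of the img is that whitespace text node, and the second is the a tag.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: icon, a newline, then the link, inside each cell
html = """<td><img src="img/automatica.gif" alt="">
<a href="descargardatos_estacion.cfm?id=1">Station 1</a></td>
<td><img src="img/manual.gif" alt="">
<a href="descargardatos_estacion.cfm?id=2">Station 2</a></td>"""

soup = BeautifulSoup(html, 'html.parser')

icon = soup.find('img', src='img/automatica.gif')
print(repr(icon.next_sibling))               # the whitespace text node '\n'
print(icon.next_sibling.next_sibling.name)   # a

# The answer's filter, collected into a list instead of printed one by one
automatic = [img.next_sibling.next_sibling['href']
             for img in soup.find_all(lambda x: x.name == 'img' and 'automatica.gif' in x['src'])]
print(automatic)  # ['descargardatos_estacion.cfm?id=1']
```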
Notes:
- soup.find_all(lambda x: x.name == 'img' and 'automatica.gif' in x['src']) fetches all img nodes that contain automatica.gif in their src attribute.
- img.next_sibling.next_sibling['href'] gets the href value of the second sibling of each img tag found. The first sibling is presumably the whitespace text between the img and a tags, which is why one hop is not enough.
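Two next_sibling hops break if the whitespace between the tags is missing, and x['src'] raises KeyError on an img without a src. A slightly more defensive sketch, assuming the same icon-then-link layout: find_all('img', src=re.compile(...)) only matches img tags that have a src, and find_next_sibling('a') skips any text nodes to reach the next a tag. The helper name links_by_icon, the 'manual.gif' file name and the sample markup are hypothetical.

```python
import re

from bs4 import BeautifulSoup


def links_by_icon(html, icon):
    """Return the href of the first <a> sibling after each <img> whose src contains `icon`."""
    soup = BeautifulSoup(html, 'html.parser')
    links = []
    # A regex attribute filter only matches tags that actually have a src attribute
    for img in soup.find_all('img', src=re.compile(re.escape(icon))):
        link = img.find_next_sibling('a')  # skips whitespace/text nodes in between
        if link is not None and link.has_attr('href'):
            links.append(link['href'])
    return links


# Hypothetical markup: rows 2 and 3 have no whitespace between icon and link
html = ('<td><img src="img/automatica.gif"> <a href="descargardatos_estacion.cfm?id=1">1</a></td>'
        '<td><img src="img/manual.gif"><a href="descargardatos_estacion.cfm?id=2">2</a></td>'
        '<td><img src="img/automatica.gif"><a href="descargardatos_estacion.cfm?id=3">3</a></td>')

print(links_by_icon(html, 'automatica.gif'))
print(links_by_icon(html, 'manual.gif'))
```

The returned hrefs are relative, so they can be prefixed with 'http://meteo.navarra.es/estaciones/' exactly as in the question's loop.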