
Find only certain elements in a table with Beautiful Soup

I’m trying to get the href attributes from a table on this web page. I have this code to get all of the links, but I want to filter it so I only access the href for ‘Automaticas’, not ‘Manuales’.

import re
import urllib2  # Python 2; on Python 3 use urllib.request instead
from bs4 import BeautifulSoup

# Fetch URL
url = 'http://meteo.navarra.es/estaciones/descargardatos.cfm'

request = urllib2.Request(url)
# Accept-Encoding negotiates compression (e.g. gzip); Accept-Charset is the
# header for requesting a character set
request.add_header('Accept-Charset', 'utf-8')

# Response has UTF-8 charset header, and HTML body which is UTF-8 encoded
response = urllib2.urlopen(request)

# Parse with BeautifulSoup
soup = BeautifulSoup(response, 'html.parser')

for a in soup.find_all('a', {'href': re.compile(r'descargardatos_estacion.*')}):
    estacion = 'http://meteo.navarra.es/estaciones/' + a.attrs.get('href')
    print(estacion)
    # descarga_csvs(estacion)

The img src values shown above for ‘Automaticas’ and ‘Manuales’ are different, but I don’t know how to filter on them.

[screenshot: page HTML showing the img icons next to the station links]


Solution:

You can use

for img in soup.find_all(lambda x: x.name == 'img' and 'automatica.gif' in x.get('src', '')):
    print(img.next_sibling.next_sibling['href'])

Notes:

  • soup.find_all(lambda x: x.name == 'img' and 'automatica.gif' in x.get('src', '')) – fetches all img nodes whose src attribute contains automatica.gif; x.get('src', '') avoids a KeyError on img tags that have no src attribute
  • img.next_sibling.next_sibling['href'] – gets the href value of the <a> tag that follows each matched img. The first next_sibling is usually the whitespace text node between the two tags, so the second next_sibling lands on the link itself.
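The sibling navigation above can be tried out without fetching the live page. The markup below is a hypothetical reconstruction of the station table (icon <img> followed by its <a> link, separated by whitespace, which matches what the solution assumes); the station names and IDs are made up for illustration:

```python
from bs4 import BeautifulSoup

# Hypothetical markup mirroring the assumed table structure: each <img>
# icon is followed (after a whitespace text node) by its <a> link.
html = """
<table>
  <tr><td><img src="imgs/automatica.gif"> <a href="descargardatos_estacion.cfm?IDEstacion=1">Estacion A</a></td></tr>
  <tr><td><img src="imgs/manual.gif"> <a href="descargardatos_estacion.cfm?IDEstacion=2">Estacion B</a></td></tr>
  <tr><td><img src="imgs/automatica.gif"> <a href="descargardatos_estacion.cfm?IDEstacion=3">Estacion C</a></td></tr>
</table>
"""

soup = BeautifulSoup(html, 'html.parser')

# Keep only the links whose preceding icon is the 'automatica' one;
# .next_sibling.next_sibling skips the whitespace text node between tags.
automaticas = [
    img.next_sibling.next_sibling['href']
    for img in soup.find_all(
        lambda x: x.name == 'img' and 'automatica.gif' in x.get('src', '')
    )
]
print(automaticas)
```

With this sample input, only the two ‘automatica’ hrefs are collected, while the ‘manual’ row is skipped.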