

Having difficulty figuring out how to extract <a href> values with BeautifulSoup in Python

from bs4 import BeautifulSoup
import requests

page = requests.get('https://www.capitol.tn.gov/house/members/').text
soup = BeautifulSoup(page, 'html.parser')

table = soup.find('table')
rows = table.find_all('tr')
header = rows[0].find_all('th')
header_text = []

for item in header:
  header_text.append(item.get_text(strip=True))
  
# check header results
print(header_text)

# get rows
for row in rows:
  row_text = []
  a = row.find_all('a')
  td = row.find_all('td')
  for item in td:
    if item:
      row_text.append(item.get_text(strip=True))
    
  # check row results
  if len(row_text) > 0:
    print(row_text)

Sorry if this is a basic question, but I'm having trouble getting the 'a' tags' 'href' values (i.e., the email addresses) to appear as the first item in each row. I've tried the insert() method, but it never seems to give me anything.
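One likely reason insert() seemed to give nothing: in Python, list.insert(index, value) modifies the list in place and returns None, so printing or assigning its return value shows nothing useful. A minimal illustration with made-up values (the names and address are hypothetical):

```python
# Two sample cell values standing in for one table row (hypothetical data).
row_text = ["Alice Smith", "District 1"]

# list.insert() changes the list in place and returns None,
# so its return value is not the updated list.
result = row_text.insert(0, "mailto:alice@example.com")

print(result)    # None
print(row_text)  # ['mailto:alice@example.com', 'Alice Smith', 'District 1']
```

Calling insert(0, ...) on the list and then reading the list itself is what puts a value at the front.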

Solution:


This does the job:

# get rows
for row in rows:
  row_text = []
  td = row.find_all('td')
  for item in td:
    # look for an <a class="email"> link inside this cell
    email = item.find("a", {"class": "email"})

    if email is not None:
      # store the link target (the href attribute), not the link text
      row_text.append(email.get("href"))

    # the cell's visible text follows any email href
    row_text.append(item.get_text(strip=True))

  # check row results (header rows have no td cells, so they stay empty)
  if row_text:
    print(row_text)

For each td cell, the code looks for an a tag whose class is email. If it finds one, it reads the tag's href attribute and appends it to the row_text list, followed by the cell's own text; cells without such a link contribute only their text. Because the href is added in column order, it appears first in the row only when the email link sits in the first cell.
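If the email should always be the first item regardless of which column holds it, one option is to collect the cell text first and then prepend the href with insert(0, ...). The sketch below applies the same a.email lookup to a small hypothetical HTML table; the markup, column layout, and the extract_rows helper name are assumptions for illustration, not the real page's structure.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the members table; the real page's
# columns and class names are assumptions here.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>District</th><th>Email</th></tr>
  <tr>
    <td>Alice Smith</td>
    <td>1</td>
    <td><a class="email" href="mailto:alice@example.com">Email</a></td>
  </tr>
  <tr>
    <td>Bob Jones</td>
    <td>2</td>
    <td>No email listed</td>
  </tr>
</table>
"""


def extract_rows(html):
    """Return one list per data row, with the email href (if any) first."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    results = []
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        if not cells:  # the header row has only <th> cells
            continue
        row_text = [cell.get_text(strip=True) for cell in cells]
        # Search the whole row for <a class="email">, not just one cell.
        email = row.find("a", class_="email")
        if email is not None:
            # insert() mutates row_text in place and returns None.
            row_text.insert(0, email.get("href"))
        results.append(row_text)
    return results


for row_text in extract_rows(SAMPLE_HTML):
    print(row_text)
```

Rows without an email link keep their original column order; the href comes through as written in the markup (here including the mailto: prefix).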
