Beautiful Soup: 'NoneType' object has no attribute 'text'

I got this code to scrape a table on a webpage, and I’m very happy with it. However, on rare occasions a title is missing a ‘genre’ or ‘image URL’ field. As soon as the scraper hits a list item with a missing value, it stops and raises the error 'NoneType' object has no attribute 'text'.

How can I amend this code so that it continues scraping and just stores an N/A value in that column when a value is missing?

Your help is much appreciated!

import requests
from bs4 import BeautifulSoup
import pandas as pd

# Send a GET request to the URL
url = "https://www.hebban.nl/rank"
response = requests.get(url,headers={'user-agent':'Mozilla/5.0'})

# Parse the HTML content using Beautiful Soup
soup = BeautifulSoup(response.content, 'html.parser')

# Find the book titles, authors, and image url links
data = []
books = soup.find_all('div', class_='item')
for book in books:
    rank = book.h3.text.strip()
    title = book.find('a', class_='neutral').text.strip()
    author = book.find('span', class_='author').text.strip()
    genre = book.find('a', class_='btn btn4 yf-genre').text.strip()

    ##img_url = book.img.get('data-src')
    
    print(rank + ' by ' + author)
    ##print('Image URL: ' + img_url)

    data.append({'rank': rank, 'author': author, 'title': title, 'genres': genre})

# Create a dataframe and save it to a csv
df = pd.DataFrame(data)
df.to_csv('hebbanexport.csv', index=False)

Solution:

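The error occurs because find() returns None when nothing matches, and None has no .text attribute. Simply check whether the element you are trying to find exists before calling any method on it; if it does not, set the value to None or any other value you like, such as 'N/A'.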

genre = book.find('a', class_='btn btn4 yf-genre').text.strip() if book.find('a', class_='btn btn4 yf-genre') else None
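
The one-liner above calls find() twice for the same element. As a minimal sketch, assuming Python 3.8 or later, an assignment expression can store the result of a single lookup and fall back to 'N/A'. The HTML snippet and the name `el` are made up for illustration and are not taken from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second item has no genre link.
html = """
<div class="item"><a class="btn btn4 yf-genre"> Thriller </a></div>
<div class="item"></div>
"""
soup = BeautifulSoup(html, "html.parser")

genres = []
for book in soup.find_all("div", class_="item"):
    # Look the element up once; use "N/A" when find() returned None.
    genre = (
        el.text.strip()
        if (el := book.find("a", class_="btn btn4 yf-genre")) is not None
        else "N/A"
    )
    genres.append(genre)

print(genres)
```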

You could also use a function to check:

def check_if_element_is_available(e):
    if e:
        return e.text.strip()
    else:
        return None

for book in books:
    rank = check_if_element_is_available(book.h3)
    title = check_if_element_is_available(book.find('a', class_='neutral'))
    author = check_if_element_is_available(book.find('span', class_='author'))
    genre = check_if_element_is_available(book.find('a', class_='btn btn4 yf-genre'))
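
Since the question asks for an N/A value and also mentions a missing image URL, the helper approach could be extended along these lines. This is only a sketch: the helper names `text_or_default` and `attr_or_default`, the sample HTML, and the example.com URL are assumptions for illustration, and the real page structure may differ.

```python
from bs4 import BeautifulSoup


def text_or_default(element, default="N/A"):
    """Return the stripped text of a tag, or `default` if the tag is missing."""
    return element.get_text(strip=True) if element is not None else default


def attr_or_default(element, attr, default="N/A"):
    """Return an attribute value, or `default` if the tag or attribute is missing."""
    return element.get(attr, default) if element is not None else default


# Hypothetical markup: the second item lacks both the genre link and the image.
html = """
<div class="item">
  <h3>1</h3>
  <a class="neutral">Some Title</a>
  <span class="author">Some Author</span>
  <a class="btn btn4 yf-genre">Thriller</a>
  <img data-src="https://example.com/cover.jpg">
</div>
<div class="item">
  <h3>2</h3>
  <a class="neutral">Other Title</a>
  <span class="author">Other Author</span>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

data = []
for book in soup.find_all("div", class_="item"):
    data.append({
        "rank": text_or_default(book.h3),
        "title": text_or_default(book.find("a", class_="neutral")),
        "author": text_or_default(book.find("span", class_="author")),
        "genres": text_or_default(book.find("a", class_="btn btn4 yf-genre")),
        # book.img is None when the item has no <img> tag.
        "img_url": attr_or_default(book.img, "data-src"),
    })
```

The resulting list of dictionaries can be passed to pd.DataFrame(data) exactly as in the original script, so rows with missing fields are kept rather than aborting the loop.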