Why does BeautifulSoup return a NoneType value, even when the element is there?

January 24, 2024

I’m trying to scrape thumbnail images from a website, but when I ask BeautifulSoup to find a specific div class, it returns None. I tried this before on a similar website and managed to get everything within the desired div class, but I’m running into issues here. If you have the time, I would be extremely grateful for your advice.

Below is a sample of my code:

from pickle import NONE, TRUE
import requests
from bs4 import BeautifulSoup
import requests.exceptions


localfile = "C:/Users/XXX/Desktop/TapTap Apps/TapTap Page 1"
url = "https://www.taptap.cn/app/"
username = 'XXXX'
password = 'XXXX'
proxy = f"https://{username}:{password}@someproxysite.com"

def webScraper(url, proxy, min, max):
    for x in range(min, max):
        page = requests.get(url + str(x), proxy, timeout=10)                                # Request url and iterate with x
        soup = BeautifulSoup(page.content, 'lxml')
        image = soup.find('div', class_="tap-image-wrapper app-info-board__img")            # Finds the HTML elements that holds the image
        print (image)

webScraper(url, proxy, 12332, 13000)

Solution:

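For starters, the proxy configuration is wrong. The second positional argument of requests.get() is params, so the proxy URL was being appended to the request as a query string rather than used as a proxy. Pass it as a dictionary through the proxies keyword instead: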

proxies = {"https": proxy}
page = requests.get(url + str(x), proxies=proxies, timeout=10)
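To see what the original positional call did, a request can be prepared without being sent. This is only a sketch: the app id 12332 comes from the question, the credentials are placeholders, and using requests.Request(...).prepare() with params= as a stand-in for the positional argument is an assumption made here for illustration.

```python
import requests

# Placeholder credentials and host, mirroring the shape used in the question.
proxy = "https://user:secret@someproxysite.com"
target = "https://www.taptap.cn/app/12332"

# requests.get(url, proxy) binds `proxy` to the `params` parameter,
# which is what params=proxy reproduces here. Nothing is sent over the network.
wrong = requests.Request("GET", target, params=proxy).prepare()

# With the proxy kept out of the positional arguments, the URL stays untouched.
# The proxies dict is applied when the request is sent, not while preparing it.
right = requests.Request("GET", target).prepare()

print(wrong.url)  # the proxy string shows up in the query part of the URL
print(right.url)  # the plain target URL
```

If the first printed URL contains the proxy host, the request was going out directly, with a stray query string attached.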

You can also set a User-Agent header on the request, since some sites respond differently to the default python-requests value. Any browser's string will do; for Chrome, for example:

user_agent_det = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
headers = {'User-Agent':user_agent_det}
page = requests.get(url + str(x), headers=headers, proxies=proxies, timeout=10)
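As a quick check of what the header change does, the sketch below compares the default User-Agent with the one a session would send. The use of a Session and the name BROWSER_UA are assumptions for illustration; the answer itself passes headers directly to requests.get().

```python
import requests

# Same Chrome User-Agent string as in the answer.
BROWSER_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36"
)

# What requests sends when no User-Agent is set: "python-requests/<version>".
default_ua = requests.utils.default_user_agent()
print(default_ua)

# A session applies its headers to every request made through it.
session = requests.Session()
session.headers.update({"User-Agent": BROWSER_UA})

# Prepare (but do not send) a request to inspect the final headers.
prepared = session.prepare_request(
    requests.Request("GET", "https://www.taptap.cn/app/12332")
)
print(prepared.headers["User-Agent"])
```

A session also lets the proxies be set once via session.proxies.update(...), which may be convenient when looping over many ids.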

Putting it together, with a status check before parsing, to get the image:

def webScraper(url, proxy, min, max):
    proxies = {"https": proxy}
    user_agent_det = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.5112.79 Safari/537.36'
    headers = {'User-Agent':user_agent_det}

    for x in range(min, max):
        try:
            page = requests.get(url + str(x), headers=headers, proxies=proxies, timeout=10)
            if page.status_code == 200:
                soup = BeautifulSoup(page.content, 'lxml')
                image = soup.find('div', class_="tap-image-wrapper app-info-board__img")
                print(image)
            else:
                print(f"Failed to access {url}{x} : {page.status_code}")
        except requests.exceptions.RequestException as e:            # Network, proxy, and timeout errors
            print(f"Failed : {e}")
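
The function above prints the wrapper div, or None when the response does not contain it. A possible next step is to guard against None and read the image URL. The sketch below runs on a hypothetical HTML sample: the img tag inside the div, its src value, and the helper name find_thumbnail_url are assumptions, and the real page may be structured differently. It uses html.parser so it runs without lxml installed, and select_one so the match does not depend on the order of the two class names.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; only the two class names are taken from the question.
SAMPLE_HTML = """
<div class="app-info-board__img tap-image-wrapper">
  <img src="https://example.com/thumb.png" alt="thumbnail">
</div>
"""


def find_thumbnail_url(html):
    """Return the src of the first img inside the wrapper div, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # The CSS selector matches a div carrying both classes, in any order.
    wrapper = soup.select_one("div.tap-image-wrapper.app-info-board__img")
    if wrapper is None:
        return None  # wrapper not present in this HTML
    img = wrapper.find("img")
    if img is None:
        return None  # wrapper present but holds no img tag
    return img.get("src")


print(find_thumbnail_url(SAMPLE_HTML))
print(find_thumbnail_url("<html><body></body></html>"))
```

Inside the loop, this would be called as find_thumbnail_url(page.content) once the status code is 200.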