Python BeautifulSoup Span Scraping

June 5, 2023

I am trying to scrape a value from a span identified by its ID, but it is not as simple as calling find and taking the span's text.

Below is the HTML from the webpage.
[Image: HTML]

I am trying to print "B0C4YKLXPQ"

This gets me the

Below are all attempts that failed.

- page_soup.find("div", {"id": "twisterContainer"}).find_all("data-asin")

- page_soup.find("div", {"id": "twisterContainer"}).find("span", {"id": "fitRecommendationsSection"}).span["data-asin"]

- page_soup.find("div", {"id": "twisterContainer"}).find("span", {"id": "fitRecommendationsSection"}).find_all("data-asin")

- page_soup.find("div", {"id": "twisterContainer"}).find_all(["data-asin"])
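
Most of these attempts pass "data-asin" to find_all as if it were a tag name, so BeautifulSoup looks for <data-asin> elements and finds none; data-asin is an attribute, not a tag. Below is a minimal sketch of matching on the attribute instead. The HTML fragment is an assumption made up for illustration, since the page's real markup is not shown here; only the two ids and the ASIN value come from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the page's real HTML is not available, so this
# fragment only mimics the ids and the data-asin value named in the question.
html = """
<div id="twisterContainer">
  <span id="fitRecommendationsSection" data-asin="B0C4YKLXPQ">
    <span class="inner">Fit: as expected</span>
  </span>
</div>
"""

page_soup = BeautifulSoup(html, "html.parser")
container = page_soup.find("div", {"id": "twisterContainer"})

# find_all("data-asin") searches for tags *named* data-asin, so it is empty.
by_tag_name = container.find_all("data-asin")

# attrs={"data-asin": True} matches any tag that carries the attribute.
tagged = container.find_all(attrs={"data-asin": True})
asins = [tag["data-asin"] for tag in tagged]

print(by_tag_name)  # []
print(asins)        # ['B0C4YKLXPQ']
```

On the real page the nesting may differ, so the attribute could sit on a different element than in this sketch.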

Solution:

The following code has a good chance of working, unless Amazon has blocked your IP, for example after too many scraping attempts:

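import requests
from bs4 import BeautifulSoup as bs

# Send a browser-like User-Agent instead of the default requests one
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36'
}

url = 'https://www.amazon.com/dp/B002G9UDYG'

r = requests.get(url, headers=headers)
r.raise_for_status()  # stop early on an HTTP error status
soup = bs(r.text, 'html.parser')

# Read the data-asin attribute from the span itself, not from its text
span = soup.select_one('span[id="fitRecommendationsSection"]')
item = span.get('data-asin') if span else None  # None if the span is missing from the response
print(item)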

Result in terminal: