I am trying to scrape a value from inside a span identified by its ID, but it is not as simple as calling find and taking the span's text.
Below is the HTML from the webpage.
HTML
I am trying to print "B0C4YKLXPQ".
This gets me the
Below are all attempts that failed.
- page_soup.find("div", {"id": "twisterContainer"}).find_all("data-asin")
- page_soup.find("div", {"id": "twisterContainer"}).find("span", {"id": "fitRecommendationsSection"}).span["data-asin"]
- page_soup.find("div", {"id": "twisterContainer"}).find("span", {"id": "fitRecommendationsSection"}).find_all("data-asin")
- page_soup.find("div", {"id": "twisterContainer"}).find_all(["data-asin"])
>Solution :
The following code is likely to work, unless Amazon has blocked your IP address, for example after too many scraping attempts:
import requests
from bs4 import BeautifulSoup as bs
headers = {
'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36'
}
url = 'https://www.amazon.com/dp/B002G9UDYG'
r = requests.get(url, headers=headers)
soup = bs(r.text, 'html.parser')
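# select_one returns None if the span is missing (e.g. a blocked request got a different page)
span = soup.select_one('span[id="fitRecommendationsSection"]')
# data-asin is an attribute of the span itself, so read it with .get()
item = span.get('data-asin') if span is not None else None
print(item)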
Result in terminal:
B0C4YKLXPQ
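As for why the attempts in the question failed: the first positional argument of find_all is a tag name, so find_all("data-asin") searches for <data-asin> elements rather than for elements that carry a data-asin attribute. The sketch below shows the difference on a small inline snippet. The markup is an assumption, since the original HTML is not reproduced here; it is a simplified stand-in that only mirrors the ids and the attribute mentioned above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a simplified stand-in for the real page,
# reusing only the ids and the attribute named in the question.
html = """
<div id="twisterContainer">
  <span id="fitRecommendationsSection" data-asin="B0C4YKLXPQ">
    <span class="inner">Fit: as expected</span>
  </span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
container = soup.find("div", {"id": "twisterContainer"})

# The string is treated as a tag name, so this looks for <data-asin> tags.
no_matches = container.find_all("data-asin")

# attrs={"data-asin": True} matches any tag that has the attribute at all.
tagged = container.find_all(attrs={"data-asin": True})
asins = [tag["data-asin"] for tag in tagged]

# The same lookup written as a CSS attribute selector.
span = container.select_one("span[data-asin]")
asin = span.get("data-asin") if span is not None else None

print(no_matches)
print(asins)
print(asin)
```

In this stand-in markup the attribute sits on the outer span, which would also explain why chaining .span["data-asin"] onto it fails: that expression reads the attribute from the first nested span instead.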
BeautifulSoup documentation can be found here.