Python BeautifulSoup Span Scraping

I am trying to scrape fields within a Span ID, but the value is not as simple as using find and taking the text from a span.

On the webpage, the value I need is stored in the data-asin attribute of the span with id "fitRecommendationsSection", which sits inside the div with id "twisterContainer".

I am trying to print "B0C4YKLXPQ".



Below are all attempts that failed.

- page_soup.find("div", {"id": "twisterContainer"}).find_all("data-asin")

- page_soup.find("div", {"id": "twisterContainer"}).find("span", {"id": "fitRecommendationsSection"}).span["data-asin"]

- page_soup.find("div", {"id": "twisterContainer"}).find("span", {"id": "fitRecommendationsSection"}).find_all("data-asin")

- page_soup.find("div", {"id": "twisterContainer"}).find_all("data-asin")

- page_soup.find("div", {"id": "twisterContainer"}).find_all(["data-asin"])

Solution:

The following code has a good chance of working, unless your IP has been blacklisted by Amazon for some reason, such as too many scraping attempts:

import requests
from bs4 import BeautifulSoup as bs

# A browser-like User-Agent makes the request less likely to be rejected
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36'
}

url = 'https://www.amazon.com/dp/B002G9UDYG'

r = requests.get(url, headers=headers)
soup = bs(r.text, 'html.parser')

# Select the span by its id and read the data-asin attribute from it
item = soup.select_one('span[id="fitRecommendationsSection"]').get('data-asin')
print(item)

Result in terminal:

B0C4YKLXPQ
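For reference, the attempts in the question fail because find_all("data-asin") searches for a tag named data-asin, while data-asin is an attribute, so it has to be read from the matched tag itself. Here is a minimal sketch of the same lookup using find, reusing the soup object from above and guarding against the span being absent:

span = soup.find("span", {"id": "fitRecommendationsSection"})
if span is not None:
    # Attributes are read from the Tag like a dictionary; get() returns None if the attribute is missing
    print(span.get("data-asin"))
else:
    print("fitRecommendationsSection not found in the fetched page")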

The BeautifulSoup documentation can be found at https://www.crummy.com/software/BeautifulSoup/bs4/doc/.
