
How to identify the correct class for BeautifulSoup?

I am trying to learn scraping, and one problem I keep running into is identifying the correct class names. Is there a particular rule or method to follow for this? For example, in the code below I am trying to get the list of questions from the Stack Overflow page. When I inspect the first question in the browser, I can see the class name question-hyperlink, but when I use it in the code below I get empty results. I get the same empty results if I try the div class summary. Kindly guide me on how to fix this and how to avoid it in future cases.

import requests
from bs4 import BeautifulSoup
 
website = 'https://stackoverflow.com/'
r = requests.get(website)

if r.status_code == 200:
    print(f"Connected to {website}")
    soup = BeautifulSoup(r.content, 'html.parser')
    s = soup.find_all(class_name='question-hyperlink')
    print(s)
else:
    print(r)
    
print("Done")

>Solution :


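The URL you're using doesn't list any questions: https://stackoverflow.com shows only a landing page unless you're logged in. Your browser is presumably logged in, which is why DevTools shows the question list, but requests fetches the page anonymously and receives different HTML.

You need to change the URL to https://stackoverflow.com/questions.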
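To avoid this in future, it helps to check whether the class actually appears in the HTML your script received, rather than trusting what the browser's inspector shows. The simplest check is `print("question-hyperlink" in r.text)`. The sketch below wraps the same idea in a small helper; `count_class` is a hypothetical name, and the two HTML strings are made-up stand-ins for what requests might download from each URL (in a real script you would pass `r.text`).

```python
from bs4 import BeautifulSoup


def count_class(html: str, tag: str, class_name: str) -> int:
    """Count <tag> elements whose class list contains class_name (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.find_all(tag, class_=class_name))


# Hypothetical stand-ins for the HTML requests receives from each URL.
landing_page = "<html><body><h1>Welcome</h1><a class='btn' href='/users/signup'>Sign up</a></body></html>"
questions_page = "<html><body><a class='question-hyperlink' href='/questions/1'>Q1</a></body></html>"

results = {}
for label, html in [("landing", landing_page), ("questions", questions_page)]:
    results[label] = count_class(html, "a", "question-hyperlink")
    print(label, results[label])
# landing 0
# questions 1
```

If the count is 0 for the page you downloaded, the problem is the page (wrong URL, login, blocked request), not the class name.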

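Also, you should be using class_=, not class_name=, in find_all(). find_all() treats any keyword argument it doesn't recognize as a filter on an attribute of that name, so class_name='question-hyperlink' looks for tags with an attribute literally called class_name, and no tag has one. The trailing underscore in class_ is needed because class is a reserved word in Python.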
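A minimal offline sketch of the difference, using a small made-up HTML fragment in place of the downloaded page (the class names mirror the question). It also shows two equivalent ways to filter by class: an `attrs` dict and a CSS selector via `select()`.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment standing in for the downloaded page.
html = """
<div class="summary">
  <h3><a href="/questions/1" class="question-hyperlink">First question</a></h3>
  <h3><a href="/questions/2" class="question-hyperlink">Second question</a></h3>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# class_name= filters on an attribute literally named "class_name": nothing matches.
wrong = soup.find_all(class_name="question-hyperlink")

# class_= filters on the HTML "class" attribute.
right = soup.find_all("a", class_="question-hyperlink")

# Equivalent alternatives: an attrs dict, or a CSS selector.
via_attrs = soup.find_all("a", attrs={"class": "question-hyperlink"})
via_css = soup.select("a.question-hyperlink")

print(len(wrong), len(right), len(via_attrs), len(via_css))  # 0 2 2 2
print([a.get_text() for a in right])  # ['First question', 'Second question']
```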

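With both changes it works. The version below also sends a browser-like User-Agent header, since some sites respond differently to the default one that requests sends.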

import requests
from bs4 import BeautifulSoup

website = 'https://stackoverflow.com/questions/'

# Send a browser-like User-Agent instead of the default python-requests one
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:95.0) Gecko/20100101 Firefox/95.0",
}
r = requests.get(website, headers=headers)

if r.status_code == 200:
    print(f"Connected to {website}")
    soup = BeautifulSoup(r.text, 'html.parser')
    # class_ (with the underscore) filters on the HTML class attribute
    questions = soup.find_all("a", class_="question-hyperlink")
    print(len(questions))
else:
    print(r)

Output:

20
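The script above only prints how many links it found. A possible next step, sketched here on a made-up HTML fragment (in the real script you would parse `r.text`), is to pull out each question's title and turn its relative `href` into an absolute URL with `urljoin`. `BASE_URL` and the sample links are assumptions for illustration.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://stackoverflow.com/questions/"

# Hypothetical fragment; in the real script this would be r.text.
html = """
<a class="question-hyperlink" href="/questions/111/how-to-parse-html">How to parse HTML?</a>
<a class="question-hyperlink" href="/questions/222/what-is-a-generator">What is a generator?</a>
"""

soup = BeautifulSoup(html, "html.parser")
questions = []
for link in soup.find_all("a", class_="question-hyperlink"):
    title = link.get_text(strip=True)
    # A leading "/" in href replaces the base URL's whole path
    url = urljoin(BASE_URL, link["href"])
    questions.append((title, url))

for title, url in questions:
    print(f"{title} -> {url}")
```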