I am trying to learn web scraping, and one problem I keep running into is identifying the correct class names. Is there a particular rule or method to follow for finding them? For example, in the code below I am trying to get the list of questions from the Stack Overflow page. When I click Inspect on the first question, I can see the class name question-hyperlink, but when I use it in the code below I get an empty result. Trying the div class summary gives the same empty result. Please advise on how I can fix this and avoid it in future cases.
import requests
from bs4 import BeautifulSoup
website = 'https://stackoverflow.com/'
r = requests.get(website)
if r.status_code == 200:
    print(f"Connected to {website}")
    soup = BeautifulSoup(r.content, 'html.parser')
    s = soup.find_all(class_name='question-hyperlink')
    print(s)
else:
    print(r)
print("Done")
Solution:
The URL you're using doesn't have any questions: https://stackoverflow.com shows only a landing page unless you're logged in, so the HTML that requests receives is not the page you inspected in the browser.
You need to change the URL to https://stackoverflow.com/questions.
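As a general way to catch this kind of mismatch, it can help to check which class names actually occur in the HTML your script received before writing any selectors. Below is a minimal sketch using only the standard library. The names ClassNameCounter and count_class_names are made up for illustration, and sample_html is invented markup standing in for r.text.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassNameCounter(HTMLParser):
    """Tallies every class name that appears in an HTML document."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "class" and value:
                # One class attribute may hold several space-separated names.
                self.counts.update(value.split())


def count_class_names(html):
    """Return a Counter mapping each class name to its number of occurrences."""
    parser = ClassNameCounter()
    parser.feed(html)
    parser.close()
    return parser.counts


# Invented markup standing in for r.text; not real Stack Overflow HTML.
sample_html = """
<div class="summary">
  <a class="question-hyperlink" href="/q/1">First question</a>
</div>
<div class="summary">
  <a class="question-hyperlink" href="/q/2">Second question</a>
</div>
<a class="nav-link js-gps-track" href="/help">Help</a>
"""

counts = count_class_names(sample_html)
print(counts["question-hyperlink"])  # 2
print(counts["no-such-class"])       # 0
```

If the class you saw in Inspect has a count of 0 in the fetched HTML, the problem is the page you downloaded (wrong URL, login required, or content added later by JavaScript), not your selector.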
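Also, you should be using class_=, not class_name=, in find_all(). BeautifulSoup treats any keyword argument it doesn't recognize as a filter on an HTML attribute with that name, so class_name='question-hyperlink' looks for a class_name attribute and matches nothing.
With both changes it works just fine.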
import requests
from bs4 import BeautifulSoup
website = 'https://stackoverflow.com/questions/'
# Send a browser-like User-Agent along with the request.
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:95.0) Gecko/20100101 Firefox/95.0",
}
r = requests.get(website, headers=headers)
if r.status_code == 200:
    print(f"Connected to {website}")
    # class_ (with a trailing underscore) filters on the HTML class attribute.
    links = BeautifulSoup(r.text, 'html.parser').find_all("a", class_="question-hyperlink")
    print(len(links))
Output:
20
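To see the class_ versus class_name difference without hitting the network, here is a small sketch that runs against an inline HTML string. The markup in sample_html is invented for illustration and is not real Stack Overflow HTML. It also shows two equivalent ways of writing the same filter, via attrs= and via a CSS selector.

```python
from bs4 import BeautifulSoup

# Invented markup for illustration only.
sample_html = """
<div class="summary">
  <a class="question-hyperlink" href="/q/1">First question</a>
</div>
<div class="summary">
  <a class="question-hyperlink" href="/q/2">Second question</a>
</div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# class_name is not a special keyword, so this filters on an HTML
# attribute literally named "class_name", which no tag here has.
wrong = soup.find_all(class_name="question-hyperlink")

# class_ is the keyword BeautifulSoup provides because "class" is a
# reserved word in Python.
right = soup.find_all("a", class_="question-hyperlink")

# Equivalent spellings of the same filter.
via_attrs = soup.find_all("a", attrs={"class": "question-hyperlink"})
via_css = soup.select("a.question-hyperlink")

print(len(wrong), len(right), len(via_attrs), len(via_css))  # 0 2 2 2
print([a.get_text() for a in right])  # ['First question', 'Second question']
```

An empty list from find_all() does not raise an error, so a misspelled keyword like class_name fails silently. Testing a selector on a small known snippet first makes that kind of mistake easy to spot.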