BeautifulSoup only outputs data sometimes?

So I’m scraping the links to all the posts on this subreddit (specifically the top posts for the last 24 hours).
But when I run my program, it sometimes outputs all the data and other times outputs nothing, with the exact same code. It works about one time in five.

import re

import requests
from bs4 import BeautifulSoup

# URL of the subreddit's top posts
test = requests.get('https://www.reddit.com/r/TikTokCringe/top/')
# the HTML of the response
html = test.text
# making a soup of the HTML
soup = BeautifulSoup(html, 'html.parser')
# find the first 30 <a> elements whose href starts with '/r/TikTokCringe/comments/'
for href in soup.find_all('a', {"href": re.compile('^/r/TikTokCringe/comments/')})[:30]:
    # I'm looping through every element because I eventually want just the links;
    # for now I'm just trying to print each one
    print(href)

Solution:


You’re getting HTTP error 429 – Too Many Requests. Try slowing down, or set the User-Agent HTTP header:

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:99.0) Gecko/20100101 Firefox/99.0"
}

# URL of subreddit
test = requests.get("https://reddit.com/r/TikTokCringe/top/", headers=headers)

...
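To confirm the 429 diagnosis before handing the page to BeautifulSoup, you can check the status code explicitly. A minimal sketch (the `describe_response` helper is not part of the original answer, just an illustration):

```python
def describe_response(status_code: int) -> str:
    # Map the HTTP status codes relevant here to a short diagnosis.
    if status_code == 429:
        return "rate limited - wait and retry, or set a User-Agent header"
    if status_code == 200:
        return "ok - safe to parse with BeautifulSoup"
    return f"unexpected HTTP status {status_code}"

# With the request above: print(describe_response(test.status_code))
print(describe_response(429))
```

A check like this explains the "works about one time in five" behavior: the parsing code is fine, but on most runs Reddit returns an error page that contains no matching links.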

Also: consider using their JSON format (add .json at the end of the URL):

data = requests.get(
    "https://reddit.com/r/TikTokCringe/top/.json", headers=headers
).json()

print(data)
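Once you have the JSON, the post links come straight out of the listing structure, with no HTML parsing needed. A sketch assuming Reddit's usual `{"data": {"children": [...]}}` listing shape; the `sample` dict below is a hand-made stand-in, not a real API response:

```python
# Hand-made stand-in mirroring the shape of Reddit's .json listing output.
sample = {
    "data": {
        "children": [
            {"data": {"permalink": "/r/TikTokCringe/comments/abc123/example_post/"}},
            {"data": {"permalink": "/r/TikTokCringe/comments/def456/another_post/"}},
        ]
    }
}

def post_links(listing: dict) -> list[str]:
    # Build absolute URLs from each child's permalink.
    return [
        "https://www.reddit.com" + child["data"]["permalink"]
        for child in listing["data"]["children"]
    ]

print(post_links(sample))
```

With the real response, `post_links(data)` would replace the whole `find_all` loop from the question.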