I am new to web scraping in Python, and some of my programs work while others fail, seemingly at random. I am trying to extract a certain string of text from a page (located here, and in the HTML at:
<body class="fixed-left widescreen">
<div id="wrapper">
<div class="content-page">
<div class="content">
<div class="container">
<div class="row">
<div class="col-lg-3 b-0 p-0">
<div class="col-lg-12">
<div class="card-box m-b-10">
<h4 class="m-t-0 header-title">Status</h4>
<b>Offline</b>
), following this tutorial (not using the tutorial webpage for scraping), but when I run print(results.prettify()), it raises AttributeError: 'NoneType' object has no attribute 'prettify'. So far, my complete code is
from requests import *
from bs4 import BeautifulSoup
URL = 'https://plancke.io/hypixel/player/stats/Captbugz'
page = get(URL)
soup = BeautifulSoup(page.content, 'html.parser')
results = soup.find(id='wrapper')
print(results.prettify())
Following the tutorial, the statement print(results.prettify()) should have printed HTML (and did while I was following the example), but it raised AttributeError: 'NoneType' object has no attribute 'prettify' instead. I have looked up this error and found two suggestions: 1) use Selenium, and 2) the data isn't getting scraped in the first place. For 1), I don't believe the content I'm looking for depends on JavaScript (please correct me on this), and for 2), get(URL) is actually returning something. I am fairly new to Stack Overflow, so please let me know if I am not following the rules for posting. Thanks!
> Solution:
In this particular case (the URL you mentioned), the HTML your script actually receives contains no element with id equal to "wrapper". Thus, soup.find(id='wrapper') cannot find anything and returns None, which of course has no prettify method. You should check that results is not None (i.e. that an element with that id exists) before calling .prettify().
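A minimal sketch of that check, run against a small made-up HTML string instead of the live page. The HTML snippet and the helper name prettify_by_id are illustrative assumptions, not part of the original code:

```python
from bs4 import BeautifulSoup

# Made-up HTML standing in for a response that lacks the expected element (assumption)
html = "<html><body><div id='challenge'>Checking your browser</div></body></html>"
soup = BeautifulSoup(html, "html.parser")


def prettify_by_id(soup, element_id):
    """Return prettified HTML for the element with the given id, or None if absent."""
    element = soup.find(id=element_id)
    if element is None:
        # find() found nothing, so there is nothing to prettify
        return None
    return element.prettify()


missing = prettify_by_id(soup, "wrapper")    # no such id in this HTML
present = prettify_by_id(soup, "challenge")  # this id exists
print(missing)
print(present)
```

Guarding on None this way turns the AttributeError into a case you can handle explicitly, for example by printing the raw response to see what the server really sent.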
Since without proper browser headers you end up at Cloudflare, you should adapt your code to the following:
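import requests
from bs4 import BeautifulSoup

URL = 'https://plancke.io/hypixel/player/stats/Captbugz'
# Send a browser-like User-Agent so the request is not answered with the Cloudflare page
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:108.0) Gecko/20100101 Firefox/108.0"}
page = requests.get(URL, headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')
results = soup.find(id='wrapper')
# find() returns None when nothing matches, so check before calling .prettify()
if results is not None:
    print(results.prettify())
else:
    print("No element with id 'wrapper' found")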
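Once the request returns the real page, the status text from the question could be pulled out along these lines. This is only a sketch: it runs against the HTML fragment quoted in the question rather than the live site, so the structure of the real page is an assumption, and extract_status is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Fragment based on the HTML quoted in the question; the live page may differ (assumption)
html = """
<div id="wrapper">
  <div class="card-box m-b-10">
    <h4 class="m-t-0 header-title">Status</h4>
    <b>Offline</b>
  </div>
</div>
"""


def extract_status(soup):
    """Return the text of the first <b> after the 'Status' header, or None."""
    header = soup.find("h4", string="Status")
    if header is None:
        return None
    value = header.find_next("b")
    if value is None:
        return None
    return value.get_text(strip=True)


soup = BeautifulSoup(html, "html.parser")
status = extract_status(soup)
print(status)
```

Searching for the header text and then taking the next b element avoids depending on the full chain of nested div classes, which is more likely to change than the "Status" label itself.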