I am web scraping and need to parse the responses from a few thousand GET requests at a time. Sometimes these requests fail with 429 and/or 403 errors, so I need to check whether there is data before parsing the response. I wrote this function:
def check_response(response):
    if not response or not response.content:
        return False
    else:
        soup = BeautifulSoup(response.content, "html.parser")
        if not soup or not soup.find_all(attrs={"class": "stuff"}):
            return False
    return True
This works, but it can take quite a while to loop through a few thousand responses. Is there a better way?
>Solution :
You can use the response.status_code attribute to check the status code of the response. You can find a full list of HTTP status codes on MDN, but anything >= 400 is definitely an error. Checking it first also means failed responses are rejected before BeautifulSoup has to parse them. Try using this code:
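from bs4 import BeautifulSoup

def check_response(response):
    # Reject empty responses and any 4xx/5xx status before parsing.
    if not response or not response.content or response.status_code >= 400:
        return False
    else:
        soup = BeautifulSoup(response.content, "html.parser")
        # Only count the page as valid if the expected elements exist.
        if not soup or not soup.find_all(attrs={"class": "stuff"}):
            return False
        return True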
Note that return True should be indented one level so it sits inside the else block, after the inner check; that way it runs only when the response has passed every test.
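As a rough illustration of why checking the status first helps: a response with a 4xx or 5xx code can be rejected without ever building a parse tree. The sketch below uses only the standard library. FakeResponse and triage are hypothetical names standing in for whatever response objects your HTTP client returns, and setting 429 responses aside for a later retry is an assumption about how you might want to handle rate limiting, not something the question asked for.

```python
from dataclasses import dataclass


@dataclass
class FakeResponse:
    """Hypothetical stand-in for an HTTP client's response object."""
    status_code: int
    content: bytes = b""


def triage(responses):
    """Split responses into (parseable, retry_later, failed) by status code.

    No HTML is parsed here, so this pass stays cheap even for
    thousands of responses.
    """
    parseable, retry_later, failed = [], [], []
    for response in responses:
        if response.status_code == 429:
            # 429 Too Many Requests: rate limited, may succeed on retry.
            retry_later.append(response)
        elif response.status_code >= 400 or not response.content:
            # Any other 4xx/5xx code (such as 403) or an empty body.
            failed.append(response)
        else:
            parseable.append(response)
    return parseable, retry_later, failed


responses = [
    FakeResponse(200, b'<div class="stuff">data</div>'),
    FakeResponse(429),
    FakeResponse(403, b"Forbidden"),
    FakeResponse(200),
]
parseable, retry_later, failed = triage(responses)
print(len(parseable), len(retry_later), len(failed))
```

Only the responses that end up in parseable would then need to go through check_response and BeautifulSoup.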