How do I efficiently check if data was returned in my GET request?

I am web scraping and need to parse a few thousand GET responses at a time. Some of these requests fail with 429 and/or 403 errors, so I need to check that there is data before parsing the response. I wrote this function:

def check_response(response):
    if not response or not response.content:
        return False
    else:
        soup = BeautifulSoup(response.content, "html.parser")
        if not soup or not soup.find_all(attrs={"class": "stuff"}):
            return False
    
    return True

This works, but it can take quite a while to loop through a few thousand responses. Is there a better way?

Solution:

You can use the response.status_code attribute to check the status code of the response before parsing anything. MDN has a full list of HTTP status codes; any code >= 400 is a client or server error (429 and 403 both fall in that range), so you can return early and skip the parse. Try using this code:

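from bs4 import BeautifulSoup

def check_response(response):
    # Cheap checks first: skip parsing when the request failed or the body is empty.
    if not response or not response.content or response.status_code >= 400:
        return False
    else:
        # Parse only the responses that passed the checks above.
        soup = BeautifulSoup(response.content, "html.parser")
        if not soup or not soup.find_all(attrs={"class": "stuff"}):
            return False
        return True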

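Note that I moved return True one level inwards, into the else branch. This is optional: in your version, the return True after the if/else is still reached whenever neither branch returns False, so both versions behave the same.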
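
If parsing is still the bottleneck, one option is to sort the batch by status code and by a raw byte search before any HTML is parsed. The following is only a minimal sketch under stated assumptions: FakeResponse is a hypothetical stand-in for your real response objects, triage and RETRYABLE are illustrative names, and the byte check assumes the body uses an ASCII-compatible encoding and that the class name is not entity-escaped in the markup.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a real response object; only the two
# attributes used in the answer above are modelled.
@dataclass
class FakeResponse:
    status_code: int
    content: bytes = b""

# Assumption: these are the failure codes named in the question.
RETRYABLE = {403, 429}

def triage(responses, marker=b"stuff"):
    """Split responses into (to_parse, to_retry, dropped) without parsing HTML."""
    to_parse, to_retry, dropped = [], [], []
    for r in responses:
        if r.status_code in RETRYABLE:
            # Rate-limited or blocked: worth requesting again later.
            to_retry.append(r)
        elif r.status_code >= 400 or not r.content:
            # Any other error, or an empty body: nothing to parse.
            dropped.append(r)
        elif marker not in r.content:
            # Under the assumptions above, the class cannot be in the
            # parsed tree if its bytes are absent from the raw body.
            dropped.append(r)
        else:
            to_parse.append(r)
    return to_parse, to_retry, dropped

batch = [
    FakeResponse(200, b'<div class="stuff">ok</div>'),
    FakeResponse(429),
    FakeResponse(403, b"Forbidden"),
    FakeResponse(200, b"<p>no match here</p>"),
    FakeResponse(500, b"error"),
]
to_parse, to_retry, dropped = triage(batch)
print(len(to_parse), len(to_retry), len(dropped))  # prints: 1 2 2
```

Only the responses in to_parse then need to go through check_response, and the ones in to_retry can be requested again instead of being discarded. The byte search can let through a body that contains "stuff" somewhere other than a class attribute, so the parse is still the final check.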
