How do I check if www.google.com is reachable?

I live in China, behind the infamous Great Firewall of China, and I use VPNs.

A simple observation: while the VPN is connected, I can access www.google.com; when it isn't, I can't. So I can check whether I have an active VPN connection by trying to access Google.

My ISP really loves to disconnect my VPN, so I have to routinely check whether I have an active VPN connection, and I have already found a way to do this programmatically.

I am connected to the VPN right now, and if I do the following:

import requests

# succeeds only while the VPN is connected
google = requests.get('https://www.google.com', timeout=3)
print(google.status_code == 200)

Everything is fine.

But if I don’t have an active VPN connection, then all hell breaks loose.

I do this check precisely because my connection will get disconnected, and I need the function to return False when that happens. But requests really loves to throw exceptions; they stop the execution of my script, and they come one after another:

...
ReadTimeoutError: HTTPSConnectionPool(host='www.google.com', port=443): Read timed out. (read timeout=3)

During handling of the above exception, another exception occurred:

ReadTimeout                               Traceback (most recent call last)
...
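
The two tracebacks are chained: urllib3 raises its own ReadTimeoutError, then requests catches it and re-raises it as requests.exceptions.ReadTimeout, which is why Python prints "During handling of the above exception, another exception occurred". A minimal sketch of the chain (assuming the request dies with a read timeout, as in the traceback above):

import requests

try:
    # with the VPN down, this read is expected to time out
    requests.get('https://www.google.com', timeout=3)
except requests.exceptions.ReadTimeout as exc:
    print(type(exc).__name__)              # ReadTimeout (raised by requests)
    print(type(exc.__context__).__name__)  # ReadTimeoutError (from urllib3)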

I have imported a bunch of exceptions just so requests doesn't panic and stop my script when the VPN is disconnected:

import requests
from requests.exceptions import ConnectionError, ConnectTimeout, ReadTimeout, Timeout
from socket import gaierror
from requests.packages.urllib3.exceptions import MaxRetryError, NewConnectionError, ReadTimeoutError

def google_accessible():
    try:
        google = requests.get('https://www.google.com', timeout=3)
        if google.status_code == 200:
            return True
    except (ConnectionError, ConnectTimeout, gaierror, MaxRetryError, NewConnectionError, ReadTimeout, ReadTimeoutError, TimeoutError):
        # swallow the expected network failures and fall through to False
        pass
    return False

I thought I had caught all the exceptions before, but that wasn't the case: I had failed to catch several of them (ReadTimeout, ReadTimeoutError, TimeoutError).

I know I can use except Exception to catch them all, but that would also catch exceptions that aren't meant to be caught, and I would rather let those stop the execution than risk hiding bugs.
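
For illustration, here is a deliberately buggy sketch (the typo is mine, made up for this example) showing what except Exception would hide:

import requests

def google_accessible():
    try:
        google = requests.get('https://www.google.com', timeout=3)
        return google.status_cod == 200  # typo: should be status_code
    except Exception:
        # the AttributeError from the typo is swallowed too, so this
        # silently returns False even when the VPN is up
        return False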

How do I catch, with a minimal number of exception classes, all the exceptions that are VERY likely to occur when a request fails?

> Solution:

I think it would be better to use RequestException from the requests.exceptions module. The hierarchy is the following:

builtins.OSError(builtins.Exception)
    RequestException  # <- Use this top level exception
        ChunkedEncodingError
        ConnectionError
            ConnectTimeout(ConnectionError, Timeout)
            ProxyError
            SSLError
        ContentDecodingError(RequestException, urllib3.exceptions.HTTPError)
        HTTPError
        InvalidHeader(RequestException, builtins.ValueError)
        InvalidJSONError
            JSONDecodeError(InvalidJSONError, json.decoder.JSONDecodeError)
        InvalidSchema(RequestException, builtins.ValueError)
        InvalidURL(RequestException, builtins.ValueError)
            InvalidProxyURL
        MissingSchema(RequestException, builtins.ValueError)
        RetryError
        StreamConsumedError(RequestException, builtins.TypeError)
        Timeout
            ReadTimeout
        TooManyRedirects
        URLRequired
        UnrewindableBodyError
builtins.Warning(builtins.Exception)
    RequestsWarning
        FileModeWarning(RequestsWarning, builtins.DeprecationWarning)
        RequestsDependencyWarning

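Since these are ordinary Python classes, you can sanity-check the hierarchy yourself (just a demonstration, not required for the fix):

import requests.exceptions as rex

print(issubclass(rex.ConnectTimeout, rex.ConnectionError))  # True
print(issubclass(rex.ConnectTimeout, rex.Timeout))          # True
print(issubclass(rex.ReadTimeout, rex.RequestException))    # True
print(issubclass(rex.RequestException, OSError))            # True
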
So you can do:

import requests
from requests.exceptions import RequestException

def google_accessible():
    try:
        google = requests.get('https://www.google.com', timeout=3)
        if google.status_code == 200:
            return True
    except RequestException:
        pass
    return False
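
As a usage sketch (the 60-second interval and the print are just placeholders for whatever reconnect logic you use), you could poll it in a loop:

import time

while True:
    if not google_accessible():
        print('VPN appears to be down, time to reconnect')
    time.sleep(60)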
