How do I identify the websites that don't exist in a list of websites?


Good afternoon. For a university Python project I need to extract a table from a website, but when the link doesn't exist I need my loop to ignore that link and move on to the next one. How can I do that?

I'm using Python to create a dataset of soundtracks.
I used BeautifulSoup to extract the HTML, but the link doesn't exist, so I thought about adding a check like

if type(link)=="NoneType":

but it doesn't work. link is the result of soup.find, which found nothing; in fact, type(link) gives NoneType.
What can I do to recognise the nonexistent link?
Thank you for the help.

>Solution :

You can create a function that tests whether a URL is valid. If the request raises an error, the function returns False; if it makes a successful connection, it returns True. You can then use this function to filter your list and produce a new list of valid URLs.

Here is an example:


import requests

url_list = ["", "", ""]

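def is_valid_url(url):
    try:
        # A malformed URL, DNS failure, refused connection or timeout raises RequestException.
        response = requests.get(url, timeout=10)
        # Note: a page that answers with an error status (e.g. 404) still counts as valid here.
        return True
    except requests.exceptions.RequestException:
        return False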

valid_url_list = list(filter(is_valid_url, url_list))


['', '']
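
As for the check in the question: type(link) == "NoneType" compares a type object with a string, so it is always False. The usual test is link is None. The sketch below combines that test with the request check above. It rests on some assumptions that are not in the question: the element being searched for is a table tag, the parser is html.parser, and a page that answers with an error status such as 404 should be skipped as well. The helper names fetch_html, extract_table and collect_tables are made up for this illustration.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url):
    """Return the page HTML, or None if the request fails or gets an error status."""
    try:
        response = requests.get(url, timeout=10)
        # Assumption: 4xx/5xx answers should be treated like a missing page.
        response.raise_for_status()
    except requests.exceptions.RequestException:
        return None
    return response.text


def extract_table(html):
    """Return the first <table> element, or None if the page has none."""
    soup = BeautifulSoup(html, "html.parser")
    # soup.find returns None when nothing matches.
    return soup.find("table")


def collect_tables(url_list):
    """Loop over the URLs and keep only the tables that were actually found."""
    tables = []
    for url in url_list:
        html = fetch_html(url)
        if html is None:
            continue  # page could not be fetched: move on to the next link
        table = extract_table(html)
        if table is None:
            continue  # page exists but has no table: move on as well
        tables.append(table)
    return tables
```

Calling collect_tables(url_list) then returns only the tables from pages that loaded and contained one, so the loop never stops on a bad link.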
