Good afternoon. For a university Python project I need to extract a table from a website, but the link doesn’t exist, so I need my loop to ignore that link and move on to the next one. How can I do that?
I’m using Python to create a dataset of soundtracks.
I used BeautifulSoup to extract the HTML, but the link doesn’t exist, so I thought about adding a check like:
if type(link)=="NoneType":
but it doesn’t work. link is the result of soup.find, which found nothing; in fact, type(link) gives NoneType as a result.
What can I do to detect the nonexistent link?
Thank you for the help.
>Solution :
You can create a function to test whether the URL is valid. If the request raises an error, the function returns False; if it makes a successful connection, it returns True. You can then use this function to filter your list and produce a new list of valid URLs.
Here is an example:
Code:
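import requests

url_list = ["http://yahoo.com", "http://a_random_site_that_does_not_exist.com", "http://google.com"]

def is_valid_url(url):
    """Return True if the URL responds without an HTTP or connection error."""
    try:
        # A timeout keeps an unresponsive site from blocking the loop forever
        response = requests.get(url, timeout=10)
        # Raise an exception for 4xx/5xx status codes
        response.raise_for_status()
        return True
    except requests.exceptions.RequestException:
        return False

# Keep only the URLs that could be fetched successfully
valid_url_list = list(filter(is_valid_url, url_list))
print(valid_url_list)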
Output:
['http://yahoo.com', 'http://google.com']
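As for the check in the question: type(link) returns the class NoneType, not the string "NoneType", so that comparison is always False. When soup.find matches nothing it returns None, and the usual test is `link is None`. Below is a minimal sketch of skipping such pages inside a loop. The sample HTML strings and the names pages and found_links are made up for illustration; in your project the HTML would come from the pages you download.

```python
from bs4 import BeautifulSoup

# Hypothetical sample pages: the second one has no <a> tag at all.
pages = [
    '<html><body><a href="/soundtrack/1">Track list</a></body></html>',
    '<html><body><p>No link here</p></body></html>',
    '<html><body><a href="/soundtrack/3">Track list</a></body></html>',
]

found_links = []
for html in pages:
    soup = BeautifulSoup(html, "html.parser")
    link = soup.find("a")
    # find() returns None when nothing matches, so compare with "is None".
    # type(link) == "NoneType" is always False: it compares a class to a string.
    if link is None:
        continue  # skip this page and move on to the next one
    found_links.append(link["href"])

print(found_links)
```

With these sample pages the second entry is skipped and only the two existing links are collected. The `continue` statement is what makes the loop ignore the missing link and carry on with the next item.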