Skip image url when HTTP Error appears in csv file

I have a project that scrapes images from a website. I use a CSV file containing all the URLs. For some URLs I don't have permission to open them (or they don't exist), and Python raises an HTTP Error 403 for those. I just want to ignore the error and try the next URL in the CSV file.

import urllib.request
import csv

with open ('urls_01.csv') as images:
    images = csv.reader(images)
    img_count = 1
    for image in images:
        urllib.request.urlretrieve(image[0],
                'images/image_{0}.jpg'.format(img_count)) 
        img_count += 1

This is the error:

Traceback (most recent call last):
  File "c:\Users\Heigre\Documents\Phyton\img_test.py", line 8, in <module>
    urllib.request.urlretrieve(image[0],
  File "C:\Program Files\Python310\lib\urllib\request.py", line 241, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
  File "C:\Program Files\Python310\lib\urllib\request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Program Files\Python310\lib\urllib\request.py", line 525, in open
    response = meth(req, response)
  File "C:\Program Files\Python310\lib\urllib\request.py", line 634, in http_response
    response = self.parent.error(
  File "C:\Program Files\Python310\lib\urllib\request.py", line 563, in error
    return self._call_chain(*args)
  File "C:\Program Files\Python310\lib\urllib\request.py", line 496, in _call_chain
    result = func(*args)
  File "C:\Program Files\Python310\lib\urllib\request.py", line 643, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

Solution:

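Wrap the download in a try/except block and catch HTTPError, which is the exception type shown in your error output. The extra import lets the except clause refer to HTTPError by its short name. I also left a commented-out print statement that might help if you want to see which errors are skipped, or if you only want to pass on specific error codes.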

import urllib.request
from urllib.error import HTTPError
import csv

with open('urls_01.csv') as images:
    images = csv.reader(images)
    img_count = 1
    for image in images:
        try:
            urllib.request.urlretrieve(
                image[0],
                'images/image_{0}.jpg'.format(img_count)
            )
            img_count += 1
        except HTTPError as ex:
            # Skip any URL that raises an HTTPError (403, 404, ...) and move on.
            # Uncomment the print to see which errors are skipped (untested):
            # print(ex, ex.code)
            pass
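
If you only want to skip specific status codes, as hinted at above, something along these lines might work. This is a minimal sketch, not a tested drop-in: the names download_images, read_urls, skip_codes and fetch are made up for illustration, and the choice to skip 403 and 404 while re-raising everything else is an assumption about what you want. It also catches URLError (unreachable host, DNS failure), which the original loop does not handle.

```python
import csv
import os
import urllib.request
from urllib.error import HTTPError, URLError


def download_images(urls, dest_dir="images", skip_codes=(403, 404),
                    fetch=urllib.request.urlretrieve):
    """Download each URL, skipping the ones that fail with an expected error.

    Returns (saved, skipped): a list of file paths and a list of
    (url, reason) pairs. `fetch` is injectable so the loop can be
    exercised without network access (hypothetical parameter).
    """
    os.makedirs(dest_dir, exist_ok=True)
    saved, skipped = [], []
    for url in urls:
        # Number files by successful downloads so there are no gaps.
        target = os.path.join(dest_dir, "image_{0}.jpg".format(len(saved) + 1))
        try:
            fetch(url, target)
        except HTTPError as ex:
            # HTTPError is a subclass of URLError, so it must be caught first.
            if ex.code not in skip_codes:
                raise  # unexpected status code: let it surface
            skipped.append((url, "HTTP {0}".format(ex.code)))
        except URLError as ex:
            # Network-level failure, e.g. the host could not be reached.
            skipped.append((url, str(ex.reason)))
        else:
            saved.append(target)
    return saved, skipped


def read_urls(csv_path):
    """Return the first column of every non-empty row in the CSV file."""
    with open(csv_path, newline="") as f:
        return [row[0] for row in csv.reader(f) if row]
```

Calling download_images(read_urls('urls_01.csv')) would then download what it can and return the skipped URLs with the reason for each, so you can review them afterwards instead of silently losing them.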