Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How can I avoid loop break in web scrapping?

import bs4 as bs
import urllib.request
import csv
import requests
import re
from urllib.request import urlopen
from urllib.error import HTTPError
import pandas as pd
import time
import urllib.error

book_ids = ["9781408110416","9789604249671","9781405950305"]

def get_description(book_id): 
    my_urls = 'https://www.bookdepository.com/Enid-Blytons-Christmas-Tales-Enid-Blyton/' + book_id
    source = urlopen(my_urls).read()
    soup = bs.BeautifulSoup(source, 'lxml')
    description = soup.find('div', class_='item-img-content')

    if description:
        return description
    else:
        return "[No description"

for book_id in book_ids: 
    print(book_id)
    print(get_description(book_id))

    time.sleep(2)

The error I get –

HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 404: Not Found

Hello, I am working on script to scrap image urls from bookdepository. My main problem is that certain books return 404 because they are not available on the platform.

How can I keep the script simply go to the next code and not stop the entire loop.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Thanks in advance.

>Solution :

You can put the code in get_description() function in a try and except block. Something like this,

def get_description(book_id): 
    my_urls = 'https://www.bookdepository.com/Enid-Blytons-Christmas-Tales-Enid-Blyton/' + book_id

    try:
      source = urlopen(my_urls).read()
      soup = bs.BeautifulSoup(source, 'lxml')
      description = soup.find('div', class_='item-img-content')

      if description:
          return description
      else:
          return "[No description"
    except:
      pass

Now if the code encounters an erorr, it’ll just skip that book_id and move to the enxt one.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading