Pagination with BeautifulSoup in python

I am doing a web scraping project for this site:
https://yellowpages.com.eg/en/search/fast-food
I managed to scrape the data, but I am struggling with the pagination.
I want a loop that finds the Next button on each page and then uses the URL scraped from that button to repeat the same process on the following page.

import requests
from bs4 import BeautifulSoup

url = 'https://yellowpages.com.eg/en/search/fast-food'
while True:
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'lxml')
    # The pagination bar that holds the Previous/Next buttons
    pages = soup.find_all('ul', class_='pagination center-pagination')
    for page in pages:
        nextpage = page.find('li', class_='waves-effect').find('a', {'aria-label' : 'Next'})
    if nextpage:
        # Build the URL of the next page from the button's href
        uu = nextpage.get('href')
        url = 'http://www.yellowpages.com.eg' + str(uu)
        print(url)
    else:
        break

This code prints the URL of the next page once and then breaks out of the loop instead of continuing through the remaining pages.

Solution:

The problem is that

nextpage = page.find('li', class_='waves-effect').find('a', {'aria-label' : 'Next'})

does return the Next button, but only as long as the Previous button is not there. find returns just the first matching li. Once you leave the first page, that first match is the Previous button, so the inner find for aria-label 'Next' returns None and the loop breaks.

In contrast, page.find_all('li', class_='waves-effect') returns both the Previous and the Next button, in document order.
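The difference between find and find_all can be reproduced offline. The following is a minimal sketch, not the site's real markup: the HTML string and its href values are hypothetical and only assume that, on pages after the first, both buttons carry the waves-effect class as described above. It uses the built-in html.parser instead of lxml so that nothing beyond bs4 is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical pagination bar for a page other than the first.
# The href values are invented for illustration.
PAGE_2_HTML = """
<ul class="pagination center-pagination">
  <li class="waves-effect"><a aria-label="Previous" href="/en/search/fast-food/p1">Prev</a></li>
  <li class="active"><a href="#">2</a></li>
  <li class="waves-effect"><a aria-label="Next" href="/en/search/fast-food/p3">Next</a></li>
</ul>
"""

soup = BeautifulSoup(PAGE_2_HTML, "html.parser")
page = soup.find("ul", class_="pagination center-pagination")

# find() stops at the first matching <li>, which here wraps the Previous link,
# so looking for the Next link inside it finds nothing.
first_li = page.find("li", class_="waves-effect")
via_find = first_li.find("a", {"aria-label": "Next"})

# find_all() returns every matching <li>; the last one wraps the Next link.
all_li = page.find_all("li", class_="waves-effect")
via_find_all = all_li[-1].find("a", {"aria-label": "Next"})

print(via_find)
print(via_find_all.get("href"))
```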

To get the Next button (perhaps) more robustly, take the last matching li and change your line to

nextpage = page.find_all('li', class_='waves-effect')[-1].find('a', {'aria-label' : 'Next'})
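Putting this together, the whole loop might look like the sketch below. It is a variation, not the exact fix above: instead of indexing the last li, it searches the pagination bar directly for the link with aria-label 'Next', so the position of the Previous button does not matter. The helper names find_next_url, iter_pages and fetch_with_requests, and the max_pages cap, are hypothetical. It also assumes the site omits the Next link on the last page, which has not been verified.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def find_next_url(html, base_url):
    """Return the absolute URL behind the Next link, or None if there is none."""
    soup = BeautifulSoup(html, "html.parser")
    bar = soup.find("ul", class_="pagination")
    if bar is None:
        return None
    # Look for the link itself rather than relying on the order of the <li> items.
    link = bar.find("a", attrs={"aria-label": "Next"})
    if link is None or not link.get("href"):
        return None
    # Resolve relative hrefs against the page they were found on.
    return urljoin(base_url, link["href"])


def iter_pages(start_url, fetch, max_pages=50):
    """Yield (url, html) per page, following Next links until none is left."""
    url, seen = start_url, set()
    # The seen set and max_pages cap guard against pagination that loops back.
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        html = fetch(url)
        yield url, html
        url = find_next_url(html, url)


def fetch_with_requests(url):
    """Download one page; requests is imported here so the parsing works without it."""
    import requests

    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text
```

To use it, iterate over iter_pages('https://yellowpages.com.eg/en/search/fast-food', fetch_with_requests) and scrape each html it yields. Passing the fetch function in as an argument keeps the pagination logic testable without network access.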