How to scrape a page that is dynamically loaded?

So here’s my problem. I wrote a program that can get all of the information I want from the first page I load. But when I click the nextPage button, it runs a script that loads the next batch of products without actually navigating to another page.

So when the loop runs again, I just get the same content as on the first page, even though the products shown in the browser I’m driving are different.

This is the code I run:

from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import time

# Start the browser session (Chrome is assumed here; any WebDriver works)
driver = webdriver.Chrome()
driver.get("https://www.my-website.com/search/results-34y1i")
soup = BeautifulSoup(driver.page_source, 'html.parser')
time.sleep(2)

# ... code to find total number of pages (totalPages) ...
currentPage = 0
button_NextPage = driver.find_element(By.ID, 'nextButton')

while currentPage != totalPages:
    # ... code to find the products (reads from soup) ...
    currentPage += 1
    # Click the button that loads the next batch of products
    button_NextPage = driver.find_element(By.ID, 'nextButton')
    button_NextPage.click()
    time.sleep(5)

Is there any way for me to scrape exactly what’s loaded on my browser?

>Solution :

The issue seems to be that you’re only ever fetching page 1, as shown in this line:

driver.get("https://www.tcgplayer.com/search/magic/commander-streets-of-new-capenna?productLineName=magic&setName=commander-streets-of-new-capenna&page=1&view=grid")

But as you can see, the URL has a query parameter called page that determines which page of HTML you fetch. So each time the loop moves to a new page, you have to fetch the new HTML with the driver, changing the page query parameter. For example, inside your loop it would be something like this:

driver.get("https://www.tcgplayer.com/search/magic/commander-streets-of-new-capenna?productLineName=magic&setName=commander-streets-of-new-capenna&page={page}&view=grid".format(page = currentPage))
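If you’d rather not hard-code the whole URL as a format string, the page parameter can also be swapped in with the standard library. This is only a sketch: the helper name with_page is made up for illustration, and it assumes the site really does paginate through a page query parameter.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_page(url: str, page: int) -> str:
    """Return `url` with its `page` query parameter set to `page`.

    `with_page` is a hypothetical helper, not part of Selenium.
    """
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    if any(key == "page" for key, _ in pairs):
        # Overwrite the existing value, keeping the parameter order intact
        pairs = [(k, str(page) if k == "page" else v) for k, v in pairs]
    else:
        # No page parameter yet, so append one
        pairs.append(("page", str(page)))
    return urlunsplit(parts._replace(query=urlencode(pairs)))
```

Inside the loop you would then call driver.get(with_page(start_url, currentPage)), where start_url is whatever search URL you loaded first.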

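After you fetch the new HTML, you’ll be able to access the elements that appear on the different pages. Remember to rebuild soup from driver.page_source after each driver.get call, since the soup created before the loop only holds the first page.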
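Putting it together, the loop might look roughly like the sketch below. The names scrape_pages, parse and settle_seconds are assumptions for illustration, and the fixed sleep just mirrors the time.sleep calls in the question. The point is that page_source is read again after every navigation.

```python
import time
from typing import Callable, Iterator, List


def scrape_pages(driver, url_template: str, total_pages: int,
                 parse: Callable[[str], List[str]],
                 settle_seconds: float = 2.0) -> Iterator[List[str]]:
    """Yield the parsed products of each results page, one page at a time.

    `driver` is any object with `get(url)` and `page_source`, such as a
    Selenium WebDriver. `url_template` must contain a `{page}` placeholder.
    """
    for page in range(1, total_pages + 1):
        driver.get(url_template.format(page=page))
        # Give the page's scripts time to render the products
        time.sleep(settle_seconds)
        # Read page_source after navigating: a soup built earlier is a
        # snapshot of the old HTML and never sees the new products
        yield parse(driver.page_source)


# Example wiring (the ".product" selector is hypothetical):
#   parse = lambda html: [el.get_text(strip=True) for el in
#                         BeautifulSoup(html, "html.parser").select(".product")]
#   for products in scrape_pages(driver, url_template, totalPages, parse):
#       print(products)
```

Passing the parsing step in as a function keeps the fetch loop separate from whatever selectors your product code needs.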
