How to web scrape AJAX dynamically loaded websites without encountering a 500 Internal Server Error

I’ve been assigned the task of scraping Nvidia’s session catalogue (https://www.nvidia.com/gtc/session-catalog/). The site loads its content dynamically via AJAX. I want to get the data for every session in the catalogue.
My very basic approach scrapes particular element tags, presses the "Show More" button, and repeats the process for the newly loaded elements. With it I’ve been able to scrape around 520-530 sessions. After that, the site repeatedly throws an XHR 500 Internal Server Error. I’ve tried this both with and without a headless browser (puppeteer).
Why is this happening, and how can I overcome it? You can check out the elements/tags on the website for a better understanding.
An answer that does not use a headless browser, if possible, would be amazing.

I have tried basic "element tag based" scraping, and the same with puppeteer, but both stop working after 520-540 sessions.

>Solution:

Why is this happening

This appears to be a bug in their server-side code: their API apparently stops working when asked to load more than 500 events. Note that the 500 Internal Server Error occurring at 500 events is just a coincidence. Status code 500 is the standard HTTP error code for "something went wrong on the server", not a reference to the number of events.

If you’re curious to see what calls are being made to the server, you can open the developer tools in your browser and go to the Network tab. The API call for loading the next page goes to https://events.rainfocus.com/api/search. It succeeds initially, then fails once the "from" parameter reaches 500.
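If you want to reproduce this without a headless browser, the paging can be sketched roughly as below. This is only an illustration: the endpoint URL and the "from" parameter come from the network tab as described above, but the request method, the "size" field, the "items" key in the response, and the function names fetch_page and collect_sessions are all assumptions. Copy the real headers and request body from your browser before relying on it.

```python
import json
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable, List, Optional

# Endpoint seen in the browser's network tab.
SEARCH_URL = "https://events.rainfocus.com/api/search"


def fetch_page(offset: int, size: int = 50) -> Optional[List[dict]]:
    """Request one page of results; return None on an HTTP 5xx response.

    ASSUMPTIONS (not confirmed): the endpoint accepts a form-encoded POST
    with "from" and "size" fields and returns JSON containing an "items"
    list. The real request likely needs extra headers or fields.
    """
    body = urllib.parse.urlencode({"from": offset, "size": size}).encode()
    request = urllib.request.Request(SEARCH_URL, data=body, method="POST")
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return json.load(response).get("items", [])
    except urllib.error.HTTPError as err:
        if 500 <= err.code < 600:
            return None  # server-side failure, e.g. 500 Internal Server Error
        raise


def collect_sessions(
    fetch: Callable[[int, int], Optional[List[dict]]], size: int = 50
) -> List[dict]:
    """Page through results until the server errors or returns no items."""
    sessions: List[dict] = []
    offset = 0
    while True:
        page = fetch(offset, size)
        if not page:  # None (server error) or empty list (no more results)
            break
        sessions.extend(page)
        offset += len(page)
    return sessions
```

Calling collect_sessions(fetch_page) would gather whatever the server is willing to return and stop cleanly at the first 5xx response instead of retrying forever. It does not get past the server-side limit; it only makes the failure point explicit.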

How to overcome this

Unless you know of another source for the data, there is no way to overcome this. Their server will not send more than 500 events to the front end. You could try to find out whether they have a bug reporting process for their website, but that’s a long shot.
