Content of page is not found in Beautiful Soup

October 14, 2023

I am trying to access the "historical quotes" table on the page below and save the dates and closing prices in a dataframe. I use bs4 to scrape the page, but I can’t find the data in page_soup.

import requests
from bs4 import BeautifulSoup

url = "https://www.marketscreener.com/quote/index/MSCI-WORLD-SIZE-TILT-NETR-121861272/quotes/"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.1234.567 Safari/537.36"
}

response = requests.get(url, headers=headers)

page_content = response.text
page_soup = BeautifulSoup(page_content, "html.parser")

>Solution :

@tripleee is right. The data you’re looking for is not there because it is not returned as HTML by the server. You could write a scraping program to get it, but there is an easier solution.

Looking at the page source, the desired table (the element with attributes id="historical-quotes" and data-fct-name="historicalDataTable") is loaded dynamically with JS. Inspecting the page’s network requests in Firefox, I found the URL that is called to load that data: https://www.marketscreener.com/api/quotes/historical/121861272. It returns data like this, which corresponds to the five columns of the table you’re looking for.

{
    "error": false,
    "data": [
        ["2023-10-12","7626.0930","7626.0930","7626.0930","7626.0930","0"],
        ["2023-10-11","7680.4340","7680.4340","7680.4340","7680.4340","0"],
        ...
    ]
}

So you can just call that URL directly and process the JSON the same way the page’s JS does. This assumes the URL doesn’t change regularly, which it may well do if the site wants to discourage this kind of scraping.
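As a rough sketch of that approach, the snippet below fetches the JSON and keeps only the date and close price of each row. Several things here are assumptions rather than anything the site documents: the column order (date, open, high, low, close, volume) is guessed from the sample above, the function names parse_quotes and fetch_quotes are made up for illustration, and it is not certain that the endpoint needs (or accepts) the browser-like User-Agent header.

```python
import json
from datetime import date
from urllib.request import Request, urlopen

API_URL = "https://www.marketscreener.com/api/quotes/historical/121861272"

# ASSUMPTION: each row is [date, open, high, low, close, volume].
# The sample response does not label its columns, so verify this
# against the table shown on the page before relying on it.
DATE_INDEX = 0
CLOSE_INDEX = 4


def parse_quotes(payload):
    """Convert the decoded JSON payload into a list of date/close rows."""
    if payload.get("error"):
        raise ValueError("the API response has its error flag set")
    return [
        {
            "date": date.fromisoformat(record[DATE_INDEX]),
            "close": float(record[CLOSE_INDEX]),
        }
        for record in payload["data"]
    ]


def fetch_quotes(url=API_URL):
    """Download the JSON from the endpoint and parse it (hypothetical helper)."""
    # ASSUMPTION: a browser-like User-Agent is sent, as in the question;
    # the endpoint may work without it or may require other headers.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=10) as response:
        payload = json.load(response)
    return parse_quotes(payload)
```

Since the result is a list of dicts, passing it to pandas.DataFrame (for example pd.DataFrame(fetch_quotes())) should give a dataframe with a date column and a close column.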