How to scrape data from a line chart on bltindex.com?

May 22, 2024

I want to scrape the data from the only line chart on https://www.bltindex.com/.
The goal is to end up with a pandas DataFrame containing the time series shown in the chart.

After watching this video, I tried to apply the same method: I looked in the browser's Network tab for a CSV or JSON file while the page was loading, but could not find any. The only thing I found was a CSS file whose URL contains the word "chart", https://docs.google.com/static/spreadsheets2/client/css/838001818-v3-ritz_chart_css_ltr.css, and I noticed that it came with a request URL as well (the one used in the code below).
I tried the following code:

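import requests
from bs4 import BeautifulSoup

url = 'https://docs.google.com/spreadsheets/d/e/2PACX-1vQG9TYlv8_LpCvO7EI3Y3s8MoxQEfOHTd3-EqccN5PoeHcdxraxZC0y8UWFx_2NnogVIIuk1i-phvFe/pubchart?oid=813038046&format=interactive'
response = requests.get(url, timeout=30)
response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page

# Name the parser explicitly; otherwise BeautifulSoup guesses one and emits a warning
soup = BeautifulSoup(response.content, "html.parser")
print(soup.prettify())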
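Rather than reading the prettified output by eye, BeautifulSoup can pick out the single <script> tag that holds the chart data. A minimal sketch, assuming the data sits in a script whose text contains the marker chartJson (the key the regex in the solution searches for); find_chart_script is a hypothetical helper name and the HTML in the usage example is made up:

```python
from typing import Optional

from bs4 import BeautifulSoup


def find_chart_script(html: str, marker: str = "chartJson") -> Optional[str]:
    """Return the text of the first <script> tag containing `marker`, or None."""
    soup = BeautifulSoup(html, "html.parser")
    for script in soup.find_all("script"):
        # .string is the tag's single text child (None if empty or if there are several children)
        text = script.string
        if text is not None and marker in text:
            return str(text)
    return None


# Made-up page for illustration; the real page is much larger
sample_html = """
<html><body>
<script nonce="abc">var unrelated = 1;</script>
<script nonce="def">init({'chartJson': '{\\x22rows\\x22: []}', 'other': 1});</script>
</body></html>
"""

print(find_chart_script(sample_html))
```

This only narrows the search to one string; the values inside it still have to be decoded and parsed as JSON.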

The output is a long string, and inside the <script nonce="yyTSUqBQUPTxI-ZkIM7OKw"> tag I can indeed see the values I want. However, I do not know how to pull them out of this string without doing it by hand. Is there a more convenient way to get the data?

>Solution:

Try:

import json
import re

import pandas as pd
import requests

url = "https://docs.google.com/spreadsheets/d/e/2PACX-1vQG9TYlv8_LpCvO7EI3Y3s8MoxQEfOHTd3-EqccN5PoeHcdxraxZC0y8UWFx_2NnogVIIuk1i-phvFe/pubchart?oid=813038046&format=interactive"
html_text = requests.get(url, timeout=30).text

# The chart definition is embedded in the page as a JavaScript string: 'chartJson': '...'
data = re.search(r"'chartJson': '(.*?)',", html_text).group(1)
# Inside that string, characters such as quotes are escaped as \xNN; turn them back into characters
data = re.sub(r"\\x([0-9a-fA-F]{2})", lambda g: chr(int(g.group(1), 16)), data)
data = json.loads(data)

# print(json.dumps(data, indent=4))

# Each row holds one cell per column; "f" is the cell's formatted (display) value
df = pd.DataFrame(
    [(r["c"][0]["f"], r["c"][1]["f"]) for r in data["dataTable"]["rows"]],
    columns=["Date", "Value"],
)
print(df)

Prints:

            Date       Value
0    07-Jan-2018           1
1    14-Jan-2018   1.0396913
2    21-Jan-2018  0.84593582
3    28-Jan-2018  0.78201258
4    04-Feb-2018  0.71397352
5    11-Feb-2018   0.8097111
6    18-Feb-2018   1.2938001
7    25-Feb-2018  0.95799756
8    04-Mar-2018  0.81667918

...
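The solution reads each cell's "f" (formatted) field, so both columns most likely hold strings such as "07-Jan-2018" and "1.0396913". To end up with an actual time series, here is a minimal sketch that parses them into a datetime index and floats, assuming every date follows the DD-Mon-YYYY pattern seen in the printed output; to_time_series is a hypothetical helper, and the sample frame simply copies the first rows shown above:

```python
import pandas as pd


def to_time_series(df: pd.DataFrame) -> pd.Series:
    """Turn the string Date/Value columns into a float Series indexed by date."""
    # %b (abbreviated month name) assumes an English/C locale, matching "Jan", "Feb", ...
    dates = pd.to_datetime(df["Date"], format="%d-%b-%Y")
    values = pd.to_numeric(df["Value"])
    return pd.Series(
        values.to_numpy(),
        index=pd.DatetimeIndex(dates, name="Date"),
        name="Value",
    )


# First rows of the printed output, kept as strings
sample = pd.DataFrame(
    {
        "Date": ["07-Jan-2018", "14-Jan-2018", "21-Jan-2018"],
        "Value": ["1", "1.0396913", "0.84593582"],
    }
)

series = to_time_series(sample)
print(series)
```

With a DatetimeIndex in place, date-based slicing such as series["2018-01"] works as usual.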