Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Python Pandas Web Scraping

I’m trying to turn a list of tables on this page into a Pandas DataFrame:
https://intermediaries.hsbc.co.uk/products/product-finder/

I want to select the customer type box only and select one of the elements (from first to last) and then click find product to display the table for each one before concatenating all the DataFrames into 1 DataFrame.

So far I have managed to select the first element and print the table but I can’t seem to turn it into a pandas DataFrame as I get a value error: Must pass 2-d input. shape=(1, 38, 12)

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

This is my code:

def product_type_button(self):
    select = Select(self.driver.find_element_by_id('Availability'))
    try:
        select.select_by_visible_text('First time buyer')
    except NoSuchElementException:
        print('The item does not exist')
    time.sleep(5)
    
    self.driver.find_element_by_xpath('//button[@type="button" and (contains(text(),"Find product"))]').click()
    time.sleep(5)

def create_dataframe(self):

    data1 = pd.read_html(self.driver.page_source)
    print(data1)

    data2 = pd.DataFrame(data1)
    time.sleep(5)

    data2.to_csv('Data1.csv')

I would like to find a way to print the table for each element, maybe selecting by index instead? and then concatenating into one DataFrame. Any help would be appreciated.

>Solution :

All data for the table is located inside javascript file. You can use re/json to parse it and then construct the dataframe:

import re
import json
import requests
import pandas as pd


js_src = "https://intermediaries.hsbc.co.uk/component---src-pages-products-product-finder-js-9c7004fb8446c3fe0a07.js"
data = requests.get(js_src).text

data = re.search(r"JSON\.parse\('(.*)'\)", data).group(1)
data = json.loads(data)

df = pd.DataFrame(data)
print(df.head().to_markdown(index=False))

df.to_csv("data.csv", index=False)

Prints:

Changed NewProductCode PreviousProductCode ProductType Deal Period (Fixed until) ProductDescription1 ProductTerm Availability Repayment Basis Min LTV % MaxLTV Minimum Loan ? Fees Payable Per Rate Reversionary Rate % APR BookingFee Cashback CashbackValue ERC – Payable Unlimited lump sum payments Premitted (without fees) Unlimited overpayment permitted (without fees) Overpayments ERC Portable Completionfee Free Legals for Remortgage FreeStandardValuation Loading
Continued 4071976 Fixed 31.03.25 Fee Saver* 2 Year Fixed FTB CR and IO 0% 60% 10,000 5,000,000 5.99% 5.04% 5.4 0 No 0 Yes No No Yes Please refer to HSBC policy guide Yes 17 Yes Yes nan
Continued 4071977 Fixed 31.03.25 Fee Saver* 2 Year Fixed FTB CR and IO 0% 70% 10,000 2,000,000 6.04% 5.04% 5.4 0 No 0 Yes No No Yes Please refer to HSBC policy guide Yes 17 Yes Yes nan
Continued 4071978 Fixed 31.03.25 Fee Saver* 2 Year Fixed FTB CR and IO 0% 75% 10,000 2,000,000 6.04% 5.04% 5.4 0 No 0 Yes No No Yes Please refer to HSBC policy guide Yes 17 Yes Yes nan
Continued 4071979 Fixed 31.03.25 Fee Saver* 2 Year Fixed FTB CR 0% 80% 10,000 1,000,000 6.14% 5.04% 5.4 0 No 0 Yes No No Yes Please refer to HSBC policy guide Yes 17 Yes Yes nan
Continued 4071980 Fixed 31.03.25 Fee Saver* 2 Year Fixed FTB CR 0% 85% 10,000 750,000 6.19% 5.04% 5.4 0 No 0 Yes No No Yes Please refer to HSBC policy guide Yes 17 Yes Yes nan

and saves data.csv (screenshot from LibreOffice):

enter image description here

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading