Scraping Table Data from Multiple URLs, but the first link is repeating

I’m looking to iterate through the URL, with "count" as a variable running from 1 to 65.

Right now I’m close, but I’m really struggling to figure out the last piece: I’m receiving the same table (the one for count = 1) 65 times, instead of a different table for each page.

import requests
import pandas as pd

url = 'https://basketball.realgm.com/international/stats/2023/Averages/Qualified/All/player/All/desc/{count}'

res = []

for count in range(1, 65):
    html = requests.get(url).content
    df_list = pd.read_html(html)
    df = df_list[-1]
    res.append(df)

    print(res)
df.to_csv('my data.csv')

Any thoughts?


Solution:

A few errors:

  • Your URL is never formatted: the string is not an f-string and you never call .format() on it, so every iteration requests the literal .../{count} URL instead of substituting the loop variable (see the short sketch after this list).
  • If you want pages 1 to 65, use range(1, 66); range excludes its end point, so range(1, 65) stops at 64.
  • Unless you want to export only the last dataframe, you need to concatenate all of them before writing the CSV.
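To see the first error in isolation, here is a minimal sketch (the example.com URL is made up, purely for illustration): a plain string keeps the {count} placeholder literally, and only str.format() or an f-string actually substitutes the variable.

url = 'https://example.com/page/{count}'  # hypothetical URL, for illustration only

count = 3
print(url)                                  # https://example.com/page/{count} - placeholder stays literal
print(url.format(count=count))              # https://example.com/page/3 - str.format fills it in
print(f'https://example.com/page/{count}')  # https://example.com/page/3 - f-string fills it at creation

The corrected script below takes the f-string route: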
import pandas as pd

# No count here, we will add it later
url = 'https://basketball.realgm.com/international/stats/2023/Averages/Qualified/All/player/All/desc'
res = []

for count in range(1, 66):
    # pd.read_html accepts a URL too, so no need to make a separate request
    df_list = pd.read_html(f"{url}/{count}")
    res.append(df_list[-1])

pd.concat(res).to_csv('my data.csv')
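As an optional hardening step (a sketch, not part of the answer above): when fetching 65 pages it can help to pause between requests and skip any page where no table parses; pd.read_html raises ValueError when it finds no tables.

import time

import pandas as pd

url = 'https://basketball.realgm.com/international/stats/2023/Averages/Qualified/All/player/All/desc'
res = []

for count in range(1, 66):
    try:
        # keep the last table on the page, as above
        res.append(pd.read_html(f"{url}/{count}")[-1])
    except ValueError:
        # pd.read_html raises ValueError when a page contains no tables
        print(f"No table found on page {count}, skipping")
    time.sleep(1)  # be polite to the server between requests

pd.concat(res, ignore_index=True).to_csv('my data.csv', index=False)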