How to automatically scrape the following CSV

April 24, 2024

On the page above if you click ‘Download CSV’ it will download a CSV file to your computer. I would like to set up a nightly process to download that CSV. I’m happy to scrape the data as well, a CSV just seems easier. I’m not really finding anything. Help?

>Solution :

import requests

def get_daily_stats(url):
    response = requests.get(url, headers={
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36',
        'Referer': 'https://baseballsavant.mlb.com/leaderboard/custom?year=2024&type=batter&filter=&min=q&selections=pa%2Ck_percent%2Cbb_percent%2Cwoba%2Cxwoba%2Csweet_spot_percent%2Cbarrel_batted_rate%2Chard_hit_percent%2Cavg_best_speed%2Cavg_hyper_speed%2Cwhiff_percent%2Cswing_percent&chart=false&x=pa&y=pa&r=no&chartType=beeswarm&sort=xwoba&sortDir=desc'
    })
    with open('daily_stats.csv', 'wb') as f:
        f.write(response.content)
    return

def main():
    url = 'https://baseballsavant.mlb.com/leaderboard/custom?year=2024&type=batter&filter=&min=q&selections=pa%2Ck_percent%2Cbb_percent%2Cwoba%2Cxwoba%2Csweet_spot_percent%2Cbarrel_batted_rate%2Chard_hit_percent%2Cavg_best_speed%2Cavg_hyper_speed%2Cwhiff_percent%2Cswing_percent&chart=false&x=pa&y=pa&r=no&chartType=beeswarm&sort=xwoba&sortDir=desc&csv=true'
    get_daily_stats(url)

if __name__ == '__main__':
    main()

This will download the CSV for you and save it to daily_stats.csv in the folder that the script exists in. You’ll have to install requests too – python -m pip install requests. How to do it nightly would be more a matter of what works best for you. I mean, you could just run it every night, or is your goal to have a process on your computer that would auto-run it?

I suppose this will stop working in 2025, but you could just change the year in the URL at that point.