I’m looking to scrape a table from a site where the URL can be built from a year and a week number.
URL = https://www.debestseller60.nl/200511#top
I have two lists: one with the years from 2005 to now, and one with all the week numbers in a year.
years = list(range(2005, 2023))
weeks = list(range(1, 52))
I tried building the URL this way, but that doesn’t give me the right outcome.
url = "https://www.debestseller60.nl/{years}{weeks}#top"
How can I create a list of all these URLs that I can then scrape with BeautifulSoup?
Thank you for your help!
>Solution :
To generate a list of URLs from your lists of years and weeks, you can use a list comprehension with two for clauses in Python. Here’s how:
years = list(range(2005, 2023))
weeks = list(range(1, 53))  # range(1, 53) yields weeks 1-52; your original range(1, 52) stopped at week 51
urls = [f"https://www.debestseller60.nl/{year}{week:02}#top" for year in years for week in weeks]
This will create a list of URLs with all the possible combinations of years and weeks. Note the :02 in the f-string formatting for the week variable, which ensures that the week number is always two digits long, even for single-digit weeks (e.g., 01, 02, 03, etc.).
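As a quick sanity check, here is what the first few generated URLs look like (assuming, as in your example URL, that the site encodes week 11 of 2005 as 200511):

```python
years = list(range(2005, 2023))
weeks = list(range(1, 53))

urls = [f"https://www.debestseller60.nl/{year}{week:02}#top" for year in years for week in weeks]

# The first entries cover the weeks of 2005 in order
print(urls[0])    # https://www.debestseller60.nl/200501#top
print(urls[51])   # https://www.debestseller60.nl/200552#top
print(len(urls))  # 18 years * 52 weeks = 936 URLs
```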
Now you can use this list of URLs to scrape data using BeautifulSoup. Here’s a basic example of how to do that:
import requests
from bs4 import BeautifulSoup
for url in urls:
    response = requests.get(url)
    if response.status_code == 200:
        soup = BeautifulSoup(response.content, "html.parser")
        # Your scraping logic here
    else:
        print(f"Failed to fetch data for URL: {url}")
This code will loop through the list of URLs, fetch the content using the requests library, and then create a BeautifulSoup object for each URL if the request is successful. You can then add your scraping logic within the loop to extract the data you need.
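As an example of what the scraping logic inside the loop might look like, here is a minimal sketch that pulls every row out of the first table on a page. The structure (a plain HTML table with td/th cells) is an assumption; inspect the actual page to find the right selectors:

```python
from bs4 import BeautifulSoup

def extract_table_rows(html):
    # Parse the page and locate the first <table> element
    # (assumption: the bestseller list is rendered as a plain HTML table)
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    # Collect the cell texts of each row as a list of strings
    return [
        [cell.get_text(strip=True) for cell in row.find_all(["td", "th"])]
        for row in table.find_all("tr")
    ]
```

Inside the loop you would call extract_table_rows(response.text) and store the rows, e.g. together with the year and week they belong to.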
Hope this helps!