I’m looking to scrape a table from a site where the URL can be built from a year and a week number.
URL = https://www.debestseller60.nl/200511#top
I have two lists: one with the years from 2005 to now, and one with all the week numbers in a year.
years = list(range(2005, 2023))
weeks = list(range(1, 52))
I tried building the URL this way, but that doesn’t give me the right outcome.
url = "https://www.debestseller60.nl/{years}{weeks}#top"
How can I create a list of all these URLs that I can then scrape with BeautifulSoup?
Thank you for your help!
>Solution :
To generate a list of URLs from your lists of years and weeks, you can use a list comprehension with two for clauses in Python. Here’s how:
years = list(range(2005, 2023))
weeks = list(range(1, 53))  # range(1, 53) yields weeks 1-52; your original range(1, 52) stopped at week 51
urls = [f"https://www.debestseller60.nl/{year}{week:02}#top" for year in years for week in weeks]
This will create a list of URLs with all the possible combinations of years and weeks. Note the :02 in the f-string formatting for the week variable, which ensures that the week number is always two digits long, even for single-digit weeks (e.g., 01, 02, 03, etc.).
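As a quick sanity check, here is what the first few generated URLs look like (assuming, as in your example URL, that the site encodes week 11 of 2005 as 200511):

```python
years = list(range(2005, 2023))
weeks = list(range(1, 53))

urls = [f"https://www.debestseller60.nl/{year}{week:02}#top" for year in years for week in weeks]

# The first entries cover the weeks of 2005 in order
print(urls[0])    # https://www.debestseller60.nl/200501#top
print(urls[51])   # https://www.debestseller60.nl/200552#top
print(len(urls))  # 18 years * 52 weeks = 936 URLs
```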
Now you can use this list of URLs to scrape data using BeautifulSoup. Here’s a basic example of how to do that:
import requests
from bs4 import BeautifulSoup
for url in urls:
    response = requests.get(url)
    if response.status_code == 200:
        soup = BeautifulSoup(response.content, "html.parser")
        # Your scraping logic here
    else:
        print(f"Failed to fetch data for URL: {url}")
This code will loop through the list of URLs, fetch the content using the requests library, and then create a BeautifulSoup object for each URL if the request is successful. You can then add your scraping logic within the loop to extract the data you need.
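As an example of what the scraping logic inside the loop might look like, here is a minimal sketch that pulls every row out of the first table on a page. The structure (a plain HTML table with td/th cells) is an assumption; inspect the actual page to find the right selectors:

```python
from bs4 import BeautifulSoup

def extract_table_rows(html):
    # Parse the page and locate the first <table> element
    # (assumption: the bestseller list is rendered as a plain HTML table)
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    # Collect the cell texts of each row as a list of strings
    return [
        [cell.get_text(strip=True) for cell in row.find_all(["td", "th"])]
        for row in table.find_all("tr")
    ]
```

Inside the loop you would call extract_table_rows(response.text) and store the rows, e.g. together with the year and week they belong to.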
Hope this helps!