Amend string / URL and iterate through list

I’m looking to scrape a certain table where the link is amendable with a certain year and week.

URL = https://www.debestseller60.nl/200511#top

I have two lists; one having years ranging between 2005 and now and one having all the week numbers in a year.


years = list(range(2005, 2023))

weeks = list(range(1, 52))

I tried amending the URL this way but that doesn’t give me the right outcome.

url = "https://www.debestseller60.nl/{years}{weeks}#top"

How can I create a list of all these URLs that I can then scrape with BeautifulSoup?

Thank you for your help!

>Solution:

To generate a list of URLs based on your lists of years and weeks, you can use nested list comprehensions in Python. Here’s how you can do that:

years = list(range(2005, 2023))
weeks = list(range(1, 53))  # covers weeks 1-52; some ISO years have a 53rd week, so use range(1, 54) if you need it

urls = [f"https://www.debestseller60.nl/{year}{week:02}#top" for year in years for week in weeks]

This creates a list of URLs for every combination of year and week. Note the :02 in the f-string formatting for the week variable: it zero-pads the week number to two digits (01, 02, ..., 09), matching the format in your example URL (200511 is year 2005, week 11).
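If you prefer, itertools.product expresses the same pairing of years and weeks explicitly; this is just an equivalent sketch of the comprehension above:

```python
from itertools import product

years = list(range(2005, 2023))
weeks = list(range(1, 53))

# product() yields every (year, week) pair, equivalent to the
# nested "for year in years for week in weeks" comprehension
urls = [f"https://www.debestseller60.nl/{year}{week:02}#top"
        for year, week in product(years, weeks)]

print(urls[0])    # https://www.debestseller60.nl/200501#top
print(len(urls))  # 18 years x 52 weeks = 936 URLs
```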

Now you can use this list of URLs to scrape data using BeautifulSoup. Here’s a basic example of how to do that:

import time

import requests
from bs4 import BeautifulSoup

for url in urls:
    response = requests.get(url)
    if response.status_code == 200:
        # Parse the page so you can query it with BeautifulSoup
        soup = BeautifulSoup(response.content, "html.parser")

        # Your scraping logic here

    else:
        print(f"Failed to fetch data for URL: {url}")

    # Be polite: pause between requests so you don't hammer the server
    time.sleep(1)

This code will loop through the list of URLs, fetch the content using the requests library, and then create a BeautifulSoup object for each URL if the request is successful. You can then add your scraping logic within the loop to extract the data you need.
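As a sketch of what the scraping logic inside the loop might look like: the selectors below (a plain table with tr/td rows) are assumptions, since the actual markup of debestseller60.nl would need to be inspected with your browser's developer tools first.

```python
from bs4 import BeautifulSoup

def parse_table(html):
    """Extract each row of the first table as a list of cell texts.

    The "table"/"tr"/"td" selectors are placeholders; replace them with
    the tags or classes you find on the real page.
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return []
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:  # skip header-only or empty rows
            rows.append(cells)
    return rows

# Inside the loop above you would call:
#     rows = parse_table(response.text)
```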

Hope this helps!
