Using Beautiful Soup to get consolidated data from a list of URLs instead of just the first URL

I'm trying to get the data for three states, all based on the same URL format.

states = ['123', '124', '125']

urls = []
for state in states:
    url = f'www.something.com/geo={state}'
    urls.append(url)

From there I have three separate URLs, each containing a different state ID.

However, when I process them with Beautiful Soup, the output only shows data for state 123.

for url in urls:
    client = ScrapingBeeClient(api_key="API_KEY")
    response = client.get(url)
    doc = BeautifulSoup(response.text, 'html.parser')

Subsequently, I extracted the columns I wanted using this:

listings = doc.select('.is-9-desktop')

rows = []

for listing in listings:
    row = {}
    try:
        row['name'] = listing.select_one('.result-title').text.strip()
    except:
        print("no name")
    try:
        row['add'] = listing.select_one('.address-text').text.strip()
    except:
        print("no add")
    try:
        row['mention'] = listing.select_one('.review-mention-block').text.strip()
    except:
        pass
    
    rows.append(row)

But, as mentioned, it only showed data for state 123. I'd hugely appreciate it if anyone could let me know where I went wrong. Thank you!

EDIT

I appended the parsed output for each URL to a list, and was able to get the data for all three states.

doc = []
for url in urls:
    client = ScrapingBeeClient(api_key="API_KEY")
    response = client.get(url)
    docs = BeautifulSoup(response.text, 'html.parser')
    doc.append(docs)

However, when I ran it through Beautiful Soup, it resulted in this error message:

AttributeError: 'list' object has no attribute 'select'

Do I run it through another loop?

Solution:

You do not need all of these loops. Just iterate over the states, fetch the listings for each one, and append them to rows.

The most important thing is that rows = [] is placed outside the for loops, so it is not reset on each iteration and the results from every state accumulate in one list.

Example

states = ['123', '124', '125']

rows = []

for state in states:
    url = f'www.something.com/geo={state}'
    client = ScrapingBeeClient(api_key="API_KEY")
    response = client.get(url)
    doc = BeautifulSoup(response.text, 'html.parser')

    listings = doc.select('.is-9-desktop')

    for listing in listings:
        row = {}
        try:
            row['name'] = listing.select_one('.result-title').text.strip()
        except:
            print("no name")
        try:
            row['add'] = listing.select_one('.address-text').text.strip()
        except:
            print("no add")
        try:
            row['mention'] = listing.select_one('.review-mention-block').text.strip()
        except:
            pass

        rows.append(row)
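
If you would rather keep the parsed pages in a list, as in the EDIT, then yes, another loop is needed: `select` exists on each parsed page, not on the list that holds them. The following is only a minimal sketch of that variant. The inline HTML in `PAGES` is a made-up stand-in for what `client.get(url).text` might return, and `FIELDS` and `extract_rows` are hypothetical names introduced here for illustration. It also checks for `None` instead of using a bare `except`, so a missing field is skipped silently rather than printed.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-ins for the three responses; real HTML would come
# from client.get(url).text for each state URL.
PAGES = {
    "123": '<div class="is-9-desktop"><a class="result-title"> Cafe A </a>'
           '<span class="address-text">1 Main St</span></div>',
    "124": '<div class="is-9-desktop"><a class="result-title">Cafe B</a></div>',
    "125": '<div class="is-9-desktop"><a class="result-title">Cafe C</a>'
           '<span class="address-text">3 Oak Ave</span>'
           '<p class="review-mention-block">great coffee</p></div>',
}

# Output key -> CSS selector, taken from the selectors used in the question.
FIELDS = {
    "name": ".result-title",
    "add": ".address-text",
    "mention": ".review-mention-block",
}


def extract_rows(doc):
    """Return one dict per listing found in a single parsed page."""
    page_rows = []
    for listing in doc.select(".is-9-desktop"):
        row = {}
        for key, selector in FIELDS.items():
            node = listing.select_one(selector)  # None when the element is absent
            if node is not None:
                row[key] = node.get_text(strip=True)
        page_rows.append(row)
    return page_rows


# A list of parsed pages, like `doc` in the EDIT.
docs = [BeautifulSoup(html, "html.parser") for html in PAGES.values()]

# Created once, outside the loop, so every page adds to the same list.
rows = []
for doc in docs:
    rows.extend(extract_rows(doc))

print(len(rows))
```

Calling `docs.select(...)` on the list itself would raise the same `AttributeError` as in the EDIT, which is why the extraction is applied to each element in turn.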