How to change the code to iterate over links and their IDs together when scraping web pages?

I have a list of links, and each link has a corresponding id in the Id list.

How can I change the code so that, while iterating over the links, the corresponding id is substituted into the find_all() call (marked with a comment in the code below)?

The full code is below:

import pandas as pd
from bs4 import BeautifulSoup
import requests

HEADERS = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) '
                         'Chrome/81.0.4044.138 Safari/537.36 OPR/68.0.3618.125', 'accept': '*/*'}
links = ['https://www..ie', 'https://www..ch', 'https://www..com']
Id = ['164240372761e5178f0488d', '164240372661e5178e1b377', '164240365661e517481a1e6']

def get_html(url, params=None):
    r = requests.get(url, headers=HEADERS, params=params)
    return r

def get_data_no_products(html):
    data = []
    soup = BeautifulSoup(html, 'html.parser')
    items = soup.find_all('div', id='')  # How do I substitute the matching id here?

    for item in items:
        data.append({'pn': item.find('a').get('href')})

    return print(data)

def parse():
    for i in links:
        html = get_html(i)
        get_data_no_products(html.text)
parse()

Solution:

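Parametrise your function so that it takes the id as an argument:

def get_data_no_products(html, id_):
    data = []
    soup = BeautifulSoup(html, 'html.parser')
    items = soup.find_all('div', id=id_)  # id_ is supplied by the caller
    for item in items:
        data.append({'pn': item.find('a').get('href')})
    return data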

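And then use zip() to pair each link with its id:

for link, id_ in zip(links, ids):  # ids is the list named Id in the question
    html = get_html(link)
    get_data_no_products(html.text, id_)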
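
One caveat worth knowing: zip() stops at the shorter input, so if an id were missing, the last link would be skipped without any error. Here is a minimal sketch of guarding against that. The helper name pair_links_with_ids and the example.com URLs and ids are placeholders made up for illustration, not taken from the question:

```python
def pair_links_with_ids(links, ids):
    """Return (link, id) pairs, refusing lists of different lengths."""
    if len(links) != len(ids):
        raise ValueError(
            f"got {len(links)} links but {len(ids)} ids; they must match"
        )
    return list(zip(links, ids))


# Placeholder data for illustration only.
example_links = ['https://example.com/a', 'https://example.com/b']
example_ids = ['id-a', 'id-b']

for link, id_ in pair_links_with_ids(example_links, example_ids):
    print(link, id_)
```

If you are on Python 3.10 or newer, zip(links, ids, strict=True) performs the same length check for you.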

Note that there’s a likely bug in your code: you return print(data), which is always None, because print() writes to the console and has no return value of its own. You likely just want to return data.
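
To see the difference for yourself, here is a small stand-alone sketch. The two function names and the sample data are hypothetical and only exist to contrast the two return styles:

```python
def collect_with_print(items):
    data = [item.upper() for item in items]
    return print(data)  # print() writes to stdout and returns None


def collect(items):
    data = [item.upper() for item in items]
    return data  # the caller receives the list itself


printed = collect_with_print(['a', 'b'])
returned = collect(['a', 'b'])

print(printed is None)  # the first function handed back nothing useful
print(returned)         # the second one handed back the actual list
```

If you still want to see the data while debugging, print it on one line and return it on the next.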

PS

There is another solution to this, which you will frequently see from people who are new to Python:

for i in range(len(links)):
    link = links[i]
    id_ = ids[i]
    ...

This… works. It might even feel easier or more natural if you are coming from, e.g., C (then again, there I’d likely use pointers…). Style is very much personal, but if you’re going to write in a high-level language like Python, you might as well avoid thinking about things like ‘the index of the current item’ as much as possible. Just my £0.02.
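
If you do genuinely need the position (say, for a progress message), you can still avoid manual indexing by wrapping zip() in enumerate(). A small sketch, again with placeholder URLs and ids rather than the ones from the question:

```python
links = ['https://example.com/a', 'https://example.com/b']  # placeholders
ids = ['id-a', 'id-b']                                      # placeholders

progress = []
# enumerate() supplies the counter; zip() supplies the matching pair.
for n, (link, id_) in enumerate(zip(links, ids), start=1):
    progress.append(f"{n}/{len(links)}: {link} -> div#{id_}")

print("\n".join(progress))
```

The counter comes along for free, and the loop body never has to index into either list.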
