How to store values together after scraping

I am able to scrape individual fields off a website, but I would like to map each title to its time.

The fields each have their own class, so I am struggling to work out how to map the time to the title.

A dictionary would work, but how would I structure this dictionary so that it stores the values on a line-by-line basis?


URL for reference: https://ash.confex.com/ash/2021/webprogram/STUDIO.html

Expected output:

9:00 AM-9:30 AM, Defining Race, Ethnicity, and Genetic Ancestry

11:00 AM-11:30 AM, Definitions of Structural Racism

etc

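import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://ash.confex.com/ash/2021/webprogram/STUDIO.html')
time.sleep(3)  # crude wait for the page to finish loading
page_source = driver.page_source
soup = BeautifulSoup(page_source, 'html.parser')

# Titles: the link text inside each div.itemtitle
productlist = soup.find_all('div', class_='itemtitle')
for item in productlist:
    for eachLine in item.find_all('a', href=True):
        title = eachLine.text
        print(title)

# Times: fetched separately, so they are not yet linked to the titles
# (find_elements_by_class_name was removed in Selenium 4)
times = driver.find_elements(By.CLASS_NAME, "time")
for t in times:
    print(t.text)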

Solution:

Selenium is overkill here. The website doesn't use any dynamic content, so you can scrape it with Python requests and BeautifulSoup. The code below shows how. You need to query productlist and times separately, then iterate by index so that you get both items at once. I pass the length of productlist to range() because I am assuming that productlist and times have the same length.

import requests
from bs4 import BeautifulSoup

url = 'https://ash.confex.com/ash/2021/webprogram/STUDIO.html'

res = requests.get(url)
soup = BeautifulSoup(res.content, 'html.parser')

# Query titles and times separately with CSS selectors
productlist = soup.select('div.itemtitle > a')
times = soup.select('.time')

# Pair items by position; assumes both lists have the same length
for iterator in range(len(productlist)):
    row = times[iterator].text + ", " + productlist[iterator].text
    print(row)

Note: soup.select() gathers elements by CSS selector.
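
Since the question asks about a dictionary, here is a minimal sketch of one way to store each time and title together as a row. It is not part of the original answer: the helper name pair_rows is hypothetical, the sample strings are copied from the expected output in the question, and it relies on the same assumption as above, namely that the times and titles come back in the same order and in equal numbers.

```python
def pair_rows(times, titles):
    """Pair each time with the title at the same position.

    Both arguments are lists of plain strings. A length mismatch is
    reported instead of silently dropping the unmatched items.
    """
    if len(times) != len(titles):
        raise ValueError(
            f"got {len(times)} times but {len(titles)} titles"
        )
    # One dictionary per row, so each time stays attached to its title
    return [
        {"time": t.strip(), "title": title.strip()}
        for t, title in zip(times, titles)
    ]


# Sample values copied from the expected output in the question
times = ["9:00 AM-9:30 AM", "11:00 AM-11:30 AM"]
titles = [
    "Defining Race, Ethnicity, and Genetic Ancestry",
    "Definitions of Structural Racism",
]

rows = pair_rows(times, titles)
for row in rows:
    print(f"{row['time']}, {row['title']}")
```

With the soup object from the answer, the two lists could be built as [t.get_text(strip=True) for t in soup.select('.time')] and [a.get_text(strip=True) for a in soup.select('div.itemtitle > a')]. A list of dictionaries is used rather than a single dictionary keyed by time, because two sessions sharing a time slot would otherwise overwrite each other.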
