I am able to scrape individual fields off a website, but would like to map the title to the time.
The fields have their own classes, so I am struggling with how to map the time to the title.
A dictionary would work, but how would I structure this dictionary so that it stores the values line by line?
URL for reference: https://ash.confex.com/ash/2021/webprogram/STUDIO.html

Expected output:
9:00 AM-9:30 AM, Defining Race, Ethnicity, and Genetic Ancestry
11:00 AM-11:30 AM, Definitions of Structural Racism
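```python
import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://ash.confex.com/ash/2021/webprogram/STUDIO.html')
time.sleep(3)  # give the page time to load

page_source = driver.page_source
soup = BeautifulSoup(page_source, 'html.parser')

# Titles: links inside each div with class "itemtitle"
productlist = soup.find_all('div', class_='itemtitle')
for item in productlist:
    for eachLine in item.find_all('a', href=True):
        title = eachLine.text
        print(title)

# Times: find_elements_by_class_name() was removed in Selenium 4.3,
# so use find_elements(By.CLASS_NAME, ...) instead
times = driver.find_elements(By.CLASS_NAME, "time")
for t in times:
    print(t.text)
```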
Selenium is overkill here. The website doesn't use any dynamic content, so you can scrape it with `requests` and BeautifulSoup alone. Here is code showing how to do it. You need to query the titles and `times` separately and then iterate by index so you can get both items at once. I used the length of `productlist` in `range()` because I am assuming that both `productlist` and `times` have the same length.
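If you would rather not rely on that assumption silently, here is a minimal sketch of the same pairing using `zip()` instead of indexes. It assumes the times and titles have already been scraped into plain lists of strings (the sample values are the two rows from the question), and `pair_rows` is a hypothetical helper name. The explicit length check raises an error on a mismatch instead of dropping items or hitting an `IndexError` partway through.

```python
def pair_rows(times, titles):
    """Join each time with the title at the same position (hypothetical helper)."""
    if len(times) != len(titles):
        # zip() alone would silently stop at the shorter list
        raise ValueError(f"{len(times)} times but {len(titles)} titles")
    return [f"{t}, {title}" for t, title in zip(times, titles)]


# Sample data standing in for the scraped text
times = ["9:00 AM-9:30 AM", "11:00 AM-11:30 AM"]
titles = ["Defining Race, Ethnicity, and Genetic Ancestry", "Definitions of Structural Racism"]

for row in pair_rows(times, titles):
    print(row)
```

With the real page, you would build `times` and `titles` from `.text` of the selected elements before calling the helper.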
```python
import requests
from bs4 import BeautifulSoup

url = 'https://ash.confex.com/ash/2021/webprogram/STUDIO.html'
res = requests.get(url)
soup = BeautifulSoup(res.content, 'html.parser')

# Titles: <a> elements that are direct children of div.itemtitle
productlist = soup.select('div.itemtitle > a')
# Times: every element with class "time"
times = soup.select('.time')

# Pair each time with the title at the same index
for iterator in range(len(productlist)):
    row = times[iterator].text + ", " + productlist[iterator].text
    print(row)
```
`soup.select()` gathers elements using CSS selectors.
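For the dictionary part of the question, one option is a list with one dict per line, which keeps the rows in order and tolerates two sessions sharing a time slot. A minimal sketch, assuming the times and titles are already scraped into lists of strings; `to_records` and the `"time"`/`"title"` keys are hypothetical names chosen for illustration. A plain dict keyed by time also works, but a later session with the same time string would overwrite an earlier one.

```python
def to_records(times, titles):
    """Build one dict per session (hypothetical helper and key names)."""
    return [{"time": t, "title": title} for t, title in zip(times, titles)]


# Sample data standing in for the scraped text
times = ["9:00 AM-9:30 AM", "11:00 AM-11:30 AM"]
titles = ["Defining Race, Ethnicity, and Genetic Ancestry", "Definitions of Structural Racism"]

records = to_records(times, titles)
for rec in records:
    print(f"{rec['time']}, {rec['title']}")

# Alternative: map time -> title (duplicate times overwrite earlier entries)
by_time = dict(zip(times, titles))
print(by_time["11:00 AM-11:30 AM"])
```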