Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

WebScrapping with Selenium and BeaufitulSoup can't find anything

I am trying to extract all the description in the links in the class="publication u-padding-xs-ver js-publication" of this website: https://www.sciencedirect.com/browse/journals-and-books?accessType=openAccess&accessType=containsOpenAccess

I tried both with BeautifulSoup and Selenium but I can’t extract anything. You can see in the image below the result I got
result

Here is the code I am using

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

options = Options()
options.add_argument("headless")
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
ul = driver.find_element(By.ID, "publication-list")
print("Links")
allLi = ul.find_elements(By.TAG_NAME, "li")
for li in allLi:
    print("Links " + str(count) + " " + li.text)

>Solution :

You are missing waits.
You have to wait for elements to become visible before accessing them.
The best approach to do that is with use of WebDriverWait expected_conditions explicit waits.
The following code works

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

options = Options()
options.add_argument("start-maximized")

webdriver_service = Service('C:\webdrivers\chromedriver.exe')
driver = webdriver.Chrome(options=options, service=webdriver_service)
wait = WebDriverWait(driver, 20)

url = "https://www.sciencedirect.com/browse/journals-and-books?accessType=openAccess&accessType=containsOpenAccess"
driver.get(url)
ul = wait.until(EC.visibility_of_element_located((By.ID, "publication-list")))
allLi = wait.until(EC.presence_of_all_elements_located((By.TAG_NAME, "li")))
print(len(allLi))

the output is:

167
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading