Python – How to scrape Yelp reviews using Selenium?

I am working on a Python app that will help me get reviews for a particular restaurant.
I am using Selenium 4.1 as the web scraper, with Python.

After setting up the Selenium driver in my project folder, I put this code together based on the Selenium documentation:

# YELP REVIEW SCRAPER

# Importing dependencies
from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from selenium.webdriver.common.by import By
# Setting up driver options
options = webdriver.ChromeOptions()
# Setting up path to the chromedriver executable
CHROMEDRIVER_PATH = '../Selenium/chromedriver.exe'
# Adding options
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option("useAutomationExtension", False)
# Setting up the Chrome service
service = ChromeService(executable_path=CHROMEDRIVER_PATH)
# Creating the Chrome web driver from the service and options above
driver = webdriver.Chrome(service=service, options=options)

driver.get('https://www.yelp.com/biz/taste-of-texas-houston')

This successfully opens the Yelp page of the restaurant I want to get reviews for, but when I tried to scrape the reviews using:

driver.find_element(By.CLASS_NAME, ' raw__09f24__T4Ezm')

where ‘ raw__09f24__T4Ezm’ is the class name of the span holding the first review, I get this error:

InvalidSelectorException: Message: invalid selector: An invalid or illegal selector was specified
  (Session info: chrome=96.0.4664.45)
Stacktrace:
Backtrace:
    Ordinal0 [0x00BD6903+2517251]
    Ordinal0 [0x00B6F8E1+2095329]
    Ordinal0 [0x00A72848+1058888]
    Ordinal0 [0x00A74F44+1068868]
    Ordinal0 [0x00A74E0E+1068558]
    Ordinal0 [0x00A75070+1069168]
    Ordinal0 [0x00A9D1C2+1233346]
    Ordinal0 [0x00A9D63B+1234491]
    Ordinal0 [0x00AC7812+1406994]
    Ordinal0 [0x00AB650A+1336586]
    Ordinal0 [0x00AC5BBF+1399743]
    Ordinal0 [0x00AB639B+1336219]
    Ordinal0 [0x00A927A7+1189799]
    Ordinal0 [0x00A93609+1193481]
    GetHandleVerifier [0x00D65904+1577972]
    GetHandleVerifier [0x00E10B97+2279047]
    GetHandleVerifier [0x00C66D09+534521]
    GetHandleVerifier [0x00C65DB9+530601]
    Ordinal0 [0x00B74FF9+2117625]
    Ordinal0 [0x00B798A8+2136232]
    Ordinal0 [0x00B799E2+2136546]
    Ordinal0 [0x00B83541+2176321]
    BaseThreadInitThunk [0x757C6739+25]
    RtlGetFullPathName_UEx [0x773B8AFF+1215]
    RtlGetFullPathName_UEx [0x773B8ACD+1165]

I tried researching this error but had no luck.
How can I modify my code to get all available reviews for this restaurant, including each review's date, author, score, and text?

>Solution :

I don’t know how to parse data with Selenium itself, since I use BeautifulSoup for that. Here is an example that loads the page with Selenium and parses it with BeautifulSoup:


from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Run Chrome without a visible window and silence driver logging
options = Options()
options.add_argument('--headless')
options.add_experimental_option('excludeSwitches', ['enable-logging'])
driver = webdriver.Chrome(options=options)
driver.get('https://www.yelp.com/biz/taste-of-texas-houston')

# Hand the rendered HTML to BeautifulSoup (the "lxml" parser must be installed)
content = driver.page_source
driver.quit()
soup = BeautifulSoup(content, features="lxml")

# Select the <li> elements whose class attribute matches this string
reviews = soup.find_all("li", attrs={'class': 'margin-b5__09f24__pTvws border-color--default__09f24__NPAKY'})

for review in reviews:
    print(review.text)


From there you can parse each item further to extract the data you need.
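
As for the InvalidSelectorException in the question, the likely cause is the leading space in ' raw__09f24__T4Ezm'. As far as I can tell, Selenium 4's Python bindings turn By.CLASS_NAME into a CSS selector by putting a dot in front of the value, so the lookup becomes '. raw__09f24__T4Ezm', which is not valid CSS. Below is a minimal sketch of a Selenium-only lookup under that assumption. The helper names class_attr_to_css and get_review_texts are made up for illustration, and Yelp's generated class names may well have changed since the question was asked.

```python
def class_attr_to_css(class_attr: str) -> str:
    """Turn an HTML class attribute value into a CSS class selector.

    Surrounding whitespace is dropped and multiple class names are chained,
    so the selector only matches elements carrying all of them.
    """
    names = class_attr.split()
    if not names:
        raise ValueError("class attribute contains no class names")
    return "".join("." + name for name in names)


def get_review_texts(driver, class_attr=" raw__09f24__T4Ezm"):
    """Return the text of every element carrying the given class(es).

    `driver` is an already created Selenium WebDriver with the page loaded.
    """
    # Imported here so class_attr_to_css stays usable without Selenium installed.
    from selenium.webdriver.common.by import By

    selector = class_attr_to_css(class_attr)
    # find_elements returns a list (empty if nothing matches) instead of raising.
    elements = driver.find_elements(By.CSS_SELECTOR, selector)
    return [element.text for element in elements]
```

With the driver from the question, get_review_texts(driver) would be called after driver.get(...). Using find_elements rather than find_element collects every matching span instead of only the first one.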
