Python – How to scrape Yelp reviews using Selenium?

December 5, 2021

I am working on a Python app that will help me get reviews for a particular restaurant.
I am using Selenium 4.1 with Python.

After setting up the Selenium driver in my project folder, I put this code together based on the Selenium documentation:

# Yelp review scraper

# Importing dependencies
from selenium import webdriver
from selenium.webdriver.chrome.service import Service as ChromeService
from selenium.webdriver.common.by import By
# Setting up driver options
options = webdriver.ChromeOptions()
# Setting up Path to chromedriver executable file
CHROMEDRIVER_PATH ='../Selenium/chromedriver.exe'
# Adding options
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option("useAutomationExtension", False)
# Setting up chrome service
service = ChromeService(executable_path=CHROMEDRIVER_PATH)
# Creating the Chrome web driver with the service and options set above
driver = webdriver.Chrome(service=service, options=options)

driver.get('https://www.yelp.com/biz/taste-of-texas-houston')

This successfully opens the Yelp page of the restaurant I want reviews for, but when I tried to scrape the reviews using:

driver.find_element(By.CLASS_NAME, ' raw__09f24__T4Ezm')

where ‘ raw__09f24__T4Ezm’ is the class name of the span holding the first review, I get the error:

InvalidSelectorException: Message: invalid selector: An invalid or illegal selector was specified
  (Session info: chrome=96.0.4664.45)
Stacktrace:
Backtrace:
    Ordinal0 [0x00BD6903+2517251]
    Ordinal0 [0x00B6F8E1+2095329]
    Ordinal0 [0x00A72848+1058888]
    Ordinal0 [0x00A74F44+1068868]
    Ordinal0 [0x00A74E0E+1068558]
    Ordinal0 [0x00A75070+1069168]
    Ordinal0 [0x00A9D1C2+1233346]
    Ordinal0 [0x00A9D63B+1234491]
    Ordinal0 [0x00AC7812+1406994]
    Ordinal0 [0x00AB650A+1336586]
    Ordinal0 [0x00AC5BBF+1399743]
    Ordinal0 [0x00AB639B+1336219]
    Ordinal0 [0x00A927A7+1189799]
    Ordinal0 [0x00A93609+1193481]
    GetHandleVerifier [0x00D65904+1577972]
    GetHandleVerifier [0x00E10B97+2279047]
    GetHandleVerifier [0x00C66D09+534521]
    GetHandleVerifier [0x00C65DB9+530601]
    Ordinal0 [0x00B74FF9+2117625]
    Ordinal0 [0x00B798A8+2136232]
    Ordinal0 [0x00B799E2+2136546]
    Ordinal0 [0x00B83541+2176321]
    BaseThreadInitThunk [0x757C6739+25]
    RtlGetFullPathName_UEx [0x773B8AFF+1215]
    RtlGetFullPathName_UEx [0x773B8ACD+1165]

I tried researching this error but had no luck.
How can I modify my code to get all available reviews for this restaurant, including the date, author, score, and text of each review?

>Solution :

I don’t personally know how to parse data with Selenium, as I use BeautifulSoup. Here is an example with BeautifulSoup:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Run Chrome without a visible window and silence driver logging
options = Options()
options.add_argument('--headless')
options.add_experimental_option('excludeSwitches', ['enable-logging'])
driver = webdriver.Chrome(options=options)
driver.get('https://www.yelp.com/biz/taste-of-texas-houston')

# Hand the rendered HTML to BeautifulSoup (the "lxml" parser must be installed)
content = driver.page_source
driver.quit()
soup = BeautifulSoup(content, features="lxml")

# Select the <li> elements whose class attribute is exactly this string
reviews = soup.find_all("li", attrs={'class': 'margin-b5__09f24__pTvws border-color--default__09f24__NPAKY'})

for review in reviews:
    print(review.text)

From there, you can parse each list item again to pull out the data you need.
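
As for the original InvalidSelectorException: the most likely cause is the leading space in ' raw__09f24__T4Ezm'. Selenium 4's Python bindings turn a By.CLASS_NAME lookup into a CSS selector by putting a dot in front of the value, so a value that starts with a space (or that contains several space-separated classes) produces an invalid selector. Below is a minimal sketch of a way around this while staying in Selenium. The helper names class_to_css and review_texts are hypothetical, and the hashed class name is copied from the question; Yelp generates these names and they may change at any time.

```python
from typing import List


def class_to_css(class_attr: str) -> str:
    """Turn the value of an HTML class attribute into a CSS selector.

    ' raw__09f24__T4Ezm' becomes '.raw__09f24__T4Ezm', and 'a b' becomes
    '.a.b' (which matches elements carrying both classes).
    """
    # split() with no arguments also discards leading and trailing whitespace
    names = class_attr.split()
    if not names:
        raise ValueError("class attribute is empty")
    return "".join("." + name for name in names)


def review_texts(driver, class_attr: str = " raw__09f24__T4Ezm") -> List[str]:
    """Return the text of every element matching the given class attribute.

    `driver` is assumed to be a Selenium WebDriver that has already loaded
    the page. "css selector" is the string value of By.CSS_SELECTOR.
    """
    selector = class_to_css(class_attr)
    # find_elements (plural) returns a list, empty if nothing matches
    return [element.text for element in driver.find_elements("css selector", selector)]
```

With the driver from the question, review_texts(driver) would then return a list of strings, one per matching span. Passing the stripped name directly, as in driver.find_element(By.CLASS_NAME, 'raw__09f24__T4Ezm'), should also avoid the exception for a single class.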