Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Issue with webscraping game data off of Metacritic

I’m using beautiful soup to webscrape game data off of metacritic. I’m trying to get the scores and text for each reviewer. I thought that everything was going well but when I get the response back I see stuff like this:

”’
class="c-siteReviewPlaceholder_header"
”’

the site does not have the word placeholder in its classes. I know that I need to target the specific class:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

”’
class_="c-pageProductReviews_row"
”’

So this is what my code looks like:
”’

import requests
from bs4 import BeautifulSoup

URL = 'https://www.metacritic.com/game/alien-isolation/critic-reviews/? 
platform=playstation-4'
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '\
       'AppleWebKit/537.36 (KHTML, like Gecko) '\
       'Chrome/75.0.3770.80 Safari/537.36'}
 critic_review_page = requests.get(URL, headers=headers)
 soup = BeautifulSoup(critic_review_page.content, "html.parser")
 critic_review_rows = soup.find_all("div", class_="c-pageProductReviews_row")
 print(critic_review_rows)

”’

When I print critic_review_rows I see a lot of the classes have the word placeholder in them. I don’t know if Metacritic won’t let me scrape the site or what is going on. It’s almost as if the data is not loading the data by the time I’m scraping it. Thank you for any help.

>Solution :

Main issue here is that the content is rendered dynamically by javascript, something that requests do not handle, because it is not acting like a browser and only work with the first static state of response.

The initial state is stored in script at the end of the page source, so you could extract it, but a better approach would be to use the api that is called:

import requests 

url = 'https://fandom-prod.apigee.net/v1/xapi/reviews/metacritic/critic/games/alien-isolation/platform/playstation-4/web?apiKey=1MOZgmNFxvmljaQR1X9KAij9Mo4xAY3u&offset=0&limit=50&sort=score&componentType=ReviewList'

requests.get(url).json()

{'data': {'id': 1400262756,
  'totalResults': 50,
  'items': [{'quote': 'The permanent threat of death keeps you forced to the ground – we can’t remember the last game where we willingly snuck around so much – and it feels like the claustrophobic corridors and catwalks of the Sevastopol were built from that angle. Being crouched, looking up at everything… it really does give you that feeling that Creative Assembly wanted it all along – that ‘prey being hunted’ effect. It makes a refreshing change from the feeling of being overpowered and able to kill anything that appears.',
    'score': 90,
    'url': 'http://www.play-mag.co.uk/reviews/ps4-reviews/alien-isolation-review-2/',
    'date': '2014-10-03',
    'author': None,
    'authorSlug': None,
    'image': None,
    'publicationName': 'Play UK',
    'publicationSlug': 'play-uk',
    'reviewedProduct': {'id': 1400262756,
     'type': 'games',
     'title': 'Alien: Isolation',
     'url': '/game/alien-isolation/',
     'criticScoreSummary': {'url': '/game/alien-isolation/critic-reviews/?platform=playstation-4',
      'score': 79},
     'platform': {'id': 1500000006, 'name': 'PlayStation 4'},
     'gameTaxonomy': {'game': {'id': 1400262756, 'name': 'Alien: Isolation'},
      'platform': {'id': 1500000006, 'name': 'PlayStation 4'}}},
    'platform': 'PlayStation 4'},...
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading