Python BeautifulSoup failure to get data from a div with a certain class

Advertisements

I am working on a program that will scrape metacritic for info on the movie from my library and display it but in certain parts like grabbing the rating always returns nothing what am I doing wrong?

from bs4 import BeautifulSoup
import requests
import os

def ratingsGet(headers, movie):
    movie = movie.lower().replace(" ","-")
    detail_link="https://www.metacritic.com/movie/" + movie + "/details"
    detail_page = requests.get(detail_link, headers = headers) 
    soup = BeautifulSoup(detail_page.content, "html.parser")
    #g_data = soup.select('tr.movie_rating td.data span')
    g_data = soup.find_all("div", {"class": "movie_rating"})
    print(g_data)

    if g_data!= []:
        return g_data[0].text
    else:
        return "Failed"

def getMovieInfo():
    headers={'User-Agent': 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_0) AppleWebKit/536.1 (KHTML, like Gecko) Chrome/58.0.849.0 Safari/536.1'}
    
    for movie in os.listdir("D:/Movies/"):
        movie = movie.lower().replace(".mp4","")
        print(movie)
        print("Rating: " + ratingsGet(headers,movie))
        print("Home release year: " + rYearGet(headers,movie))
        break

html snippet:

<table class="details" summary="13 Going on 30 Details and Credits">
<tr class="runtime">
<td class="label">Runtime:</td>
<td class="data">98 min</td>
</tr>
<tr class="movie_rating">
<td class="label">Rating:</td>
<td class="data">
                                                                            Rated PG-13 for some sexual content and brief drug references.
                                                                    </td>
</tr>
<tr class="company">
<td class="label">Production:</td>
<td class="data">Revolution Studios</td>
</tr>

>Solution :

As you said, you need to look for a "tr" (not a "div"). I will also append to the answer this.

  • Try to use only find (no need of find all)
  • If the result of find is not None, do another find in it to get only the text, like this:
g_data.find("td", { "class": "data" }).text

The genral code will be something like this:

def ratingsGet(headers, movie):
    movie = movie.lower().replace(" ","-")
    detail_link="https://www.metacritic.com/movie/" + movie + "/details"
    detail_page = requests.get(detail_link, headers = headers)
    soup = BeautifulSoup(detail_page.content, "html.parser")
    g_data = soup.find("tr", {"class": "movie_rating"})

    # Check if that tr exists
    if g_data is not None:
        g_data = g_data.find("td", { "class": "data" })

    # Check if the td inside of it exists
    if g_data is not None:
        return g_data.text.strip()
    return "Failed"

Leave a ReplyCancel reply