Python BeautifulSoup failure to get data from a div with a certain class

I am working on a program that scrapes Metacritic for information about the movies in my library and displays it. Some parts, such as grabbing the rating, always return nothing. What am I doing wrong?

from bs4 import BeautifulSoup
import requests
import os

def ratingsGet(headers, movie):
    movie = movie.lower().replace(" ","-")
    detail_link="https://www.metacritic.com/movie/" + movie + "/details"
    detail_page = requests.get(detail_link, headers = headers) 
    soup = BeautifulSoup(detail_page.content, "html.parser")
    #g_data = soup.select('tr.movie_rating td.data span')
    g_data = soup.find_all("div", {"class": "movie_rating"})
    print(g_data)

    if g_data!= []:
        return g_data[0].text
    else:
        return "Failed"

def getMovieInfo():
    headers={'User-Agent': 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_0) AppleWebKit/536.1 (KHTML, like Gecko) Chrome/58.0.849.0 Safari/536.1'}
    
    for movie in os.listdir("D:/Movies/"):
        movie = movie.lower().replace(".mp4","")
        print(movie)
        print("Rating: " + ratingsGet(headers,movie))
        print("Home release year: " + rYearGet(headers,movie))
        break

HTML snippet:

<table class="details" summary="13 Going on 30 Details and Credits">
<tr class="runtime">
<td class="label">Runtime:</td>
<td class="data">98 min</td>
</tr>
<tr class="movie_rating">
<td class="label">Rating:</td>
<td class="data">
                                                                            Rated PG-13 for some sexual content and brief drug references.
                                                                    </td>
</tr>
<tr class="company">
<td class="label">Production:</td>
<td class="data">Revolution Studios</td>
</tr>

Solution:

As the HTML snippet shows, the element with class "movie_rating" is a "tr", not a "div", so that is the tag you need to look for. Two more suggestions:

  • Use find instead of find_all, since you only need the first match.
  • If find does not return None, call find again on the result to get only the text of the data cell, like this:
g_data.find("td", { "class": "data" }).text

The general code will be something like this:

def ratingsGet(headers, movie):
    movie = movie.lower().replace(" ","-")
    detail_link="https://www.metacritic.com/movie/" + movie + "/details"
    detail_page = requests.get(detail_link, headers = headers)
    soup = BeautifulSoup(detail_page.content, "html.parser")
    g_data = soup.find("tr", {"class": "movie_rating"})

    # Check if that tr exists
    if g_data is not None:
        g_data = g_data.find("td", { "class": "data" })

    # Check if the td inside of it exists
    if g_data is not None:
        return g_data.text.strip()
    return "Failed"
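
To check the lookup without hitting the network, here is a minimal sketch that parses the HTML snippet from the question directly. The function name extract_rating and the SAMPLE_HTML constant are made up for illustration, and a closing </table> tag has been added to the snippet. It uses a CSS selector via select_one as an alternative to the two chained find calls. Note that the commented-out selector in the question ends in "span", which does not appear in the snippet, so that selector would also match nothing.

```python
from bs4 import BeautifulSoup

# Markup taken from the question's HTML snippet (closing </table> added).
SAMPLE_HTML = """
<table class="details" summary="13 Going on 30 Details and Credits">
<tr class="runtime">
<td class="label">Runtime:</td>
<td class="data">98 min</td>
</tr>
<tr class="movie_rating">
<td class="label">Rating:</td>
<td class="data">
        Rated PG-13 for some sexual content and brief drug references.
</td>
</tr>
<tr class="company">
<td class="label">Production:</td>
<td class="data">Revolution Studios</td>
</tr>
</table>
"""


def extract_rating(html):
    """Return the rating text from a details table, or None if it is missing."""
    soup = BeautifulSoup(html, "html.parser")
    # Match the td with class "data" inside the tr with class "movie_rating".
    cell = soup.select_one("tr.movie_rating td.data")
    if cell is None:
        return None
    # strip=True removes the surrounding whitespace and newlines.
    return cell.get_text(strip=True)


if __name__ == "__main__":
    soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
    # The original lookup searched for a div, which this table does not contain.
    print(soup.find_all("div", {"class": "movie_rating"}))  # prints []
    print(extract_rating(SAMPLE_HTML))
```

If this works on the saved snippet but the live page still returns nothing, the markup served to requests probably differs from what you see in the browser, so print detail_page.status_code and a slice of detail_page.text to compare.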