Scraping the rating of some reviews as pictures

I am trying to scrape the rating of some movie reviews, but the rating is not a number: it is made up of 10 images, each either a full star or an empty star.

This is the website from where I want to scrape the data:
https://www.cinemagia.ro/filme/avatar-17818/reviews/?pagina=1&order_direction=DESC

This is my code:

import requests
from bs4 import BeautifulSoup

url = 'https://www.cinemagia.ro/filme/avatar-17818/reviews/?pagina=1&order_direction=DESC'
page = requests.get(url)

soup = BeautifulSoup(page.content, "html.parser")

rating=0
scraped_ratings = soup.find_all('span', class_='stelutze').find=("img")
for i in scraped_ratings:
    if "star_full.gif" in i.get("src"):
        rating += 1
print(rating)

Someone helped me with the following code, but it only gives the rating of the first review:

rating=0
rawRating = soup.find("span", {"class": "stelutze"}).find_all("img")
for i in rawRating:
    if "star_full.gif" in i.get("src"):
        rating += 1
print(rating)

I tried to change the code to this:

rating=0
count=0
rawRating = soup.find_all("span", {"class": "stelutze"}).find_all("img")
for i in rawRating:
    if "star_full.gif" in i.get("src"):
        rating += 1
    count+= 1
    if count == 10:
        print(rating)
        rating=0
        count=0

But I get this error:
AttributeError: ResultSet object has no attribute 'find_all'. You're probably treating a list of elements like a single element. Did you call find_all() when you meant to call find()?

I think this is because I can’t chain two find_all calls in the same statement.

Any help?

Solution:

I believe this should solve your issue. I have not tested it, but I don’t see why it shouldn’t work.

When you call find_all, you get back a ResultSet, which is a list of every element it found, and that list has no find_all method of its own. So the code below first gets every rating span on the page (one per review), then iterates over those spans and calls find_all("img") on each one individually, just as you did before for the first review.

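rating = 0
count = 0
# find_all returns every rating span on the page, one per review
rawRatings = soup.find_all("span", {"class": "stelutze"})
for i in rawRatings:
    # i is a single element here, so find_all can be called on it
    rawRating = i.find_all("img")
    for j in rawRating:
        if "star_full.gif" in j.get("src"):
            rating += 1
        count += 1
        # after ten stars one review is complete: print it and reset
        if count == 10:
            print(rating)
            rating = 0
            count = 0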
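
If you would rather not rely on every review having exactly 10 images, a variation is to count the stars per span and collect one number per review. This is only a sketch that runs on made-up HTML instead of the live page: the helper names stars and ratings_per_review, the sample markup, and the star_empty.gif file name are my own assumptions, not something taken from the site.

```python
from bs4 import BeautifulSoup


def stars(full, total=10):
    """Build a fake rating span (hypothetical markup): `full` full stars, rest empty."""
    imgs = ['<img src="/img/star_full.gif">'] * full
    # "star_empty.gif" is an assumed name for the empty-star image.
    imgs += ['<img src="/img/star_empty.gif">'] * (total - full)
    return '<span class="stelutze">' + "".join(imgs) + "</span>"


def ratings_per_review(html):
    """Return one integer rating per <span class="stelutze"> found in `html`."""
    soup = BeautifulSoup(html, "html.parser")
    ratings = []
    for span in soup.find_all("span", class_="stelutze"):
        # Each span is a single Tag, so calling find_all on it is fine.
        full = sum(
            1
            for img in span.find_all("img")
            if "star_full.gif" in img.get("src", "")
        )
        ratings.append(full)
    return ratings


# Stand-in for page.content: three reviews rated 7, 10 and 3.
sample_html = "<div>" + stars(7) + stars(10) + stars(3) + "</div>"
print(ratings_per_review(sample_html))  # prints [7, 10, 3]
```

With the real page you would pass page.content to ratings_per_review instead of sample_html. Counting inside each span means a review with a missing or extra image cannot shift the totals of the reviews after it.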

If you have any questions, let me know.
