Beautifulsoup – ValueError: No objects to concatenate

I’m trying to scrape Amazon reviews across multiple pages, but my code is not capturing any of the fields I want.

from bs4 import BeautifulSoup
import requests
import pandas as pd

url = "https://www.amazon.fr/AmazonBasics-600-sacs-d%C3%A9jections-canines-distributeur/product-reviews/B00NABTG60/ref=cm_cr_getr_d_paging_btm_next_"

amazon_reviews = []

for page in range(2, 5):

    req = requests.get(url + str(page) + "?ie=UTF8&reviewerType=all_reviews&pageNumber=" + str(page))
    soup = BeautifulSoup(req.text, "html.parser")

    # Getting desired data from our parsed soup
    reviews = soup.find_all('div', {'data-hook': 'review'})

    for item in reviews:
        client = item.find('a', {'data-hook': 'genome-widget'}).text.strip()
        title = item.find('a', {'data-hook': 'review-title'}).text.strip()
        date = item.find('span', {'data-hook': 'review-date'}).text.strip()
        rating = item.find('i', {'data-hook': 'review-star-rating'}).text.replace('out of 5 stars', '').strip()
        text = item.find('span', {'data-hook': 'review-body'}).text.strip()

        amazon_reviews.append(pd.DataFrame({'title': title, 'date': date, 'text': text, 'rating': rating, 'client': client}, index = [0]))


out = pd.concat(amazon_reviews, ignore_index = True)

My output:

ValueError: No objects to concatenate

Solution:

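  1. Send a User-Agent header with each request (the headers parameter of requests.get). Without one, the response apparently contains no review markup, so find_all returns an empty list, amazon_reviews stays empty, and pd.concat raises "No objects to concatenate".
  2. There is no need to build a one-row DataFrame for every review inside the loop; append one dict per review and create the DataFrame once at the end.
  3. The selector for the client name was slightly off; the name sits in the div with class a-profile-content.
  4. The page number is inserted into the URL with str.format.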
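
The error message itself comes from pandas, not from BeautifulSoup: pd.concat raises it whenever it receives an empty list. A minimal sketch that reproduces this without any network access, assuming the list stayed empty because no review element was ever matched:

```python
import pandas as pd

# Stays empty when find_all() matches nothing on every page
amazon_reviews = []

try:
    pd.concat(amazon_reviews, ignore_index=True)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)

# The DataFrame constructor tolerates the empty case and returns an empty frame
df = pd.DataFrame(amazon_reviews)
print(df.empty)
```

Checking len(amazon_reviews) (or printing req.status_code) before concatenating is a quick way to tell a blocked request from a wrong selector.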

Code:

from bs4 import BeautifulSoup
import requests
import pandas as pd

url = "https://www.amazon.fr/AmazonBasics-600-sacs-d%C3%A9jections-canines-distributeur/product-reviews/B00NABTG60/ref=cm_cr_arp_d_paging_btm_next_2?pageNumber={page}"
headers = {'user-agent': 'Mozilla/5.0'}
amazon_reviews = []

for page in range(1, 5):

    req = requests.get(url.format(page=page), headers=headers)
    soup = BeautifulSoup(req.text, "html.parser")

    # Getting desired data from our parsed soup
    reviews = soup.find_all('div', {'data-hook': 'review'})

    for item in reviews:
        # The reviewer name is in the profile block of each review
        client = item.find('div', {'class': 'a-profile-content'}).get_text(strip=True)
        title = item.find('a', {'class': 'review-title'}).text.strip()
        date = item.find('span', {'data-hook': 'review-date'}).text.strip()
        rating = item.find('i', {'data-hook': 'review-star-rating'}).text.replace('out of 5 stars', '').strip()
        text = item.find('span', {'data-hook': 'review-body'}).text.strip()
        # Collect plain dicts; the DataFrame is built once after the loop
        amazon_reviews.append({'title': title, 'date': date, 'text': text, 'rating': rating, 'client': client})


df = pd.DataFrame(amazon_reviews)
print(df)
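
One fragile point in the code above: item.find returns None when a review lacks one of the elements, and calling .text on None raises AttributeError. A possible guard is sketched below. The helper name text_or_none and the sample HTML are made up for illustration; the markup only imitates the attributes used above and is not real Amazon output.

```python
from bs4 import BeautifulSoup

# Made-up markup imitating two reviews; the second one has no title link
SAMPLE_HTML = """
<div data-hook="review">
  <div class="a-profile-content"><span>Karen M</span></div>
  <a class="review-title">Tellement pratique</a>
  <span data-hook="review-body"> Très bien </span>
</div>
<div data-hook="review">
  <div class="a-profile-content"><span>ronin</span></div>
  <span data-hook="review-body">Indispensable</span>
</div>
"""


def text_or_none(parent, name, attrs):
    """Return the stripped text of the first match, or None if nothing matches."""
    node = parent.find(name, attrs)
    return node.get_text(strip=True) if node is not None else None


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
rows = []
for item in soup.find_all("div", {"data-hook": "review"}):
    rows.append({
        "client": text_or_none(item, "div", {"class": "a-profile-content"}),
        "title": text_or_none(item, "a", {"class": "review-title"}),
        "text": text_or_none(item, "span", {"data-hook": "review-body"}),
    })

print(rows)
```

Missing fields then show up as None in the resulting DataFrame instead of aborting the whole run.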

Output:

                                                title  ...               client
0                                            Parfaits  ...      Client d'Amazon
1                                  Tellement pratique  ...              Karen M
2                                              Génial  ...   Constance Jourdain
3                                                 Bon  ...           Bernardini
4                                    Très bon produit  ...    Floriane Le Loher
5              Produit simple et facile d'utilisation  ...             K4rm4_Ez
6                                         La solidité  ...              thierry
7                         Sacs à dejection + dévidoir  ...              M&N ABK
8                                         Bon produit  ...  Christophe FRANCOIS
9                                       Bonne qualité  ...       Neuray Gabriel
10                    Très bien pour déjection canine  ...        PELEGRIN ERIC
11                                         Bonne idée  ...               Marine
12                                     Sac de qualité  ...           Jennifer A
13                     conforme et pratique et solide  ...                G pay
14                                            Génial.  ...                Alban
15                                         Impeccable  ...            Marina C.
16                     Pratique aux bonnes dimensions  ...          YVES CALVEZ
17               Solide et taille ok pour un labrador  ...            magnésium
18                                      Très pratique  ...      Client d'Amazon
19                                   très bon article  ...      berger fabienne
20                                           pratique  ...     Laetitia Hermann
21                                      Indispensable  ...                ronin
22                                           Pratique  ...                 SylM
23                                                Top  ...       Emilie Ouviere
24                                      Bonne qualité  ...                Manon
25                                            Parfait  ...              Nicolas
26                                                Top  ...                Simon
27                                 Crochet énervant !  ...             Jabousan
28                               TOUJOURS LE MEILLEUR  ...           FRANKL FAN
29                                   Très bon produit  ...             Ludo96ci
30                                 Top pour le prix !  ...               AlanLB
31                                        Très bien !  ...      Client d'Amazon
32                                             Solide  ...             Lambourg
33  Sacs solides mais très difficiles à détacher l...  ...      Client d'Amazon
34                           Bon rapport qualité prix  ...                GUYET
35                                                Top  ...      Client d'Amazon
36                                   Livraison rapide  ...                 Yann
37                                     Il fait le job  ...                  Rod
38                                        Bon produit  ...              Anais D
39                                           Pratique  ...             mario D.

[40 rows x 5 columns]
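
The "..." in the printout only means pandas hid the middle columns (date, text, rating) to fit the display width; the data is still in the frame. A small sketch of one way to show every column, using made-up placeholder rows rather than the scraped data:

```python
import pandas as pd

# Made-up placeholder rows with the same five columns as the frame above
df = pd.DataFrame([
    {'title': 'Title A', 'date': 'date text A', 'text': 'Body of review A',
     'rating': 'rating A', 'client': 'Client A'},
    {'title': 'Title B', 'date': 'date text B', 'text': 'Body of review B',
     'rating': 'rating B', 'client': 'Client B'},
])

# Lift the column limit and widen the display only inside this block
with pd.option_context('display.max_columns', None, 'display.width', 1000):
    rendered = repr(df)

print(rendered)
```

Outside the with block the display settings return to their previous values.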