Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Webscraping Python Website Using JSON Application

I am trying to get the price of one item on the website in the url below. However, I am finding some issues when looking at the source page of the website.

The url is: https://www.cartier.com/en-gb/love-bracelet-small-model_cod25372685655708131.html#dept=EU_Love

The part of the source page I am interested in is the following (I guess):

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

<script type="application/ld+json">
    [{

"@context":"http://schema.org",
"@type":"Product",
"productID":"25372685655708131",
"name":"LOVE bracelet, small model",
"description":"#LOVE# bracelet, small model, yellow gold 750/1000. Supplied with a screwdriver. Width: 3.65 mm (for size 17). Now available in a slimmer version, Cartier continues to write the story of the #LOVE# bracelet. Same design, same oval shape, same story: a timeless – yet slightly slimmer – creation which is fastened using a screwdriver. The closure is designed with a functional screw on one side of the bracelet and a hinge on the other. To determine the size of your #LOVE# bracelet, measure your wrist, adding one centimetre to your size for a tighter fit, or two centimetres for a looser fit.",
"image":["https://www.cartier.com/variants/images/25372685655708131/img1/w960.jpg"],
"offers": 
[{"@type":"Offer","availability":"http://schema.org/InStock","priceCurrency":"GBP","price":"4100","sku":"0400574782829","url":"https://www.cartier.com/en-gb/love-bracelet-small-model_cod25372685655708131.html"}]}]
    </script>

I have tried the following steps:

import json
from bs4 import BeautifulSoup
import requests
from multiprocessing import Pool
import pandas as pd

data = {'url':[],'offers_price':[]}

def get_price(url):
    soup = BeautifulSoup(requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}).content, "html.parser")
    data = json.loads(soup.find_all('script', {'type': 'application/ld+json'})[-1].get_text())
    return url, int(data['offers']['price'])

if __name__ == '__main__':

    urls = ['https://www.cartier.com/en-gb/love-bracelet-small-model_cod25372685655708131.html#dept=EU_Love']

    with Pool(processes=4) as pool:
            for url, price in pool.imap_unordered(get_price, urls):
                    data['offers_price'].append(price)
                    data['url'].append(url)
    print(data)

But not successful. How would you approach in this case?

>Solution :

I was able to get the price, but I got it from the product-price tag:

import json
from bs4 import BeautifulSoup
import requests
from multiprocessing import Pool
import pandas as pd

data = {'url':[],'offers_price':[]}

def get_price(url):
    soup = BeautifulSoup(requests.get(url, headers={'User-Agent': 'Mozilla/5.0'}).content, "html.parser")
    data = json.loads(soup.find_all('product-price')[-1]['data-model'])
    return url, int(data['fullPrice'])

if __name__ == '__main__':

    urls = ['https://www.cartier.com/en-gb/love-bracelet-small-model_cod25372685655708131.html#dept=EU_Love']

    with Pool(processes=4) as pool:
            for url, price in pool.imap_unordered(get_price, urls):
                    data['offers_price'].append(price)
                    data['url'].append(url)
    print(data)

Output:

{'url': ['https://www.cartier.com/en-gb/love-bracelet-small-model_cod25372685655708131.html#dept=EU_Love'], 'offers_price': [4100]}

By the way, are you sure you want to append the url and the price? I think you should do this instead:

data['offers_price'] = price
data['url'] = url
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading