requests.get changes the content of the website? (Web scraping)

I am facing an issue while trying to scrape information from a website using the requests.get method. The information I receive from the website is inconsistent and doesn’t match the actual data displayed on the website.

As an example, I have tried to scrape the size of an apartment located at the following link:
https://www.sreality.cz/en/detail/sale/flat/2+kt/havlickuv-brod-havlickuv-brod-stromovka/3574729052. The size of the apartment is displayed as 54 square meters on the website, but when I use the requests.get method, the result shows 43 square meters instead of 54.
[Screenshots: apartment size on the webpage; apartment size in the inspected page source; result in VS Code]

I have attached screenshots of the apartment size displayed on the website and the result in my Visual Studio Code for reference. The code I used for this is given below:

import requests

test = requests.get("https://www.sreality.cz/api/cs/v2/estates/3574729052?tms=1676140494143").json()

test["items"][8]

I am unable to find a solution to this issue and would greatly appreciate any help or guidance. If there is anything wrong with the format of my post, please let me know and I will make the necessary changes. Thank you in advance.

Solution:

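Here is one way to get the information you're after: open the listing page in a requests.Session that sends browser-like headers, then query the English API endpoint for the same property ID.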

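import requests
import pandas as pd

# Show every column and the full cell contents when printing the DataFrame
pd.set_option('display.max_columns', None, 'display.max_colwidth', None)
# Browser-like headers sent with every request made through the session
headers = {
    'accept': 'application/json, text/plain, */*',
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36'
}
s = requests.Session()
s.headers.update(headers)
url = 'https://www.sreality.cz/en/detail/sale/flat/2+kt/havlickuv-brod-havlickuv-brod-stromovka/3574729052'
# The property ID is the last path segment of the listing URL
property_id = url.split('/')[-1]
api_url = f'https://www.sreality.cz/api/en/v2/estates/{property_id}'
# Load the listing page first so the session keeps any cookies it sets
s.get(url)
# Flatten the 'items' list from the API response into a DataFrame
df = pd.json_normalize(s.get(api_url).json()['items'])
# Keep only the field name and its value
df = df[['name', 'value']]
print(df)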

Result in terminal:

    name                     value
0   Total price              3 905 742
1   Update                   Yesterday
2   ID                       3574729052
3   Building                 Brick
4   Property status          Under construction
5   Ownership                Personal
6   Property location        Quiet part of municipality
7   Floor                    3. floor of total 5 including 1 underground
8   Usable area              54
9   Balcony                  4
10  Cellar                   2
11  Sales commencement date  25.04.2022
12  Water                    [{'name': 'Water', 'value': 'District water supply'}]
13  Electricity              [{'name': 'Electricity', 'value': '230 V'}]
14  Transportation           [{'name': 'Transportation', 'value': 'Train'}, {'name': 'Transportation', 'value': 'Road'}, {'name': 'Transportation', 'value': 'Urban public transportation'}, {'name': 'Transportation', 'value': 'Bus'}]
15  Road                     [{'name': 'Road', 'value': 'Asphalt'}]
16  Barrier-free access      True
17  Furnished                False
18  Elevator                 True
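
A related point: the question reads the size positionally with test["items"][8], which only works if "Usable area" is always the ninth entry. Looking the field up by its name is likely less brittle. The following is a minimal sketch, assuming each entry in items keeps the name/value structure shown in the output above; find_item_value is a hypothetical helper written for illustration (not part of requests or of the site's API), and sample_items is hand-copied sample data rather than a live response, so the value types may differ from what the API actually returns.

```python
def find_item_value(items, name):
    """Return the 'value' of the first entry whose 'name' matches, else None."""
    for item in items:
        if item.get("name") == name:
            return item.get("value")
    return None


# Sample data copied by hand from the printed output above (an assumption,
# not a live API response; real value types may differ).
sample_items = [
    {"name": "Total price", "value": "3 905 742"},
    {"name": "Usable area", "value": "54"},
    {"name": "Balcony", "value": "4"},
]

usable_area = find_item_value(sample_items, "Usable area")
print(usable_area)
```

With a real response, the same helper could be called on s.get(api_url).json()['items'] from the code above; it returns None when no entry carries the requested name, so a missing field can be detected instead of silently reading a neighbouring one.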