I am building a web crawler to scrape data from this website:
https://www.zillow.com/daytona-beach-fl-32119/?searchQueryState=%7B%22usersSearchTerm%22%3A%2232119%22%2C%22mapBounds%22%3A%7B%22west%22%3A-81.10488184448243%2C%22east%22%3A-80.95416315551759%2C%22south%22%3A29.100888224306114%2C%22north%22%3A29.222759141549485%7D%2C%22regionSelection%22%3A%5B%7B%22regionId%22%3A71798%2C%22regionType%22%3A7%7D%5D%2C%22isMapVisible%22%3Atrue%2C%22filterState%22%3A%7B%22sort%22%3A%7B%22value%22%3A%22globalrelevanceex%22%7D%2C%22ah%22%3A%7B%22value%22%3Atrue%7D%7D%2C%22isListVisible%22%3Atrue%2C%22mapZoom%22%3A13%7D
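The searchQueryState parameter in this URL is just percent-encoded JSON, so the standard library can decode it and show exactly what the page is searching for. A minimal sketch using urllib.parse and json; decode_search_state is a hypothetical helper name, and the URL is the one above, split into string pieces for readability. Comparing the result with the GetSearchPageState URL in the code below shows that the API request uses slightly different filter keys (sortSelection and isAllHomes instead of sort and ah) and adds a pagination entry.

```python
import json
from urllib.parse import urlsplit, parse_qs

# The search-page URL from the question (unchanged, split across literals).
PAGE_URL = (
    'https://www.zillow.com/daytona-beach-fl-32119/?searchQueryState='
    '%7B%22usersSearchTerm%22%3A%2232119%22'
    '%2C%22mapBounds%22%3A%7B%22west%22%3A-81.10488184448243%2C%22east%22%3A-80.95416315551759'
    '%2C%22south%22%3A29.100888224306114%2C%22north%22%3A29.222759141549485%7D'
    '%2C%22regionSelection%22%3A%5B%7B%22regionId%22%3A71798%2C%22regionType%22%3A7%7D%5D'
    '%2C%22isMapVisible%22%3Atrue'
    '%2C%22filterState%22%3A%7B%22sort%22%3A%7B%22value%22%3A%22globalrelevanceex%22%7D%2C%22ah%22%3A%7B%22value%22%3Atrue%7D%7D'
    '%2C%22isListVisible%22%3Atrue%2C%22mapZoom%22%3A13%7D'
)

def decode_search_state(url):
    """Return the searchQueryState parameter of url, parsed into a dict."""
    # parse_qs percent-decodes each value and returns a list per parameter
    params = parse_qs(urlsplit(url).query)
    return json.loads(params['searchQueryState'][0])

state = decode_search_state(PAGE_URL)
print(state['usersSearchTerm'])   # 32119
print(state['regionSelection'])   # [{'regionId': 71798, 'regionType': 7}]
print(state['filterState'])       # {'sort': {'value': 'globalrelevanceex'}, 'ah': {'value': True}}
```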
I am using BeautifulSoup. Below is my code:
from bs4 import BeautifulSoup
import requests
import json
head = {'user-agent':'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/112.0.0.0 Safari/537.36'}
r = requests.get('https://www.zillow.com/search/GetSearchPageState.htm?searchQueryState=%7B%22pagination%22%3A%7B%7D%2C%22usersSearchTerm%22%3A%2232119%22%2C%22mapBounds%22%3A%7B%22west%22%3A-81.10488184448243%2C%22east%22%3A-80.95416315551759%2C%22south%22%3A29.100888224306114%2C%22north%22%3A29.222759141549485%7D%2C%22regionSelection%22%3A%5B%7B%22regionId%22%3A71798%2C%22regionType%22%3A7%7D%5D%2C%22isMapVisible%22%3Atrue%2C%22filterState%22%3A%7B%22sortSelection%22%3A%7B%22value%22%3A%22globalrelevanceex%22%7D%2C%22isAllHomes%22%3A%7B%22value%22%3Atrue%7D%7D%2C%22isListVisible%22%3Atrue%2C%22mapZoom%22%3A13%7D&wants={%22cat1%22:[%22mapResults%22]}&requestId=2', headers=head)
soup = BeautifulSoup(r.content, 'lxml')
soup.find('p').text
The code above gives me the result below, but as a string:
'{"user":{"isLoggedIn":false,"email":"","displayName":"","hasHousingConnectorPermission":false,"savedHomesCount":0,"personalizedSearchTraceID":"645a34a7a49fa1da2b295a1946eeefac","guid":"b2ce92bf-af30-49fc-833d-46b36e6fdc3a","zuid":"","isBot":false,"userSpecializedSEORegion":false},"mapState":{"customRegionPolygonWkt":null,"schoolPolygonWkt":null,"isCurrentLocationSearch":false,"userPosition":{"lat":null,"lon":null}},"regionState":{"regionInfo":[{"regionType":7,"regionId":71798,"regionName":"32119","displayName":"Daytona Beach FL 32119","isPointRegion":false}],"regionBounds":{"north":29.187729,"east":-80.983208,"south":29.135948,"west":-81.075837}},"searchPageSeoObject":{"baseUrl":"/daytona-beach-fl-32119/","windowTitle":"32119 Real Estate - 32119 Homes For Sale | Zillow","metaDescription":"Zillow has 98 homes for sale in 32119. View listing photos, review sales history, and use our detailed real estate filters to find the perfect place."},"requestId":2,"cat1":{"searchResults":{"mapResults":[{"zpid":"2096421567","price":"$59,900","priceLabel":"$60K","beds":2,"baths":2.0,"area":1104,"latLong":{"latitude":29.16304,"longitude":-81.03416},"statusType":"FOR_SALE","statusText":"Home for sale","isFavorite":false,"isUserClaimingOwner":false,"isUserConfirmedClaim":false,"imgSrc":"https://photos.zillowstatic.com/fp/29356d546adfa1c41da85fc05c555123-p_e.jpg","hasImage":true,"visited":false,"listingType":"","variableData":{"type":"3D_HOME","text":"3D Tour"},"hdpData":{"homeInfo":{"zpid":2096421567,"zipcode":"32119","city":"Daytona 
Beach","state":"FL","latitude":29.16304,"longitude":-81.03416,"price":59900.0,"bathrooms":2.0,"bedrooms":2.0,"livingArea":1104.0,"homeType":"MANUFACTURED","homeStatus":"FOR_SALE","daysOnZillow":-1,"isFeatured":false,"shouldHighlight":false,"rentZestimate":333,"listing_sub_type":{"is_FSBA":true},"isUnmappable":false,"isPreforeclosureAuction":false,"homeStatusForHDP":"FOR_SALE","priceForHDP":59900.0,"isNonOwnerOccupied":true,"isPremierBuilder":false,"isZillowOwned":false,"currency":"USD","country":"USA","isShowcaseListing":false}},"shouldShowZestimateAsPrice":false,"detailUrl":"/homedetails/1304-Bunker-Hill-Dr-Daytona-Beach-FL-32119/2096421567_zpid/","pgapt":"ForSale","sgapt":"For Sale (Broker)","has3DModel":true,"hasVideo":false,"isHomeRec":false,"address":"--","hasAdditionalAttributions":false,"isFeaturedListing":false,"isShowcaseListing":false,"availabilityDate":"2023-04-26 00:00:00","timeOnZillow":1006527769},{"zpid":"2057848202","price":"$28,900","priceLabel":"$29K","beds":2,"baths":1.0,"area":672,"latLong":{"latitude":29.158297,"longitude":-80.99745},"statusType":"FOR_SALE","statusText":"Home for sale","isFavorite":false,"isUserClaimingOwner":false,"isUserConfirmedClaim":false,"imgSrc":"https://photos.zillowstatic.com/fp/5f5af01b01d47ca866b74e1cc5580f5c-p_e.jpg","hasImage":true,"visited":false,"listingType":"","variableData":
When I look for the API endpoint in the browser's developer tools, I can see that this URL returns JSON data in the browser, but when I try to load it as JSON, it throws an error:
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
which indicates that the text being parsed does not start with JSON data.
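For context, Python's json module raises exactly this error whenever the text it is given does not begin with a JSON value, for example an empty body or an HTML page. Which of these applies to the request in the question is not something the post shows, so treat this only as a way to inspect what was actually received. A minimal stdlib sketch; try_parse_json is a hypothetical helper name:

```python
import json

def try_parse_json(body):
    """Parse body as JSON; return None (and report why) if it is not JSON."""
    try:
        return json.loads(body)
    except json.JSONDecodeError as err:
        # err.msg and err.pos say where parsing failed; show the start of the body
        print(f'Not JSON ({err.msg} at char {err.pos}): {body[:40]!r}')
        return None

try_parse_json('')                                   # empty body
try_parse_json('<html><body>Blocked</body></html>')  # an HTML page
print(try_parse_json('{"requestId": 2}'))            # {'requestId': 2}
```

Printing the first characters of the body this way usually makes it obvious whether you got JSON, HTML, or nothing at all.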
I want to get all the properties that are displayed after I enter search details such as a ZIP code or city name in the search box.
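Because the request is driven entirely by the searchQueryState JSON, one way to run other searches is to build that parameter from a Python dict instead of copying the whole URL from the browser. A minimal sketch with urllib.parse.urlencode; build_search_url is a hypothetical helper, the values are copied from the GetSearchPageState URL in the question's code, and it is an unverified assumption that the endpoint accepts a modified state. In particular, regionSelection and mapBounds are specific to the searched area, so take them from the browser's own request for each new ZIP code or city rather than changing usersSearchTerm alone.

```python
import json
from urllib.parse import urlencode

BASE = 'https://www.zillow.com/search/GetSearchPageState.htm'

def build_search_url(state, wants, request_id=2):
    """Hypothetical helper: serialize a search state back into a request URL."""
    query = urlencode({
        # compact separators produce JSON without spaces, like the original URL
        'searchQueryState': json.dumps(state, separators=(',', ':')),
        'wants': json.dumps(wants, separators=(',', ':')),
        'requestId': request_id,
    })
    return f'{BASE}?{query}'

# Values from the question's URL; replace them with the ones your browser
# sends for the area you are searching.
state = {
    'pagination': {},
    'usersSearchTerm': '32119',
    'mapBounds': {'west': -81.10488184448243, 'east': -80.95416315551759,
                  'south': 29.100888224306114, 'north': 29.222759141549485},
    'regionSelection': [{'regionId': 71798, 'regionType': 7}],
    'isMapVisible': True,
    'filterState': {'sortSelection': {'value': 'globalrelevanceex'},
                    'isAllHomes': {'value': True}},
    'isListVisible': True,
    'mapZoom': 13,
}
url = build_search_url(state, {'cat1': ['mapResults']})
print(url[:80])
```

The resulting url can then be passed to requests.get(url, headers=head) in the same way as the hard-coded one.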
>Solution :
To get the JSON directly from the response, call its json() method:
url = 'https://www.zillow.com/search/GetSearchPageState.htm?searchQueryState=%7B%22pagination%22%3A%7B%7D%2C%22usersSearchTerm%22%3A%2232119%22%2C%22mapBounds%22%3A%7B%22west%22%3A-81.10488184448243%2C%22east%22%3A-80.95416315551759%2C%22south%22%3A29.100888224306114%2C%22north%22%3A29.222759141549485%7D%2C%22regionSelection%22%3A%5B%7B%22regionId%22%3A71798%2C%22regionType%22%3A7%7D%5D%2C%22isMapVisible%22%3Atrue%2C%22filterState%22%3A%7B%22sortSelection%22%3A%7B%22value%22%3A%22globalrelevanceex%22%7D%2C%22isAllHomes%22%3A%7B%22value%22%3Atrue%7D%7D%2C%22isListVisible%22%3Atrue%2C%22mapZoom%22%3A13%7D&wants={%22cat1%22:[%22mapResults%22]}&requestId=2'
requests.get(url, headers=head).json()
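Once parsed, the result is an ordinary dict, and (as the output in the question shows) the listings are in data['cat1']['searchResults']['mapResults']. A minimal sketch of pulling a few fields out of each listing; extract_listings is a hypothetical helper, the sample dict is trimmed from the question's output, and the chosen fields are only an example. It uses .get() because not every listing is guaranteed to carry every key.

```python
# A trimmed copy of the response shown in the question (one listing).
data = {
    'requestId': 2,
    'cat1': {'searchResults': {'mapResults': [{
        'zpid': '2096421567', 'price': '$59,900', 'beds': 2, 'baths': 2.0,
        'area': 1104, 'statusText': 'Home for sale',
        'detailUrl': '/homedetails/1304-Bunker-Hill-Dr-Daytona-Beach-FL-32119/2096421567_zpid/',
        'hdpData': {'homeInfo': {'city': 'Daytona Beach', 'zipcode': '32119',
                                 'price': 59900.0, 'homeType': 'MANUFACTURED'}},
    }]}},
}

def extract_listings(data):
    """Return one flat dict per listing; missing keys come back as None."""
    results = data.get('cat1', {}).get('searchResults', {}).get('mapResults', [])
    rows = []
    for item in results:
        home = item.get('hdpData', {}).get('homeInfo', {})
        detail = item.get('detailUrl')
        rows.append({
            'zpid': item.get('zpid'),
            'price': home.get('price'),   # numeric price, e.g. 59900.0
            'beds': item.get('beds'),
            'baths': item.get('baths'),
            'area': item.get('area'),
            'home_type': home.get('homeType'),
            # detailUrl is relative in the sample, so prefix the site root
            'url': ('https://www.zillow.com' + detail) if detail else None,
        })
    return rows

for row in extract_listings(data):
    print(row['zpid'], row['price'], row['url'])
```

With a live response you would call extract_listings(requests.get(url, headers=head).json()) instead of using the sample dict.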
Otherwise, as in your example, you have to parse the string back into JSON with json.loads(), because you first converted the response into a BeautifulSoup object and then into text:
json.loads(soup.find('p').text)
Now you can pick the information you need from the JSON, or convert the list of listings into a DataFrame:
import pandas as pd
pd.DataFrame(json.loads(soup.find('p').text)['cat1']['searchResults']['mapResults'])
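Each listing is itself nested (latLong, hdpData.homeInfo, variableData), so passing mapResults straight to pd.DataFrame leaves dicts inside some cells. pandas provides pd.json_normalize, which flattens nested dicts into dotted column names. To show the same idea without pandas, here is a small stdlib sketch; flatten is a hypothetical helper, not a pandas function, and the sample listing is trimmed from the question's output.

```python
def flatten(record, parent_key='', sep='.'):
    """Flatten nested dicts into one level, joining keys with sep (lists are kept as-is)."""
    flat = {}
    for key, value in record.items():
        new_key = f'{parent_key}{sep}{key}' if parent_key else key
        if isinstance(value, dict):
            # recurse into nested dicts, carrying the joined key prefix
            flat.update(flatten(value, new_key, sep))
        else:
            flat[new_key] = value
    return flat

listing = {
    'zpid': '2096421567', 'price': '$59,900',
    'latLong': {'latitude': 29.16304, 'longitude': -81.03416},
    'hdpData': {'homeInfo': {'homeType': 'MANUFACTURED', 'price': 59900.0}},
}
print(flatten(listing))
# {'zpid': '2096421567', 'price': '$59,900', 'latLong.latitude': 29.16304,
#  'latLong.longitude': -81.03416, 'hdpData.homeInfo.homeType': 'MANUFACTURED',
#  'hdpData.homeInfo.price': 59900.0}
```

A list of such flat dicts, e.g. [flatten(r) for r in results], can be passed to pd.DataFrame to get one column per nested field.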