I'm scraping a page in PyCharm and want to make sure the scraper works before going any further. However, when I run the code, nothing is printed to the console. Here is the code (thank you in advance!):
from bs4 import BeautifulSoup
import requests

url = "https://www.zillow.com/philadelphia-pa/rentals/?searchQueryState=%7B%22pagination%22%3A%7B%7D%2C%22usersSearchTerm%22%3A%22Philadelphia%2C%20PA%22%2C%22mapBounds%22%3A%7B%22west%22%3A-75.2058476090698%2C%22east%22%3A-75.17623602154539%2C%22south%22%3A39.9520661821946%2C%22north%22%3A39.97380838759173%7D%2C%22regionSelection%22%3A%5B%7B%22regionId%22%3A13271%2C%22regionType%22%3A6%7D%5D%2C%22isMapVisible%22%3Afalse%2C%22filterState%22%3A%7B%22fsba%22%3A%7B%22value%22%3Afalse%7D%2C%22fsbo%22%3A%7B%22value%22%3Afalse%7D%2C%22nc%22%3A%7B%22value%22%3Afalse%7D%2C%22fore%22%3A%7B%22value%22%3Afalse%7D%2C%22cmsn%22%3A%7B%22value%22%3Afalse%7D%2C%22auc%22%3A%7B%22value%22%3Afalse%7D%2C%22fr%22%3A%7B%22value%22%3Atrue%7D%2C%22ah%22%3A%7B%22value%22%3Atrue%7D%7D%2C%22isListVisible%22%3Atrue%2C%22mapZoom%22%3A15%7D"
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
lists = soup.find_all('section', class_="list-card-info")
for list in lists:
    title = list.find('a', class_="list-card-addr")
    price = list.find('div', class_="list-card-price")
    info = [title, price]
    print(info)
Solution:
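Nothing is printed because find_all returns an empty list, so the body of the for loop never runs. Zillow most likely answers a plain requests call with a captcha page instead of the listings. I think you can get past the captcha by sending browser-like headers with the request. Note also that the address is inside an <address> tag, not an <a> tag, so the lookup for the title has to change too: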
from bs4 import BeautifulSoup
import requests

# url is the same Zillow search URL as in the question
header = {
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36",
    "referer": "https://www.google.com/"
}
page = requests.get(url, headers=header)
soup = BeautifulSoup(page.content, 'html.parser')
lists = soup.find_all(class_="list-card-info")
for card in lists:  # "card" avoids shadowing the built-in list
    # the address is an <address> element, not an <a> link
    title = card.find('address', class_="list-card-addr")
    price = card.find('div', class_="list-card-price")
    info = [title, price]
    print(info)
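When the loop prints nothing, it helps to check what actually came back before debugging selectors. The sketch below counts the listing cards in the fetched HTML and says so explicitly when there are none. The helper names count_listing_cards and report are hypothetical, and the "captcha or changed markup" hint is only a likely explanation, not something the code can detect for certain:

```python
from bs4 import BeautifulSoup


def count_listing_cards(html):
    """Count elements with class "list-card-info" in an HTML document (str or bytes)."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.find_all(class_="list-card-info"))


def report(status_code, html):
    """Print the HTTP status and card count; warn when there is nothing to loop over."""
    n = count_listing_cards(html)
    print(f"HTTP {status_code}: found {n} listing card(s)")
    if n == 0:
        # An empty result means the for loop body never runs, so nothing is printed.
        print("No listing cards found: the response is probably a captcha/block page, "
              "or the page markup has changed.")
    return n
```

Usage, right after the request: report(page.status_code, page.content). A count of 0 with a 200 or 403 status usually means the page you received is not the listings page you see in the browser.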
It should print something like this (the actual listings will change over time):
[<address class="list-card-addr">Good Food Flats - Student Housing | 4030 Baring St, Philadelphia, PA</address>, <div class="list-card-price">$825+<abbr class="list-card-label"> <!-- -->1 bd</abbr></div>]
[<address class="list-card-addr">Vue32 | 3201 Race St, Philadelphia, PA</address>, <div class="list-card-price">$2,037+<abbr class="list-card-label"> <!-- -->1 bd</abbr></div>]
[<address class="list-card-addr">N40 Apartments, 44 N 40th St, Philadelphia, PA 19104</address>, <div class="list-card-price">$3,221+/mo</div>]
[<address class="list-card-addr">Fairmount North | 2601 Poplar St, Philadelphia, PA</address>, <div class="list-card-price">$1,560+<abbr class="list-card-label"> <!-- -->Studio</abbr></div>]
[<address class="list-card-addr">The HUB on Chestnut | 3945 Chestnut St, Philadelphia, PA</address>, <div class="list-card-price">$1,680+<abbr class="list-card-label"> <!-- -->1 bd</abbr></div>]
[<address class="list-card-addr">Korman Residential at 3737 Chestnut | 3737 Chestnut St, Philadelphia, PA</address>, <div class="list-card-price">$3,800+<abbr class="list-card-label"> <!-- -->Studio</abbr></div>]
[<address class="list-card-addr">2116 Chestnut | 2116 Chestnut St, Philadelphia, PA</address>, <div class="list-card-price">$2,115+<abbr class="list-card-label"> <!-- -->Studio</abbr></div>]
[<address class="list-card-addr">Arrive University City | 3601 Market St, Philadelphia, PA</address>, <div class="list-card-price">$2,200+<abbr class="list-card-label"> <!-- -->Studio</abbr></div>]
[<address class="list-card-addr">Chestnut Hall | 3900 Chestnut St, Philadelphia, PA</address>, <div class="list-card-price">$1,325+<abbr class="list-card-label"> <!-- -->Studio</abbr></div>]
[None, None]
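The output above prints whole tags, and the last card yields [None, None] because it has neither an address nor a price. A minimal sketch for getting plain text and skipping such empty cards, assuming the same class names as in the answer (the helper name extract_listings and the sample HTML, which is modeled on one line of the output above, are illustrative only):

```python
from bs4 import BeautifulSoup


def extract_listings(html):
    """Return (address, price) text pairs, skipping cards that have neither."""
    soup = BeautifulSoup(html, "html.parser")
    listings = []
    for card in soup.find_all(class_="list-card-info"):
        title = card.find("address", class_="list-card-addr")
        price = card.find("div", class_="list-card-price")
        if title is None and price is None:
            continue  # skip cards like the final [None, None] entry
        listings.append((
            title.get_text(strip=True) if title else None,
            # a space separator keeps "$2,037+" and "1 bd" from running together
            price.get_text(" ", strip=True) if price else None,
        ))
    return listings


# Hypothetical sample: the <section> wrapper is an assumption about the page markup.
SAMPLE = """
<section class="list-card-info">
  <address class="list-card-addr">Vue32 | 3201 Race St, Philadelphia, PA</address>
  <div class="list-card-price">$2,037+<abbr class="list-card-label"> <!-- -->1 bd</abbr></div>
</section>
<section class="list-card-info"></section>
"""

for address, price in extract_listings(SAMPLE):
    print(address, "->", price)
```

With real data, pass page.content instead of SAMPLE.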