Scraping a webpage with Python but unsure how to deal with a static(?) URL

I am trying to learn how to pull data from this url:
https://denver.coloradotaxsale.com/index.cfm?folder=auctionResults&mode=preview

However, the problem is that the URL doesn't change when I switch pages, so I'm not sure how to enumerate or loop through them. The site lists around 3,000 sale records, so I'm looking for a better approach than scraping a single page.

Here is my starting code. It is very simple, but I would appreciate any help or hints. I think I might need to switch to another package, but I'm not sure which one. Maybe BeautifulSoup?


import requests
import pandas as pd

url = "https://denver.coloradotaxsale.com/index.cfm?folder=auctionResults&mode=preview"

html = requests.get(url).content
df_list = pd.read_html(html, header=1)[0]
df_list = df_list.drop([0, 1, 2])  # drop unnecessary header rows

Solution:

The page switches pages by sending form data in a POST request, so you can get the data from more pages by posting that form data yourself, for example:

import requests
import pandas as pd
from bs4 import BeautifulSoup


data = {
    "folder": "auctionResults",
    "loginID": "00",
    "pageNum": "1",
    "orderBy": "AdvNum",
    "orderDir": "asc",
    "justFirstCertOnGroups": "1",
    "doSearch": "true",
    "itemIDList": "",
    "itemSetIDList": "",
    "interest": "",
    "premium": "",
    "itemSetDID": "",
}

url = "https://denver.coloradotaxsale.com/index.cfm?folder=auctionResults&mode=preview"


all_data = []

for page_num in range(1, 3):  # <-- increase the number of pages here
    data["pageNum"] = page_num
    soup = BeautifulSoup(requests.post(url, data=data).content, "html.parser")
    for row in soup.select("#searchResults tr")[2:]:
        tds = [td.text.strip() for td in row.select("td")]
        all_data.append(tds)

columns = [
    "SEQ NUM",
    "Tax Year",
    "Notices",
    "Parcel ID",
    "Face Amount",
    "Winning Bid",
    "Sold To",
]

df = pd.DataFrame(all_data, columns=columns)

# print last 10 items from dataframe:
print(df.tail(10).to_markdown())

Prints:

|     | SEQ NUM | Tax Year | Notices | Parcel ID        | Face Amount | Winning Bid | Sold To  |
|----:|:--------|---------:|:--------|:-----------------|:------------|:------------|:---------|
|  96 | 000094  |     2020 |         | 00031-18-001-000 | $905.98     | $81.00      | 00005517 |
|  97 | 000095  |     2020 |         | 00031-18-002-000 | $750.13     | $75.00      | 00005517 |
|  98 | 000096  |     2020 |         | 00031-18-003-000 | $750.13     | $75.00      | 00005517 |
|  99 | 000097  |     2020 |         | 00031-18-004-000 | $750.13     | $75.00      | 00005517 |
| 100 | 000098  |     2020 |         | 00031-18-007-000 | $750.13     | $76.00      | 00005517 |
| 101 | 000099  |     2020 |         | 00031-18-008-000 | $905.98     | $84.00      | 00005517 |
| 102 | 000100  |     2020 |         | 00031-19-001-000 | $1,999.83   | $171.00     | 00005517 |
| 103 | 000101  |     2020 |         | 00031-19-004-000 | $1,486.49   | $131.00     | 00005517 |
| 104 | 000102  |     2020 |         | 00031-19-006-000 | $1,063.44   | $96.00      | 00005517 |
| 105 | 000103  |     2020 |         | 00031-20-001-000 | $1,468.47   | $126.00     | 00005517 |
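One follow-up note: the currency columns come back as strings like `$1,999.83`, so they need cleaning before any numeric analysis. A minimal sketch of that step, using a small hypothetical DataFrame in the same format as the scraped output:

```python
import pandas as pd

# Hypothetical sample mirroring the format of the scraped data
df = pd.DataFrame({
    "Face Amount": ["$905.98", "$1,999.83"],
    "Winning Bid": ["$81.00", "$171.00"],
})

# Strip the "$" and thousands separators, then convert to float
for col in ["Face Amount", "Winning Bid"]:
    df[col] = df[col].str.replace(r"[$,]", "", regex=True).astype(float)

print(df["Face Amount"].sum())
```

After this, the columns support sums, averages, and comparisons directly.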