How do web scrape more underlying data from a websites map location?

June 6, 2022

Currently, I have successfully used python to scrape data from a competitor’s website to find out store information. The website has a map where you can enter a zip code and it will tell you all the stores in the area of a my current location. The website sends a GET request to pull store data by using this link:

https://www.homedepot.com/StoreSearchServices/v2/storesearch?address=37028&radius=50&pagesize=30

My goal is to scrape all store information not just the imaginary zip code = 12345 & pagesize=30.
How should I go about getting all the store information? Would it be better to iterate through a dataset of zip codes to pull all the stores or is there a better way to do this? I’ve tried expanding past 30 page size but it looks like that is the limit on the request.

>Solution :

This url gives JSON with "currentPage":1 which can means it can use some kind of pagination.

I added &page=2 and it seems it works

Page 1:

https://www.homedepot.com/StoreSearchServices/v2/storesearch?address=37028&radius=250&pagesize=40&page=1

Page 2:

https://www.homedepot.com/StoreSearchServices/v2/storesearch?address=37028&radius=250&pagesize=40&page=2

Page 3:

https://www.homedepot.com/StoreSearchServices/v2/storesearch?address=37028&radius=250&pagesize=40&page=3

For test I use bigger range=250 to get JSON with "recordCount":123

I found that it works also with pagesize=40.
For bigger value it sends JSON with error message.

EDIT:

Minimal working code:

Page blocks request without User-Agent

import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:98.0) Gecko/20100101 Firefox/98.0',
}

url = 'https://www.homedepot.com/StoreSearchServices/v2/storesearch'

payload = {
    'address': 37028,
    'radius': 250,
    'pagesize': 40,
    'page': 1,
}

page = 0

while True:

    page += 1
    print('--- page:', page, '---')
    
    payload['page'] = page
    response = requests.get(url, params=payload, headers=headers)
    
    data = response.json()

    print(data['searchReport'])
                        
    if "stores" not in data:
        break
    
    for number, item in enumerate(data['stores'], 1):
        print(f'{number:2} | phone: {item["phone"]} | zip: {item["address"]["postalCode"]}')

Result:

--- page: 1 ---
{'recordCount': 123, 'currentPage': 1, 'storesPerPage': 40}
 1 | phone: (931)906-2655 | zip: 37040
 2 | phone: (270)442-0817 | zip: 42001
 3 | phone: (615)662-7600 | zip: 37221
 4 | phone: (615)865-9600 | zip: 37115
 5 | phone: (615)228-3317 | zip: 37216
 6 | phone: (615)269-7800 | zip: 37204
 7 | phone: (615)824-2391 | zip: 37075
 8 | phone: (615)370-0730 | zip: 37027
 9 | phone: (615)889-7211 | zip: 37076
10 | phone: (615)599-4578 | zip: 37064

etc. 

--- page: 2 ---
{'recordCount': 123, 'currentPage': 2, 'storesPerPage': 40}
 1 | phone: (662)890-9470 | zip: 38654
 2 | phone: (502)964-1845 | zip: 40219
 3 | phone: (812)941-9641 | zip: 47150
 4 | phone: (812)282-0470 | zip: 47129
 5 | phone: (662)349-6080 | zip: 38637
 6 | phone: (502)899-3706 | zip: 40207
 7 | phone: (662)840-8390 | zip: 38866
 8 | phone: (502)491-3682 | zip: 40220
 9 | phone: (870)268-0619 | zip: 72404
10 | phone: (256)575-2100 | zip: 35768

etc.