This is the gist of my code:
```python
import re
import requests

while int(price) > targetPrice:
    try:
        details = requests.get(url, headers=headers).text
        var1 = int(re.search(r'desired-string(\d+)', details).group(1))
        var2 = int(re.search(r'desired-string(\d+)', details).group(1))
        var3 = int(re.search(r'desired-string(\d+)', details).group(1))
    except (AttributeError, ValueError):
        print('Error')
```
Essentially, I have a loop that constantly fetches a webpage and scrapes a few pieces of data from it. The issue is that I need this loop to run as fast as possible. Each iteration currently takes an average of 0.33 seconds, and I want to get that number as low as I can. The information I'm fetching changes every so often, and I need to pick up that change as soon as it occurs.
I found that the time is dominated by the request itself. The page contains a lot of HTML when I only need about 5 lines, which are always in the same spot. Is there a way to have the request fetch only those specific lines of HTML and ignore everything I don't need?
The HTML being extracted is from this page: https://www.roblox.com/catalog/6803405665/Gucci-Dionysus-Bag
Multi-threading isn't really what I'm after, because the goal is to make each loop iteration as fast as possible. To my knowledge and testing, multi-threading just lets several loops run concurrently, but each iteration still takes 0.33 seconds.
I believe this to be an optimization question if anything. Any assistance would be appreciated. If any further information is required, please let me know and I will provide it.
>Solution :
The first thing I would try is `requests.Session`.

According to the docs (https://2.python-requests.org/en/master/user/advanced/#session-objects):
> The Session object allows you to persist certain parameters across requests. It also persists cookies across all requests made from the Session instance, and will use urllib3's connection pooling. So if you're making several requests to the same host, the underlying TCP connection will be reused, which can result in a significant performance increase (see HTTP persistent connection).
Instantiate the Session outside your while loop:
```python
import re
import requests

s = requests.Session()

while int(price) > targetPrice:
    try:
        details = s.get(url, headers=headers).text
        var1 = int(re.search(r'desired-string(\d+)', details).group(1))
        var2 = int(re.search(r'desired-string(\d+)', details).group(1))
        var3 = int(re.search(r'desired-string(\d+)', details).group(1))
    except (AttributeError, ValueError):
        print('Error')
```
If this is still not enough, consider moving to asynchronous requests with aiohttp: https://pypi.org/project/aiohttp/
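To illustrate why the async approach can help, here is a minimal `asyncio` sketch. The `fake_fetch` coroutine and the example URLs are placeholders of my own (not your real code); `fake_fetch` stands in for an `await session.get(url)` call on an `aiohttp.ClientSession`, simulating ~0.1 s of network latency per request. Because the waits overlap, three "requests" finish in roughly the time of one:

```python
import asyncio
import time

async def fake_fetch(url: str) -> str:
    # Stand-in for an aiohttp `session.get(url)`: simulates ~0.1 s of
    # network latency, then returns a dummy HTML snippet.
    await asyncio.sleep(0.1)
    return f"<html>desired-string123 ({url})</html>"

async def main():
    start = time.perf_counter()
    # Launch three fetches concurrently; their waits overlap instead of
    # running back to back as they would in a sequential while loop.
    pages = await asyncio.gather(
        *(fake_fetch(f"https://example.com/{i}") for i in range(3))
    )
    return pages, time.perf_counter() - start

pages, elapsed = asyncio.run(main())
print(f"fetched {len(pages)} pages in {elapsed:.2f} s")
# Roughly 0.1 s total, not 0.3 s, because the three waits overlap.
```

The same pattern with a real `aiohttp.ClientSession` also gives you connection reuse, so you keep the benefit of the `requests.Session` approach above while overlapping the network waits.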