
Question About Loop Optimization and Speed

This is the gist of my code:

import re
import requests

while int(price) > targetPrice:
    try:
        details = requests.get(url, headers=headers).text
        var1 = int(re.search(r'desired-string(\d+)', details).group(1))
        var2 = int(re.search(r'desired-string(\d+)', details).group(1))
        var3 = int(re.search(r'desired-string(\d+)', details).group(1))
    except (AttributeError, ValueError):
        print('Error')

Essentially, I have a loop that constantly fetches a webpage and scrapes the pieces of data I need. The issue is that I need this loop to run as fast as possible: each iteration currently takes an average of 0.33 seconds, and I want to get that number as low as I can. The information I'm fetching changes every so often, and I need to pick up that change as soon as it occurs.

I found that the time is dominated by the request itself. There is a lot of HTML in the response when I only require about five lines that are always in the same spot. Is there a way to have the request fetch only those specific lines of the HTML and ignore everything I don't need?


The HTML being extracted is from this page: https://www.roblox.com/catalog/6803405665/Gucci-Dionysus-Bag

Multi-threading isn't really what I'm after, because the goal is to make a single iteration of the loop as fast as possible. To my knowledge and from my testing, multi-threading only lets several iterations run concurrently; each one still takes 0.33 seconds.

I believe this to be an optimization question if anything. Any assistance would be appreciated. If any further information is required, please let me know and I will provide it.

>Solution:

The first thing I would try is requests.Session. According to the documentation (https://2.python-requests.org/en/master/user/advanced/#session-objects):

> The Session object allows you to persist certain parameters across requests. It also persists cookies across all requests made from the Session instance, and will use urllib3's connection pooling. So if you're making several requests to the same host, the underlying TCP connection will be reused, which can result in a significant performance increase (see HTTP persistent connection).

Instantiate the Session outside your while loop:

import re
import requests

s = requests.Session()
while int(price) > targetPrice:
    try:
        details = s.get(url, headers=headers).text
        var1 = int(re.search(r'desired-string(\d+)', details).group(1))
        var2 = int(re.search(r'desired-string(\d+)', details).group(1))
        var3 = int(re.search(r'desired-string(\d+)', details).group(1))
    except (AttributeError, ValueError):
        print('Error')
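On the question of skipping the HTML you don't need: over plain HTTP the server sends the whole body, but if the data sits near the top of the page you can stream the response and stop reading once your pattern matches, closing the connection early. Here is a sketch using requests' `stream=True` and `iter_content`; the local server, URL, and `desired-string` pattern are stand-ins for the real page, and how much this helps depends on where the data actually sits in the document:

```python
import re
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests

# Throwaway local server standing in for the real page: the wanted value
# appears near the top, followed by a large tail we'd rather not download.
PAGE = b'<html><body>desired-string42</body>' + b'x' * 500_000 + b'</html>'

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-Length', str(len(PAGE)))
        self.end_headers()
        try:
            self.wfile.write(PAGE)
        except (BrokenPipeError, ConnectionResetError):
            pass  # client hung up early -- that is the point of the demo

    def log_message(self, fmt, *args):
        pass  # keep the demo quiet

server = HTTPServer(('127.0.0.1', 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f'http://127.0.0.1:{server.server_port}/'

PATTERN = re.compile(rb'desired-string(\d+)')

def fetch_value(session, url):
    """Stream the body and stop reading as soon as the pattern matches."""
    buf = b''
    with session.get(url, stream=True) as r:
        for chunk in r.iter_content(chunk_size=8192):
            buf += chunk
            m = PATTERN.search(buf)
            if m:
                return int(m.group(1))  # leaving the 'with' closes the socket
    return None

with requests.Session() as s:
    value = fetch_value(s, url)

server.shutdown()
print(value)  # 42
```

Accumulating into `buf` before searching guards against a match being split across two chunks; for a value known to sit in the first few kilobytes, the first chunk is usually enough.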

If this is still not enough, maybe move to asynchronous requests with aiohttp: https://pypi.org/project/aiohttp/
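To be clear, aiohttp will not make any single request faster; what it buys you is keeping several requests in flight at once, so a fresh response arrives more often than once per round trip. The overlap itself can be illustrated with stdlib asyncio alone; here `asyncio.sleep` stands in for the network wait, so no real HTTP is involved:

```python
import asyncio
import time

async def fake_request(delay):
    # Stand-in for an HTTP round trip that takes `delay` seconds.
    await asyncio.sleep(delay)
    return delay

async def main():
    start = time.monotonic()
    # Three 0.2 s "requests" run concurrently, so the total wall time
    # is about 0.2 s rather than 0.6 s: the waits overlap.
    results = await asyncio.gather(*(fake_request(0.2) for _ in range(3)))
    return results, time.monotonic() - start

results, elapsed = asyncio.run(main())
print(results, elapsed)  # [0.2, 0.2, 0.2] and roughly 0.2, not 0.6
```

With aiohttp the same pattern applies, except each coroutine awaits a real `session.get` instead of a sleep.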
