This is the gist of my code:
```python
import re
import requests

while int(price) > targetPrice:
    try:
        details = requests.get(url, headers=headers).text
        var1 = int(re.search(r'desired-string(\d+)', details).group(1))
        var2 = int(re.search(r'desired-string(\d+)', details).group(1))
        var3 = int(re.search(r'desired-string(\d+)', details).group(1))
    except (AttributeError, ValueError):
        print('Error')
```
Essentially, I have a loop that constantly fetches a webpage and scrapes a few pieces of data from it. The issue is that I need this loop to run as fast as possible. Each iteration currently takes an average of 0.33 seconds, and I want to get that number as low as I can. The information I'm fetching changes every so often, and I need to pick up that change as soon as it occurs.
I found that the time is dominated by the request itself. The page contains a lot of HTML when I only need about 5 lines, which are always in the same spot. Is there a way to have the request fetch only those specific lines of HTML and ignore everything I don't need?
The HTML being extracted is from this page: https://www.roblox.com/catalog/6803405665/Gucci-Dionysus-Bag
Multi-threading isn't really what I'm after, because the goal is to make each loop iteration as fast as possible. To my knowledge and testing, multi-threading just lets several loops run concurrently, but each iteration still takes 0.33 seconds.
I believe this to be an optimization question if anything. Any assistance would be appreciated. If any further information is required, please let me know and I will provide it.
>Solution :
The first thing I would try is `requests.Session`.

According to the docs (https://2.python-requests.org/en/master/user/advanced/#session-objects):
> The Session object allows you to persist certain parameters across requests. It also persists cookies across all requests made from the Session instance, and will use urllib3's connection pooling. So if you're making several requests to the same host, the underlying TCP connection will be reused, which can result in a significant performance increase (see HTTP persistent connection).
Instantiate the Session outside your while loop:
```python
import re
import requests

s = requests.Session()

while int(price) > targetPrice:
    try:
        details = s.get(url, headers=headers).text
        var1 = int(re.search(r'desired-string(\d+)', details).group(1))
        var2 = int(re.search(r'desired-string(\d+)', details).group(1))
        var3 = int(re.search(r'desired-string(\d+)', details).group(1))
    except (AttributeError, ValueError):
        print('Error')
```
If this is still not enough, consider moving to asynchronous requests with aiohttp: https://pypi.org/project/aiohttp/
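To illustrate why the async approach can help, here is a minimal `asyncio` sketch. The `fake_fetch` coroutine and the example URLs are placeholders of my own (not your real code); `fake_fetch` stands in for an `await session.get(url)` call on an `aiohttp.ClientSession`, simulating ~0.1 s of network latency per request. Because the waits overlap, three "requests" finish in roughly the time of one:

```python
import asyncio
import time

async def fake_fetch(url: str) -> str:
    # Stand-in for an aiohttp `session.get(url)`: simulates ~0.1 s of
    # network latency, then returns a dummy HTML snippet.
    await asyncio.sleep(0.1)
    return f"<html>desired-string123 ({url})</html>"

async def main():
    start = time.perf_counter()
    # Launch three fetches concurrently; their waits overlap instead of
    # running back to back as they would in a sequential while loop.
    pages = await asyncio.gather(
        *(fake_fetch(f"https://example.com/{i}") for i in range(3))
    )
    return pages, time.perf_counter() - start

pages, elapsed = asyncio.run(main())
print(f"fetched {len(pages)} pages in {elapsed:.2f} s")
# Roughly 0.1 s total, not 0.3 s, because the three waits overlap.
```

The same pattern with a real `aiohttp.ClientSession` also gives you connection reuse, so you keep the benefit of the `requests.Session` approach above while overlapping the network waits.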