TimeoutError: The read operation timed out

The Script

Here’s a simple script that uses requests to download a web page:

import requests
import pandas as pd

from bs4 import BeautifulSoup

url = "https://www.cmegroup.com/markets/energy/crude-oil/light-sweet-crude.quotes.html"

data = requests.get(url).text

However, the script hangs at the call to requests.get.

If I use the following instead:

data = requests.get(url, allow_redirects=False, timeout=5).text

it outputs the following:

Traceback (most recent call last):
  File "C:\Users\dharm\anaconda3\envs\html-table-parse\lib\site-packages\urllib3\connectionpool.py", line 449, in _make_request
    six.raise_from(e, None)
  File "<string>", line 3, in raise_from
  File "C:\Users\dharm\anaconda3\envs\html-table-parse\lib\site-packages\urllib3\connectionpool.py", line 444, in _make_request
    httplib_response = conn.getresponse()
  File "C:\Users\dharm\anaconda3\envs\html-table-parse\lib\http\client.py", line 1374, in getresponse
    response.begin()
  File "C:\Users\dharm\anaconda3\envs\html-table-parse\lib\http\client.py", line 318, in begin
    version, status, reason = self._read_status()
  File "C:\Users\dharm\anaconda3\envs\html-table-parse\lib\http\client.py", line 279, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "C:\Users\dharm\anaconda3\envs\html-table-parse\lib\socket.py", line 705, in readinto
    return self._sock.recv_into(b)
  File "C:\Users\dharm\anaconda3\envs\html-table-parse\lib\ssl.py", line 1274, in recv_into
    return self.read(nbytes, buffer)
  File "C:\Users\dharm\anaconda3\envs\html-table-parse\lib\ssl.py", line 1130, in read
    return self._sslobj.read(len, buffer)
TimeoutError: The read operation timed out
...
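For reference, requests applies no timeout at all by default, which is why the first version of the script blocks indefinitely instead of failing. A minimal sketch (the function name is illustrative) that uses requests' separate connect/read timeouts and catches the failure rather than crashing:

```python
import requests

def fetch_with_timeout(url: str, connect: float = 5.0, read: float = 10.0):
    """Return the page text, or None if the server is unreachable or too slow."""
    try:
        # requests accepts a (connect, read) tuple; omitting timeout means wait forever
        return requests.get(url, timeout=(connect, read)).text
    except requests.exceptions.Timeout:
        return None
```

The split timeout is just an illustration; a single number, as in the script above, applies the same limit to both phases.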

Conda environment

I’m running this in a conda environment with the following packages:

(html-table-parse) PS C:\Users\dharm\Dropbox\Documents> conda list
# packages in environment at C:\Users\dharm\anaconda3\envs\html-table-parse:
#
# Name                    Version                   Build  Channel
beautifulsoup4            4.11.1             pyha770c72_0    conda-forge
brotlipy                  0.7.0           py310he2412df_1004    conda-forge
bzip2                     1.0.8                h8ffe710_4    conda-forge
ca-certificates           2022.6.15            h5b45459_0    conda-forge
certifi                   2022.6.15       py310h5588dad_0    conda-forge
cffi                      1.15.1          py310hcbf9ad4_0    conda-forge
charset-normalizer        2.1.0              pyhd8ed1ab_0    conda-forge
cryptography              37.0.1          py310h21b164f_0
idna                      3.3                pyhd8ed1ab_0    conda-forge
intel-openmp              2022.1.0          h57928b3_3787    conda-forge
libblas                   3.9.0              15_win64_mkl    conda-forge
libcblas                  3.9.0              15_win64_mkl    conda-forge
libffi                    3.4.2                h8ffe710_5    conda-forge
liblapack                 3.9.0              15_win64_mkl    conda-forge
libzlib                   1.2.12               h8ffe710_1    conda-forge
mkl                       2022.1.0           h6a75c08_874    conda-forge
numpy                     1.23.0          py310h8a5b91a_0    conda-forge
openssl                   3.0.5                h8ffe710_0    conda-forge
pandas                    1.4.3           py310hf5e1058_0    conda-forge
pip                       22.1.2             pyhd8ed1ab_0    conda-forge
pycparser                 2.21               pyhd8ed1ab_0    conda-forge
pyopenssl                 22.0.0             pyhd8ed1ab_0    conda-forge
pysocks                   1.7.1           py310h5588dad_5    conda-forge
python                    3.10.5          hcf16a7b_0_cpython    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python_abi                3.10                    2_cp310    conda-forge
pytz                      2022.1             pyhd8ed1ab_0    conda-forge
requests                  2.28.1             pyhd8ed1ab_0    conda-forge
setuptools                63.1.0          py310h5588dad_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
soupsieve                 2.3.1              pyhd8ed1ab_0    conda-forge
sqlite                    3.39.0               h8ffe710_0    conda-forge
tbb                       2021.5.0             h2d74725_1    conda-forge
tk                        8.6.12               h8ffe710_0    conda-forge
tzdata                    2022a                h191b570_0    conda-forge
ucrt                      10.0.20348.0         h57928b3_0    conda-forge
urllib3                   1.26.9             pyhd8ed1ab_0    conda-forge
vc                        14.2                 hb210afc_6    conda-forge
vs2015_runtime            14.29.30037          h902a5da_6    conda-forge
wheel                     0.37.1             pyhd8ed1ab_0    conda-forge
win_inet_pton             1.1.0           py310h5588dad_4    conda-forge
xz                        5.2.5                h62dcd97_1    conda-forge

Question

What’s a good way to get the script to download the page as intended?

Solution

Just add browser-like request headers; the website appears to block requests that don't send them:

import requests

url = 'https://www.cmegroup.com/markets/energy/crude-oil/light-sweet-crude.quotes.html'
headers = {'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
           'Accept-Encoding': 'gzip, deflate',
           'Accept-Language': 'en-US,en;q=0.9',
           'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36'}

print(requests.get(url, headers=headers).text)
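A slightly more defensive variant of the answer (a sketch; the header values are taken from the answer above, and the helper name is illustrative) keeps a timeout and surfaces HTTP errors instead of silently parsing an error page:

```python
import requests

# Browser-like headers, as in the answer above
HEADERS = {
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Encoding': 'gzip, deflate',
    'Accept-Language': 'en-US,en;q=0.9',
    'User-Agent': ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                   'AppleWebKit/537.36 (KHTML, like Gecko) '
                   'Chrome/103.0.0.0 Safari/537.36'),
}

def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Fetch a page with browser-like headers; fail loudly instead of hanging."""
    response = requests.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()  # turn 4xx/5xx responses into an exception
    return response.text
```

From there the original script's BeautifulSoup or pandas.read_html parsing can take over. It is worth verifying that the quotes table is actually present in the static HTML, since many exchange pages load their data via JavaScript after the page loads.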