Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas Webscraping Errors

I’m currently trying to webscrape websites for tables using pandas and I get this error for one of the links.

Here’s a snippet of what causes the crash:

import pandas as pd
website_df = pd.read_html("https://ballotpedia.org/Roger_Wicker")
print(website_df)

Below is the error I get, does anyone know how to fix this?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Traceback (most recent call last):
  File "C:\Users\miniconda3\lib\site-packages\pandas\io\parsers\python_parser.py", line 700, in _next_line
    line = self._check_comments([self.data[self.pos]])[0]
IndexError: list index out of range

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\miniconda3\lib\site-packages\pandas\io\parsers\python_parser.py", line 385, in _infer_columns
    line = self._next_line()
  File "C:\Users\miniconda3\lib\site-packages\pandas\io\parsers\python_parser.py", line 713, in _next_line
    raise StopIteration
StopIteration

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "c:\Users\legislators-current.py", line 15, in <module>
    website_df = pd.read_html("https://ballotpedia.org/Roger_Wicker")
  File "C:\Users\miniconda3\lib\site-packages\pandas\util\_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
  File "C:\Users\miniconda3\lib\site-packages\pandas\io\html.py", line 1205, in read_html
    return _parse(
  File "C:\Users\miniconda3\lib\site-packages\pandas\io\html.py", line 1011, in _parse
    df = _data_to_frame(data=table, **kwargs)
  File "C:\Users\miniconda3\lib\site-packages\pandas\io\html.py", line 890, in _data_to_frame
    with TextParser(body, header=header, **kwargs) as tp:
  File "C:\Users\miniconda3\lib\site-packages\pandas\io\parsers\readers.py", line 1876, in TextParser
    return TextFileReader(*args, **kwds)
  File "C:\Users\miniconda3\lib\site-packages\pandas\io\parsers\readers.py", line 1442, in __init__
    self._engine = self._make_engine(f, self.engine)
  File "C:\Users\miniconda3\lib\site-packages\pandas\io\parsers\readers.py", line 1753, in _make_engine
    return mapping[engine](f, **self.options)
  File "C:\Users\miniconda3\lib\site-packages\pandas\io\parsers\python_parser.py", line 122, in __init__
    ) = self._infer_columns()
  File "C:\Users\miniconda3\lib\site-packages\pandas\io\parsers\python_parser.py", line 395, in _infer_columns
    raise ValueError(
ValueError: Passed header=[1,2], len of 2, but only 2 lines in file

>Solution :

Set header=0. You’re going to get a lot of dataframes, but you can parse them to get what you need.

website_df = pd.read_html("https://ballotpedia.org/Roger_Wicker", header=0)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading