
How to read .dta into Python

I want to read data from http://fmwww.bc.edu/ec-p/data/wooldridge/401k.dta. I tried the following:

import pandas as pd
import pyreadstat

dataframe, meta = pyreadstat.read_dta("http://fmwww.bc.edu/ec-p/data/wooldridge/401k.dta")

With this, I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "pyreadstat/pyreadstat.pyx", line 260, in pyreadstat.pyreadstat.read_dta
  File "pyreadstat/_readstat_parser.pyx", line 1012, in pyreadstat._readstat_parser.run_conversion
pyreadstat._readstat_parser.PyreadstatError: File http://fmwww.bc.edu/ec-p/data/wooldridge/401k.dta does not exist!

I also tried pandas, but it failed too:

>>> Data = pd.read_stata("http://fmwww.bc.edu/ec-p/data/wooldridge/401k.dta")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.9/site-packages/pandas/io/stata.py", line 1898, in read_stata
    reader = StataReader(
  File "/usr/local/lib/python3.9/site-packages/pandas/io/stata.py", line 1066, in __init__
    self._read_header()
  File "/usr/local/lib/python3.9/site-packages/pandas/io/stata.py", line 1095, in _read_header
    self._read_old_header(first_char)
  File "/usr/local/lib/python3.9/site-packages/pandas/io/stata.py", line 1299, in _read_old_header
    raise ValueError(_version_error.format(version=self.format_version))
ValueError: Version of given Stata file is 110. pandas supports importing versions 105, 108, 111 (Stata 7SE), 113 (Stata 8/9), 114 (Stata 10/11), 115 (Stata 12), 117 (Stata 13), 118 (Stata 14/15/16),and 119 (Stata 15/16, over 32,767 variables).
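The pandas error hinges on the format version stored at the very start of the file: 110 is simply not in its supported list. As a minimal sketch, assuming the header layout that pandas' StataReader checks (files in format 117 and newer begin with a `<stata_dta>` tag and carry the version in a `<release>` tag; older formats store it in the first byte), the version of a downloaded copy can be inspected with a small helper. `dta_format_version` is a hypothetical name, not part of pandas or pyreadstat.

```python
def dta_format_version(header: bytes) -> int:
    """Return the Stata .dta format version encoded in the first bytes of a file.

    Hypothetical helper: assumes the header layouts described in the lead-in.
    """
    if header.startswith(b"<stata_dta>"):
        # Format 117+ uses an XML-like header: <stata_dta><header><release>117</release>...
        start = header.index(b"<release>") + len(b"<release>")
        end = header.index(b"</release>", start)
        return int(header[start:end])
    # Older formats (105 up to 115, including 110) store the version in byte 0
    return header[0]
```

For example, `with open("401k.dta", "rb") as f: print(dta_format_version(f.read(64)))` should report the same 110 that pandas rejects, which tells you up front to reach for a reader other than pd.read_stata.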

However, in R I can read it straight from the URL with read.dta (from the foreign package) without any problem:

> head(read.dta("http://fmwww.bc.edu/ec-p/data/wooldridge/401k.dta"))
  prate mrate totpart totelg age totemp sole  ltotemp
1  26.1  0.21    1653   6322   8   8709    0 9.072112
2 100.0  1.42     262    262   6    315    1 5.752573
3  97.6  0.91     166    170  10    275    1 5.616771
4 100.0  0.42     257    257   7    500    0 6.214608
5  82.5  0.53     591    716  28    933    1 6.838405
6 100.0  1.82      92     92   7    143    1 4.962845

How can I read this data in Python?

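Solution:

pyreadstat.read_dta expects a path to a local file, so it cannot open the URL directly (hence the "does not exist!" error). Download the file first, then pass the local path to pyreadstat: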

import requests
import pyreadstat

url = 'http://fmwww.bc.edu/ec-p/data/wooldridge/401k.dta'

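def download_file(url):
    # Save under the file's own name (here 401k.dta) in the working directory
    local_filename = url.split('/')[-1]
    # stream=True fetches the body in chunks instead of loading it all at once
    with requests.get(url, stream=True, timeout=30) as r:
        r.raise_for_status()  # raise on HTTP errors such as 404
        with open(local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192):
                f.write(chunk)
    return local_filename

# pyreadstat only reads local paths, so download first and parse the saved copy
df, meta = pyreadstat.read_dta(download_file(url))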
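If you would rather not depend on requests or leave 401k.dta lying in the working directory, a standard-library variant is sketched here. It is an assumption-based sketch, not part of the original answer: `download_to_temp` is a hypothetical helper that copies the response into a temporary file and returns that file's path.

```python
import os
import shutil
import tempfile
import urllib.request


def download_to_temp(url: str, suffix: str = ".dta", timeout: float = 30.0) -> str:
    """Download `url` into a temporary file and return its path (hypothetical helper).

    The caller is responsible for deleting the file once it has been read.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as out:
            # urlopen raises urllib.error.HTTPError on 4xx/5xx responses
            with urllib.request.urlopen(url, timeout=timeout) as response:
                # Copy the body in chunks rather than reading it all into memory
                shutil.copyfileobj(response, out)
    except BaseException:
        # Do not leave a partial or empty temp file behind on failure
        os.remove(path)
        raise
    return path
```

Pass the returned path to pyreadstat.read_dta exactly as in the answer above, then call os.remove on it once the DataFrame is loaded. The HTTPError raised by urlopen plays the same role as raise_for_status in the requests version.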