How to read a very large csv file into a pandas dataframe as quickly as possible?

I am reading a very large csv file (~1 million rows) into a pandas DataFrame with pd.read_csv() using the following options (note that the Timestamp column also contains seconds, which are not shown here because the sample was copied and pasted directly from the csv file):

import pandas as pd

df = pd.read_csv(file,                       # path to the csv file
                 index_col='Timestamp',
                 engine='c',                 # the faster C parser
                 na_filter=False,            # skip NaN detection
                 parse_dates=['Timestamp'],
                 infer_datetime_format=True,
                 low_memory=True)

My question is: how can I speed up the reading? It is currently taking forever to load the file.


>Solution:

dask appears quicker at reading .csv files than the usual pandas.read_csv, and the syntax remains similar.

The answer to this related question shows how to use dask for this:

How to speed up loading data using pandas?

I also use this method when working with .csv files and performance is an issue.
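A further pandas-only tweak that often helps (the sample data and timestamp format below are assumptions, since the real data is not shown): read the column as plain text, then convert it afterwards with an explicit format string, which avoids per-row datetime format inference:

```python
import io
import pandas as pd

# Small in-memory sample standing in for the large file (hypothetical data).
csv = io.StringIO(
    "Timestamp,value\n"
    "2023-01-25 09:30:00,1\n"
    "2023-01-25 09:30:01,2\n"
)

# Read without date parsing, then convert with an explicit format string;
# skipping format inference is usually much faster on large files.
df = pd.read_csv(csv)
df["Timestamp"] = pd.to_datetime(df["Timestamp"], format="%Y-%m-%d %H:%M:%S")
df = df.set_index("Timestamp")
```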
