Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

What is the fastest way to retrieve header names from excel files using pandas

I have a big size excel files that I’m organizing the column names into a unique list.
The code below works, but it takes ~9 minutes!
Does anyone have suggestions for speeding it up?

import pandas as pd
import os
get_col = list(pd.read_excel("E:\DATA\dbo.xlsx",nrows=1, engine='openpyxl').columns)
print(get_col)

>Solution :

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Using pandas to extract just the column names of a large excel file is very inefficient.
You can use openpyxl for this:

from openpyxl import load_workbook

wb = load_workbook("E:\DATA\dbo.xlsx", read_only=True)

columns = {}

for sheet in worksheets:
    for value in sheet.iter_rows(min_row=1, max_row=1, values_only=True):
        columns = value

Assuming you only have one sheet, you will get a tuple of column names here.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading