What is the fastest way to retrieve header names from Excel files using pandas?

August 9, 2022

I have a large Excel file, and I'm collecting its column names into a unique list.
The code below works, but it takes ~9 minutes!
Does anyone have suggestions for speeding it up?

import pandas as pd

# Raw string so the backslashes in the Windows path are not treated as escapes
get_col = list(pd.read_excel(r"E:\DATA\dbo.xlsx", nrows=1, engine='openpyxl').columns)
print(get_col)

Solution:

Using pandas just to extract the column names of a large Excel file is very inefficient.
You can use openpyxl directly for this, opening the workbook in read-only mode and reading only the first row:

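from openpyxl import load_workbook

# Raw string for the Windows path; read_only=True avoids loading the whole workbook into memory
wb = load_workbook(r"E:\DATA\dbo.xlsx", read_only=True)

columns = ()

for sheet in wb.worksheets:
    # Only row 1 is requested, so this yields at most one tuple of header values
    for value in sheet.iter_rows(min_row=1, max_row=1, values_only=True):
        columns = value

wb.close()  # read-only workbooks keep the file open until closed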

Assuming the workbook has only one sheet, columns ends up as a tuple of the column names. With several sheets, it holds the header row of whichever sheet was read last.
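
If the workbook has more than one sheet and you want a single de-duplicated list, as the question asks, something along these lines might work. This is only a sketch: header_names and unique_names are made-up helper names, not part of openpyxl, and it assumes the header sits in the first row of every sheet.

```python
from openpyxl import load_workbook


def header_names(path):
    """Return {sheet title: tuple of first-row values} for each sheet (hypothetical helper)."""
    wb = load_workbook(path, read_only=True)
    try:
        headers = {}
        for sheet in wb.worksheets:
            # Ask for row 1 only; the default () covers a sheet that yields no rows
            headers[sheet.title] = next(
                sheet.iter_rows(min_row=1, max_row=1, values_only=True), ()
            )
        return headers
    finally:
        wb.close()  # read-only workbooks keep the file open until closed


def unique_names(headers):
    """Flatten per-sheet headers into one list, keeping first-seen order (hypothetical helper)."""
    seen = dict.fromkeys(
        name for row in headers.values() for name in row if name is not None
    )
    return list(seen)
```

You would then call unique_names(header_names(r"E:\DATA\dbo.xlsx")). Empty header cells come back as None and are skipped here; drop that filter if you need to keep them.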