Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Varying skiprows argument for individual worksheets when using pd.read_excel()

I have an excel-workbook with two worksheets which I want to import using pandas. I would like to use the skiprows argument, with 5 rows for the first sheet and 10 rows for the second sheet. pd.read_excel() returns a dictionary in case the excel file has several worksheets. Can I pass the skiprows argument as a list or similar to make it apply varying values for the individual sheets? I’ve tried this:

ret = pd.read_excel(file, skiprows=[5, 10], sheet_name=None)

It looks like the skiprows-argument is simply ignored in this case.
Can I make it work somehow?

If yes, this can hopefully be generalized to other arguments too.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

It’s complicated to generalize to all arguments. You have to iterate on each sheets.

A possible solution:

sheets = {
    'Sheet1': {'skiprows': 5},   # add other parameters
    'Sheet2': {'skiprows': 10},  # according the documentation
}

def read_excel_sheets(filename, sheets):
    dfs = {}
    with pd.ExcelFile(filename) as xlsx:
        for sheet_name, params in sheets.items():
            dfs[sheet_name] = xlsx.parse(sheet_name=sheet_name, **params)
    return dfs

# The output is the same as pd.read_excel(filename, sheets=None)
dfs = read_excel_sheets('data.xlsx', sheets)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading