Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Python/Pandas – Remove the first row with Unnamed: 0,Unnamed: 1,Unnamed: 2,Unnamed: 3,Unnamed: 4,Unnamed: 5,Unnamed: 6,Unnamed: 7

I am splitting a xlsm file ( with multiple sheets) into a csv with each sheet as a separate csv file. In the process I am removing first 4 lines and keeping the header only which start from row 5.
Here is the input file

enter image description here

This is the output I expect –

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

enter image description here

But I am getting the below output –

enter image description here

Here is the code I am using –

import pandas as pd
xl = pd.ExcelFile('Sample_File.xlsm')
for sheet in xl.sheet_names:
    df = pd.read_excel(xl,sheet_name=sheet)
    df1 = df.iloc[3:]
    df1.to_csv(f"{sheet}.csv",index=False)

How can I remove the first row which is having values from unnamed: 0 to unnamed: 9?

Also how can I have comment1 show up in both (First and second row) and comment2 in (Third and Fourth row) in last column?

>Solution :

You need to use the skiprows argument inside the pd.read_excel function to correctly get the column names in the 5th row.

import pandas as pd

xl = pd.ExcelFile('Sample_File.xlsm')

for sheet in xl.sheet_names:
    df = pd.read_excel(xl, sheet_name=sheet, skiprows=4)
    # no more iloc here
    df.to_csv(f'{sheet}.csv', index=False)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading