Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Append many xlsx sheets to a csv file

I have an excel workbook with many sheets which I would like to add into one big csv file using Python. I have written the below code which currently only adds the final sheet’s data to the csv. How do I refactor this so that ALL the sheets in the workbook are being added/appended to the file, rather than overwriting it each time?

# save xlsx to variable
xls = pd.ExcelFile("Test1.xlsx")

# create a new csv to write everything to
with open('emails.csv', 'w') as creating_new_csv_file:
   pass

# for each sheet in the file, get the 3 necessary columns saved to variable
for sheet_name in xls.sheet_names:
    read_file = pd.read_excel('Test1.xlsx', sheet_name=[f'{sheet_name}'], header=0, usecols=['Page', 'URL', 'Person'])
    # now loop through that dictionary object to exclude the actual sheet name and be able to output to csv
    for sheet in read_file:
        print(read_file[sheet])
        # adds the data to the csv created earlier. However, only adds the final sheet, whereas
        # I want to be appending each one so that they are all in the csv together.
        read_file[sheet].to_csv("emails.csv",
                         index=False,
                         header=True,
                         columns=['Page', 'URL', 'Person']
                         )

I’ve tried a few different things but to be honest I’m very new at Python and especially Pandas library, so appreciate any advice I can get on this. I’d also love feedback on if there is a better way to do this.

Please let me know if my request makes no sense!

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

import pandas as pd

# Open the Excel file
xls = pd.ExcelFile("Test1.xlsx")

# Create a new CSV file in append mode
with open('emails.csv', 'a') as csv_file:
    # Iterate through all sheets in the Excel file
    for sheet_name in xls.sheet_names:
        # Read the specific sheet with the desired columns
        read_file = pd.read_excel('Test1.xlsx', sheet_name=sheet_name, header=0, usecols=['Page', 'URL', 'Person'])
        # Append the data to the CSV file
        read_file.to_csv(csv_file, index=False, header=not csv_file.tell(), columns=['Page', 'URL', 'Person'])
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading