Append many xlsx sheets to a csv file

Advertisements

I have an excel workbook with many sheets which I would like to add into one big csv file using Python. I have written the below code which currently only adds the final sheet’s data to the csv. How do I refactor this so that ALL the sheets in the workbook are being added/appended to the file, rather than overwriting it each time?

# save xlsx to variable
xls = pd.ExcelFile("Test1.xlsx")

# create a new csv to write everything to
with open('emails.csv', 'w') as creating_new_csv_file:
   pass

# for each sheet in the file, get the 3 necessary columns saved to variable
for sheet_name in xls.sheet_names:
    read_file = pd.read_excel('Test1.xlsx', sheet_name=[f'{sheet_name}'], header=0, usecols=['Page', 'URL', 'Person'])
    # now loop through that dictionary object to exclude the actual sheet name and be able to output to csv
    for sheet in read_file:
        print(read_file[sheet])
        # adds the data to the csv created earlier. However, only adds the final sheet, whereas
        # I want to be appending each one so that they are all in the csv together.
        read_file[sheet].to_csv("emails.csv",
                         index=False,
                         header=True,
                         columns=['Page', 'URL', 'Person']
                         )

I’ve tried a few different things but to be honest I’m very new at Python and especially Pandas library, so appreciate any advice I can get on this. I’d also love feedback on if there is a better way to do this.

Please let me know if my request makes no sense!

>Solution :

import pandas as pd

# Open the Excel file
xls = pd.ExcelFile("Test1.xlsx")

# Create a new CSV file in append mode
with open('emails.csv', 'a') as csv_file:
    # Iterate through all sheets in the Excel file
    for sheet_name in xls.sheet_names:
        # Read the specific sheet with the desired columns
        read_file = pd.read_excel('Test1.xlsx', sheet_name=sheet_name, header=0, usecols=['Page', 'URL', 'Person'])
        # Append the data to the CSV file
        read_file.to_csv(csv_file, index=False, header=not csv_file.tell(), columns=['Page', 'URL', 'Person'])

Leave a ReplyCancel reply