Advertisements
I have an excel workbook with many sheets which I would like to add into one big csv file using Python. I have written the below code which currently only adds the final sheet’s data to the csv. How do I refactor this so that ALL the sheets in the workbook are being added/appended to the file, rather than overwriting it each time?
# save xlsx to variable
xls = pd.ExcelFile("Test1.xlsx")
# create a new csv to write everything to
with open('emails.csv', 'w') as creating_new_csv_file:
pass
# for each sheet in the file, get the 3 necessary columns saved to variable
for sheet_name in xls.sheet_names:
read_file = pd.read_excel('Test1.xlsx', sheet_name=[f'{sheet_name}'], header=0, usecols=['Page', 'URL', 'Person'])
# now loop through that dictionary object to exclude the actual sheet name and be able to output to csv
for sheet in read_file:
print(read_file[sheet])
# adds the data to the csv created earlier. However, only adds the final sheet, whereas
# I want to be appending each one so that they are all in the csv together.
read_file[sheet].to_csv("emails.csv",
index=False,
header=True,
columns=['Page', 'URL', 'Person']
)
I’ve tried a few different things but to be honest I’m very new at Python and especially Pandas library, so appreciate any advice I can get on this. I’d also love feedback on if there is a better way to do this.
Please let me know if my request makes no sense!
>Solution :
import pandas as pd
# Open the Excel file
xls = pd.ExcelFile("Test1.xlsx")
# Create a new CSV file in append mode
with open('emails.csv', 'a') as csv_file:
# Iterate through all sheets in the Excel file
for sheet_name in xls.sheet_names:
# Read the specific sheet with the desired columns
read_file = pd.read_excel('Test1.xlsx', sheet_name=sheet_name, header=0, usecols=['Page', 'URL', 'Person'])
# Append the data to the CSV file
read_file.to_csv(csv_file, index=False, header=not csv_file.tell(), columns=['Page', 'URL', 'Person'])