Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

how to read and compare two excel files with multiple worksheets?

I have two excel files and both of them have 10 worksheets. I wanted to read each worksheets, compare them and print data in 3rd excel file, even that would be written in multiple worksheets.

The below program works for single worksheet

import pandas as pd

df1 = pd.read_excel('zyx_5661.xlsx')
df2 = pd.read_excel('zyx_5662.xlsx')

df1.rename(columns= lambda x : x + '_file1', inplace=True)
df2.rename(columns= lambda x : x + '_file2', inplace=True)

df_join = df1.merge(right = df2, left_on = df1.columns.to_list(), right_on = df2.columns.to_list(), how = 'outer')

with pd.ExcelWriter('xl_join_diff.xlsx') as writer:
    df_join.to_excel(writer, sheet_name='testing', index=False)

How can I optimize it to work with multiple worksheets? Please advice.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Thank you.

>Solution :

I think this should achieve what you need. Loop through each sheet name (assuming they’re named the same across both excel documents. If not, you can use numbers instead). Write the new output to a new sheet, and save the excel document.

import pandas as pd
    
writer = pd.ExcelWriter('xl_join_diff.xlsx')

for sheet in ['sheet1', 'sheet2', 'sheet3']: #list of sheet names
    #Pull in data for each sheet, and merge together.
    df1 = pd.read_excel('zyx_5661.xlsx', sheet_name=sheet)
    df2 = pd.read_excel('zyx_5662.xlsx', sheet_name=sheet)

    df1.rename(columns= lambda x : x + '_file1', inplace=True)
    df2.rename(columns= lambda x : x + '_file2', inplace=True)

    df_join = df1.merge(right=df2, left_on=df1.columns.to_list(),
                        right_on=df2.columns.to_list(), how='outer')
   
    df_join.to_excel(writer, sheet, index=False) #write to excel as new sheet
    
writer.save() #save excel document once all sheets have been done
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading