zipfile.BadZipFile: File is not a zip file when using "openpyxl" engine

August 19, 2022

I have created a script that loads the Excel sheets stored in S3 into my local Postgres database. I use pandas' read_excel function and the ExcelFile class to read the sheets.
Here is the code:

import boto3
import pandas as pd
import io
import os
from sqlalchemy import create_engine
import xlrd

os.environ["AWS_ACCESS_KEY_ID"] = "xxxxxxxxxxxx"
os.environ["AWS_SECRET_ACCESS_KEY"] = "xxxxxxxxxxxxxxxxxx"
s3 = boto3.client('s3')

obj = s3.get_object(Bucket='bucket-name', Key='file.xlsx')
data = pd.ExcelFile(io.BytesIO(obj['Body'].read()))
print(data.sheet_names)
a = len(data.sheet_names)

engine1 = create_engine('postgresql://postgres:postgres@localhost:5432/postgres')
for i in range(a):
    df = pd.read_excel(io.BytesIO(obj['Body'].read()),sheet_name=data.sheet_names[i], engine='openpyxl')
    df.to_sql("test"+str(i), engine1, index=False)

Basically, the code fetches the file from the S3 bucket and loops over its sheets. For each sheet, it creates a table
and loads the sheet's data into that table.

The trouble is that when I run this code, I get this error:

df = pd.read_excel(io.BytesIO(obj['Body'].read()),sheet_name=data.sheet_names[i-1], engine='openpyxl')
zipfile.BadZipFile: File is not a zip file

This started after I added the ‘openpyxl’ engine to the read_excel call. When I remove the engine, I get this error instead:

raise ValueError(
ValueError: Excel file format cannot be determined, you must specify an engine manually.

Please note that I can print the database connection, so connectivity is not the problem, and I’m using the latest versions of Python and pandas. I can also list all the sheet_names in the Excel file, so the file itself is reachable.

Many Thanks!

>Solution :

You are reading obj['Body'] twice, in full:

data = pd.ExcelFile(io.BytesIO(obj['Body'].read()))
pd.read_excel(io.BytesIO(obj['Body'].read()), ...)

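The body is a stream that can only be .read() once; a second read returns nothing, just an empty b"". Since an .xlsx file is a zip archive, openpyxl then fails with BadZipFile on the empty buffer.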
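The effect can be reproduced without S3 at all. The sketch below uses an in-memory io.BytesIO as a stand-in for obj['Body'] (an assumption for illustration; the real object is a boto3 streaming body, but it is consumed by read() in the same way) and shows that the second read is empty and that the zipfile layer rejects an empty buffer:

```python
import io
import zipfile

# Stand-in for obj["Body"]: a byte stream that is consumed as it is read.
stream = io.BytesIO(b"pretend these are xlsx bytes")

first = stream.read()   # consumes the whole stream
second = stream.read()  # nothing left, so this is empty

print(len(first), second)

# An .xlsx file is a zip archive, so an empty buffer fails in the
# zipfile layer before any Excel parsing happens.
try:
    zipfile.ZipFile(io.BytesIO(second))
    error_name = None
except zipfile.BadZipFile as exc:
    error_name = type(exc).__name__
    print(error_name, exc)
```

This is the same exception the question shows, raised for the same reason: the buffer handed to the openpyxl engine contains no bytes.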

To avoid re-reading the S3 stream, store its contents once in a BytesIO and rewind that BytesIO with seek(0) before each new read.

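buf = io.BytesIO(obj["Body"].read())  # read the S3 stream exactly once

data = pd.ExcelFile(buf)

buf.seek(0)  # rewind before handing the buffer to pandas again

df = pd.read_excel(buf, ...)

# repeat buf.seek(0) before every further pd.read_excel(buf, ...) call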
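As a variation on the above, here is a minimal sketch that sidesteps the rewinding entirely: it reads the bytes once, opens a single pd.ExcelFile, and parses every sheet from it. The helper name sheets_from_bytes is hypothetical, and the sketch assumes pandas and openpyxl are installed. The S3 download and the to_sql step are left out so the function can be tried locally.

```python
import io

import pandas as pd


def sheets_from_bytes(raw: bytes) -> dict:
    """Parse every sheet of an .xlsx payload into a DataFrame.

    `raw` is the full file content, e.g. the result of a single
    obj["Body"].read() call. Hypothetical helper, for illustration only.
    """
    frames = {}
    # One ExcelFile wraps the buffer; each sheet is parsed from it,
    # so the underlying bytes are never re-read from S3.
    with pd.ExcelFile(io.BytesIO(raw), engine="openpyxl") as workbook:
        for name in workbook.sheet_names:
            frames[name] = workbook.parse(sheet_name=name)
    return frames
```

In the question's script this would be called as sheets_from_bytes(obj['Body'].read()), followed by a loop over the returned dictionary that calls df.to_sql(...) for each sheet.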

boto3

by MR
