I have a binary value beginning with ‘504B030414…’, which should be a zipped XML file.
How can I unzip and read this file in Python? With the code below, I get "zipfile.BadZipFile: File is not a zip file" (the binary value is purposely truncated):
import zipfile
import io
# original_zip_data = b"504B030414..."
# filebytes = io.BytesIO(original_zip_data)
original_zip_data = "504B030414..."
filebytes = io.StringIO(original_zip_data)
myzipfile = zipfile.ZipFile(filebytes)
The answer linked below was helpful when the binary is in the following format: ‘PK\x03\x04…’
Is there a way to convert my binary to a similar format, or is my binary value truly corrupted (a "BadZipFile")?
Unzip buffer with Python?
>Solution :
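The problem is that the hexadecimal string must first be converted to bytes. The io.StringIO class is meant for text data, not binary data, so zipfile cannot read an archive from it. The value is not necessarily corrupted: 504B0304 is simply the hexadecimal spelling of the bytes PK\x03\x04, the signature at the start of a zip file.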
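As a quick check of that relationship, here is a minimal sketch that decodes only the first four bytes of the prefix from the question. The variable names are illustrative and not part of the original code.

```python
# The first eight hex digits of the value from the question
hex_prefix = "504B0304"

# bytes.fromhex turns each pair of hex digits into one byte
signature = bytes.fromhex(hex_prefix)

print(signature)                    # b'PK\x03\x04'
print(signature == b"PK\x03\x04")   # True
```

So the hex string and the ‘PK\x03\x04…’ form are two spellings of the same data, and only the second is what zipfile expects.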
Here is the correct way to go about it:
import zipfile
import io
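# Hex string containing the zip file (truncated here)
original_zip_data = "504B030414..."
# Convert the hex string to bytes and wrap it in a file-like object
filebytes = io.BytesIO(bytes.fromhex(original_zip_data))
# Read a file from the archive and print it
with zipfile.ZipFile(filebytes, 'r') as myzipfile:
    # 'fileinside.txt' is a placeholder; myzipfile.namelist() lists the actual names
    file_content = myzipfile.read('fileinside.txt')
    print(file_content.decode('utf-8'))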
In this example, we convert the hex string to bytes, wrap the bytes in a file-like object, and then read a file inside the zip archive.
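Because the hex value in the question is truncated, the snippet above cannot be run as shown. The following is a self-contained sketch of the same approach under stated assumptions: it builds a tiny archive in memory to stand in for the real data, and the helper read_zip_from_hex, the member name example.xml, and its contents are all made up for illustration. It uses namelist() so the name of the file inside the archive does not need to be known in advance.

```python
import io
import zipfile


def read_zip_from_hex(hex_string):
    """Return {member name: bytes} for a zip archive given as a hex string."""
    raw = bytes.fromhex(hex_string)
    with zipfile.ZipFile(io.BytesIO(raw)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


# Build a small archive in memory as stand-in data (hypothetical content)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("example.xml", "<root>hello</root>")

# Hex-encode it so it resembles the value from the question
sample_hex = buffer.getvalue().hex().upper()

contents = read_zip_from_hex(sample_hex)
for name, data in contents.items():
    print(name, data.decode("utf-8"))
```

If bytes.fromhex raises a ValueError on the real value, the string likely contains non-hex characters or an odd number of digits, which would be worth checking before concluding the archive itself is bad.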