Unzip file from binary value in Python

I have a binary value beginning with ‘504B030414…’ which should be a zipped XML file.

Using Python, how can I unzip and read this file? With the code below, I get "zipfile.BadZipFile: File is not a zip file" (binary value purposely truncated):

import zipfile
import io
# original_zip_data = b"504B030414..."
# filebytes = io.BytesIO(original_zip_data)
original_zip_data = "504B030414..."
filebytes = io.StringIO(original_zip_data)
myzipfile = zipfile.ZipFile(filebytes)

The answer linked below was helpful when the binary is in the following format: ‘PK\x03\x04…’

Is there a way to convert my binary to a similar hex format, or is my binary value truly corrupted (a "BadZipFile")?
Linked answer: Unzip buffer with Python?

>Solution :

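The problem is that the hexadecimal string must be converted to bytes before zipfile can read it. io.StringIO is meant for text data, not binary data, and zipfile.ZipFile needs a binary file-like object such as io.BytesIO. Note that b"504B030414..." does not help either: it holds the ASCII characters of the hex digits, not the bytes they encode. bytes.fromhex turns the hex text into the raw ‘PK\x03\x04…’ form.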
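To make the difference concrete, here is a small check on just the first four bytes of the value from the question (the variable names are illustrative):

```python
# First four bytes of the value in the question, written as hex digits.
hex_prefix = "504B0304"

# Decoding the hex digits gives the zip local-file-header signature.
decoded = bytes.fromhex(hex_prefix)
print(decoded)  # b'PK\x03\x04'

# A bytes literal of the same digits is only ASCII text, not the decoded data.
ascii_digits = b"504B0304"
print(decoded == ascii_digits)  # False

# Going the other way, bytes.hex() turns raw bytes back into hex digits.
print(decoded.hex())  # 504b0304
```

So a value starting with 504B0304 is consistent with a zip file; it only needs decoding before zipfile sees it.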

Here is the correct way to go about it:

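import io
import zipfile

# Hex string holding the zip file (truncated here, as in the question)
original_zip_data = "504B030414..."

# Decode the hex digits into raw bytes and wrap them in a binary buffer
filebytes = io.BytesIO(bytes.fromhex(original_zip_data))

# Open the archive and print one member
with zipfile.ZipFile(filebytes, 'r') as myzipfile:
    # 'fileinside.txt' is a placeholder; myzipfile.namelist() lists the real member names
    file_content = myzipfile.read('fileinside.txt')
    print(file_content.decode('utf-8'))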

In this example we convert the hex string to bytes, create a file-like object, and then read the file inside the zip file.
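Because the hex literal above is truncated, it cannot be run as-is. Below is a minimal self-contained sketch of the same steps, assuming a small archive built in memory to stand in for the real data. The member name data.xml, its contents, and the helper read_zip_from_hex are all hypothetical and only serve the illustration.

```python
import io
import zipfile


def read_zip_from_hex(hex_string):
    """Decode a hex string into zip bytes and return {member name: bytes}."""
    raw = bytes.fromhex(hex_string)
    with zipfile.ZipFile(io.BytesIO(raw)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}


# Build a stand-in archive in memory (hypothetical member name and contents).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("data.xml", "<root>hello</root>")

# Hex-encode it so it resembles the value in the question.
hex_string = buffer.getvalue().hex().upper()

# Decode and read every member without hard-coding a file name.
members = read_zip_from_hex(hex_string)
for name, content in members.items():
    print(name, content.decode("utf-8"))
```

Using namelist() avoids guessing the name of the file inside the archive. If bytes.fromhex raises ValueError, the string contains non-hex characters; if ZipFile still raises BadZipFile after decoding, the data is probably incomplete or corrupted.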
