How to import a subset of a zip file into Colab?

I have a very large zip file in my Google Drive that contains several subfolders. I'd like to extract only a few of those subfolders into Colab, not all of them. Is there a way to do this?

For instance, suppose the zip file is named "MyBigFile.zip" and contains "folder1", "folder2", "folder3", "folder4", and "folder5". I only want to import and extract "folder1" and "folder4" into Google Colab (and, ideally, only 200 images from each of them). How can I do this? Any suggestions?

*In case it is relevant: each of the five folders contains around 50,000 .png files.

>Solution :

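After some searching I found something. Python's standard zipfile module works in Google Colab too, and it can extract individual members of an archive, so you can pull out only the files you need.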


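from zipfile import ZipFile
from google.colab import drive

drive.mount('/content/drive/')

# Assumed location of the archive in Drive; adjust to your own path.
archive = ZipFile('/content/drive/MyDrive/MyBigFile.zip')

def extract(folderName, numberOfFiles):
    # Keep only file entries under folderName (directory entries end with '/'),
    # then take the first numberOfFiles of them.
    files = [name for name in archive.namelist()
             if name.startswith(folderName) and not name.endswith('/')][:numberOfFiles]
    for file in files:
        # Each file is written under 'extractedFolder', keeping its path inside the zip.
        archive.extract(file, 'extractedFolder')

extract("folder1/", 200)
extract("folder4/", 100)
archive.close()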
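
If you want something reusable, the same idea can be sketched as a standalone function. This is only a minimal sketch, not a definitive implementation: the name `extract_subset` and its parameters (`limit`, `suffix`) are made up for illustration. It uses a `with` block so the archive is closed even if extraction fails, skips directory entries, and keeps only files with the given extension.

```python
import zipfile
from itertools import islice


def extract_subset(zip_path, folder, dest, limit=None, suffix=".png"):
    """Extract up to `limit` files ending in `suffix` from `folder` in the archive.

    Hypothetical helper: returns the paths of the files that were written.
    """
    prefix = folder.rstrip("/") + "/"
    with zipfile.ZipFile(zip_path) as archive:
        # Lazily pick matching file entries; directory entries are skipped.
        matches = (
            info for info in archive.infolist()
            if info.filename.startswith(prefix)
            and not info.is_dir()
            and info.filename.lower().endswith(suffix)
        )
        # islice stops after `limit` matches (no limit when it is None).
        return [archive.extract(info, dest) for info in islice(matches, limit)]
```

In Colab, after mounting Drive, this could be called as `extract_subset('/content/drive/MyDrive/MyBigFile.zip', 'folder1', 'extractedFolder', limit=200)`, where the Drive path is an assumption about where the file lives. Files are taken in the order they are stored in the archive, so this gives the first 200 matches rather than a random sample.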
