I am unzipping a very large CSV file in a memory-constrained environment. For this reason I unzip a line at a time like so:
with zipfile.ZipFile(temp_file.name) as zip_content:
    filename = zip_content.namelist()[0]
    with zip_content.open(filename, mode="r") as content:
        for line in content:
            print(line)
This correctly yields each line as a bytes object:
b'name,age,city\n'
b'John,12,Madrid\n'
...
I'd like to process these lines with a csv.DictReader so I can reliably access each field by name. However, I clearly cannot create a new DictReader inside the loop for every line. I'm tempted to roll my own solution, parsing the header and building a dictionary for each line, but I wonder if there is a quick way to leverage DictReader instead.
What is a way of accomplishing this while avoiding reading the entire file into memory first?
>Solution :
You can wrap the open zip entry in an io.TextIOWrapper, which decodes the binary stream to text lazily, and then use csv.DictReader on it normally.
from zipfile import ZipFile
import csv
from io import TextIOWrapper

file_name = "test.zip"
with ZipFile(file_name) as zipObj:
    for info in zipObj.infolist():
        with zipObj.open(info.filename, "r") as zd:
            reader = csv.DictReader(TextIOWrapper(zd, "utf-8"))
            for row in reader:
                print(row)
For the sample CSV input, the output would be:
{'name': 'John', 'age': '12', 'city': 'Madrid'}
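For a runnable sketch that combines the question's first-entry approach with this technique, here is a self-contained version that builds a small zip in memory (the entry name and sample data are made up for the demonstration). Note the newline="" argument, which the csv module's documentation recommends for file objects passed to its readers:

```python
import csv
import io
import zipfile

# Build a small in-memory zip containing a CSV so the example
# runs as-is; the entry name and rows are hypothetical.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("people.csv", "name,age,city\nJohn,12,Madrid\n")
buf.seek(0)

with zipfile.ZipFile(buf) as zf:
    # Open the first entry as a binary stream, as in the question.
    with zf.open(zf.namelist()[0], "r") as raw:
        # TextIOWrapper decodes the stream incrementally, so the
        # whole file is never loaded into memory at once.
        reader = csv.DictReader(io.TextIOWrapper(raw, encoding="utf-8", newline=""))
        rows = list(reader)

print(rows)  # [{'name': 'John', 'age': '12', 'city': 'Madrid'}]
```

In real use you would replace the BytesIO buffer with the path to your zip file and iterate over reader directly instead of materializing rows, keeping memory usage constant.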