How to get csv.DictReader to process one line at a time?

I am unzipping a very large CSV file in a memory-constrained environment. For this reason I unzip a line at a time like so:

with zipfile.ZipFile(temp_file.name) as zip_content:
    filename = zip_content.namelist()[0]
    with zip_content.open(filename, mode="r") as content:
        for line in content:
            print(line)

This correctly yields each line as a bytes object:

b'name,age,city\n'
b'John,12,Madrid\n'
...

I’d like to process these lines with a csv.DictReader so I can reliably access each field.

However, clearly I cannot create a new dict reader inside the loop for each line.

I’m tempted to just roll my own solution parsing the headers and then creating these dictionaries for each line, but I wonder if there is some quick way to leverage DictReader.

What is a way of accomplishing this while avoiding reading the entire file into memory first?

>Solution :

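You can wrap the open zip entry in io.TextIOWrapper, then use csv.DictReader as usual. The wrapper decodes the byte stream incrementally, so the reader still pulls one row at a time and the whole file is never loaded into memory.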

import csv
from io import TextIOWrapper
from zipfile import ZipFile

file_name = "test.zip"
with ZipFile(file_name) as zipObj:
    for info in zipObj.infolist():
        with zipObj.open(info.filename, "r") as zd:
            # Decode bytes to text on the fly; newline="" lets csv handle line endings.
            reader = csv.DictReader(TextIOWrapper(zd, encoding="utf-8", newline=""))
            for row in reader:
                print(row)

For the sample CSV input, the output would be:

{'name': 'John', 'age': '12', 'city': 'Madrid'}
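
If the rows need to be consumed elsewhere in the program, the same idea can be wrapped in a generator. The following is a minimal sketch, not part of the original answer: the function name iter_zip_csv_rows, the entry name people.csv, and the in-memory archive are illustrative assumptions, and it reads only the first entry of the archive, as in the question.

```python
import csv
import io
import zipfile


def iter_zip_csv_rows(zip_source, encoding="utf-8"):
    """Lazily yield one dict per CSV row from the first entry of a zip archive.

    zip_source may be a path or a seekable binary file-like object.
    (Hypothetical helper, named here for illustration only.)
    """
    with zipfile.ZipFile(zip_source) as archive:
        first_entry = archive.namelist()[0]
        with archive.open(first_entry, mode="r") as raw:
            # Wrap the binary stream so csv receives text;
            # newline="" leaves line-ending handling to the csv module.
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            yield from csv.DictReader(text)


# Build a tiny archive in memory so the sketch runs without any files on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("people.csv", "name,age,city\nJohn,12,Madrid\n")
buffer.seek(0)

rows = list(iter_zip_csv_rows(buffer))
print(rows)
```

Because the function is a generator, the archive stays open only while rows are being consumed, and a caller can stop early with break without decompressing the rest of the entry.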
