Iterator for reading file chunks

I am trying to create an iterator for reading chunks of size chunk_size from a file. I have the following solution:

fn read_chunks<R: std::io::Read>(file: &mut R, chunk_size: usize) {
    let file_bytes: Vec<u8> = file.bytes().flatten().collect();
    let chunk_iter = file_bytes.chunks(chunk_size);
    // rest of code omitted ...
}

However, it is too slow for large files. I noticed that the following is much faster:

fn read_chunks<R: std::io::Read>(file: &mut R, chunk_size: usize) -> std::io::Result<()> {
    loop {
        let mut buf = vec![0u8; chunk_size];
        match file.read_exact(&mut buf) {
            Ok(()) => {
                // code for handling each chunk omitted
            }
            // read_exact reports end of input as UnexpectedEof
            Err(e) if e.kind() == std::io::ErrorKind::UnexpectedEof => {
                break;
            }
            // propagate any other I/O error
            Err(e) => return Err(e),
        }
    }
    Ok(())
}

But how do I return this as an iterator? I want something like chunk_iter that I can combine with other iterators.

Intuitively, I want to write something like this (similar to a generator in Python):

fn read_chunks_iter<R: std::io::Read>(file: &mut R, chunk_size: usize)
    -> impl Iterator<Item = Vec<u8>>
{
    loop {
        let mut buf = vec![0u8; chunk_size];
        match file.read_exact(&mut buf) {
            Ok(()) => {
                yield buf; // pseudocode: stable Rust has no `yield`
            }
            Err(e) if e.kind() == std::io::ErrorKind::UnexpectedEof => {
                break;
            }
        }
    }
}
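Stable Rust has no `yield`, but `std::iter::from_fn` gives a similar shape: the closure acts as the loop body and returning `None` ends the iteration. A minimal sketch of that alternative to a dedicated iterator struct (the name read_chunks_iter is reused from the question for illustration); it takes the reader by value and yields io::Result<Vec<u8>> so that I/O errors are not silently lost.

```rust
use std::io::{self, ErrorKind, Read};

/// Stand-in for the generator above: `std::iter::from_fn` calls the
/// closure once per item and stops at the first `None`.
fn read_chunks_iter<R: Read>(
    mut reader: R,
    chunk_size: usize,
) -> impl Iterator<Item = io::Result<Vec<u8>>> {
    std::iter::from_fn(move || {
        let mut buf = vec![0u8; chunk_size];
        match reader.read_exact(&mut buf) {
            Ok(()) => Some(Ok(buf)),
            // End of input; a trailing chunk shorter than chunk_size is discarded.
            Err(e) if e.kind() == ErrorKind::UnexpectedEof => None,
            // Any other I/O error is handed to the caller.
            Err(e) => Some(Err(e)),
        }
    })
}
```

Because `&mut R` also implements `Read`, passing `&mut file` lets you keep using the reader after the iterator is dropped.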

Solution:

A simple iterator like this isn't too difficult to write, since each iteration of the loop produces exactly one item. We just need a type that holds the reader and the chunk size, and then move the body of the loop into Iterator::next.

use std::io::{self, ErrorKind, Read};

pub struct ToChunks<R> {
    reader: R,
    chunk_size: usize,
}

impl<R: Read> Iterator for ToChunks<R> {
    type Item = io::Result<Vec<u8>>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut buffer = vec![0u8; self.chunk_size];
        match self.reader.read_exact(&mut buffer) {
            Ok(()) => Some(Ok(buffer)),
            // End of input; a final chunk shorter than chunk_size is discarded.
            Err(e) if e.kind() == ErrorKind::UnexpectedEof => None,
            // Other I/O errors are passed on to the caller.
            Err(e) => Some(Err(e)),
        }
    }
}
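One behaviour worth knowing: when fewer than chunk_size bytes remain, read_exact returns UnexpectedEof, so ToChunks stops without yielding those trailing bytes. If the tail matters, here is a sketch of a variant (the name ToChunksWithTail and its `new` constructor are hypothetical) that reads up to chunk_size bytes per call using Read::take and read_to_end, so the last chunk may be shorter.

```rust
use std::io::{self, Read};

/// Hypothetical variant of ToChunks that also yields the final, shorter chunk.
pub struct ToChunksWithTail<R> {
    reader: R,
    chunk_size: usize,
}

impl<R: Read> ToChunksWithTail<R> {
    pub fn new(reader: R, chunk_size: usize) -> Self {
        ToChunksWithTail { reader, chunk_size }
    }
}

impl<R: Read> Iterator for ToChunksWithTail<R> {
    type Item = io::Result<Vec<u8>>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut buffer = Vec::with_capacity(self.chunk_size);
        // `take` caps this read at chunk_size bytes; `read_to_end` keeps
        // reading until that cap or EOF (retrying on Interrupted).
        match self
            .reader
            .by_ref()
            .take(self.chunk_size as u64)
            .read_to_end(&mut buffer)
        {
            Ok(0) => None,             // nothing left to read
            Ok(_) => Some(Ok(buffer)), // a full chunk or the shorter last one
            Err(e) => Some(Err(e)),
        }
    }
}
```

A side effect of this design is that chunk_size == 0 yields nothing, whereas read_exact with an empty buffer succeeds immediately, so ToChunks would yield empty chunks forever.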

After that, we can add a small extension trait so the iterator can be created with a method call on any reader, and we are ready to go.

pub trait IterChunks {
    type Output;
    
    fn iter_chunks(self, len: usize) -> Self::Output;
}

impl<R: Read> IterChunks for R {
    type Output = ToChunks<R>;
    
    fn iter_chunks(self, len: usize) -> Self::Output {
        ToChunks {
            reader: self,
            chunk_size: len,
        }
    }
}

Now it is as simple as calling iter_chunks(len) on any reader, for example a buffered file:

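use std::fs::File;
use std::io::BufReader;
fn main() -> std::io::Result<()> {
    let file = BufReader::new(File::open("large_file.txt")?);
    for chunk in file.iter_chunks(1024) {
        let chunk = chunk?; // stop at the first I/O error
        println!("{:?}", chunk);
    }
    Ok(())
}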
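Since each item is an io::Result<Vec<u8>>, the iterator composes with the usual adapters, which is what the question was after. A sketch using a hypothetical helper, chunk_sums, that accepts any such iterator (for example file.iter_chunks(1024)); collecting into io::Result stops at the first error instead of looping over it.

```rust
use std::io;

/// Hypothetical helper: sums the bytes of each chunk, returning the
/// first I/O error encountered, if any.
fn chunk_sums<I>(chunks: I) -> io::Result<Vec<u64>>
where
    I: Iterator<Item = io::Result<Vec<u8>>>,
{
    chunks
        // Transform the Ok value of each item, leaving errors untouched.
        .map(|chunk| chunk.map(|bytes| bytes.iter().map(|&b| u64::from(b)).sum::<u64>()))
        // Result<Vec<_>, _> short-circuits on the first Err.
        .collect()
}
```

With the trait in scope, chunk_sums(file.iter_chunks(1024))? would return one sum per chunk.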
