Iterator for reading file chunks

I am trying to create an iterator for reading chunks of size chunk_size from a file. I have the following solution:

fn read_chunks<R: std::io::Read>(file: &mut R, chunk_size: usize) {
    let file_bytes: Vec<u8> = file.bytes().flatten().collect();
    let chunk_iter = file_bytes.chunks(chunk_size);
    // rest of code omitted ...
}

However, it is too slow for large files. I noticed that the following is much faster:

fn read_chunks<R: std::io::Read>(file: &mut R, chunk_size: usize) -> std::io::Result<()> {
    loop {
        let mut buf = vec![0u8; chunk_size];
        match file.read_exact(&mut buf) {
            Ok(()) => {
                // code for handling each chunk omitted
            }
            // EOF before `buf` was filled: stop (any trailing bytes are discarded)
            Err(e) if e.kind() == std::io::ErrorKind::UnexpectedEof => {
                break;
            }
            Err(e) => return Err(e),
        }
    }
    Ok(())
}
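The first version is most likely slow because `Read::bytes` produces one `io::Result<u8>` per byte, and the standard library documentation notes that it is inefficient on readers that are not buffered. When the whole file fits in memory, one option is to keep the `chunks` approach but fill the buffer with bulk reads via `read_to_end`. A minimal sketch under that assumption; `read_all` is a hypothetical helper name, and a `Cursor` stands in for the file so the example runs without disk access.

```rust
use std::io::{self, Cursor, Read};

/// Hypothetical helper: read the whole input with bulk reads instead of
/// one byte at a time. Assumes the input fits in memory.
fn read_all<R: Read>(reader: &mut R) -> io::Result<Vec<u8>> {
    let mut bytes = Vec::new();
    reader.read_to_end(&mut bytes)?; // no per-byte Result to unwrap
    Ok(bytes)
}

fn main() -> io::Result<()> {
    // Cursor stands in for a File here.
    let mut input = Cursor::new(b"abcdefghij".to_vec());
    let bytes = read_all(&mut input)?;

    // `chunks` borrows from `bytes`, so `bytes` must outlive the iterator.
    let chunk_iter = bytes.chunks(4);
    let lens: Vec<usize> = chunk_iter.map(|c| c.len()).collect();
    assert_eq!(lens, vec![4, 4, 2]); // the last chunk is shorter
    println!("{:?}", lens); // prints [4, 4, 2]
    Ok(())
}
```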

But how do I return it as an iterator? I want to have it like chunk_iter where I can use it in conjunction with other iterators.


Intuitively, I want to write something like this (similar to a generator in Python):

fn read_chunks_iter<R: std::io::Read>(file: &mut R, chunk_size: usize)
    -> impl Iterator<Item = Vec<u8>>
{
    loop {
        let mut buf = vec![0u8; chunk_size];
        match file.read_exact(&mut buf) {
            Ok(()) => {
                yield buf; // pseudocode: `yield` is not available in stable Rust
            }
            Err(e) if e.kind() == std::io::ErrorKind::UnexpectedEof => {
                break;
            }
        }
    }
}
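Stable Rust has no `yield`, but `std::iter::from_fn` comes close to a Python generator: it wraps a closure returning `Option<T>` in an iterator, and the closure body plays the role of one loop iteration. A minimal sketch under a few assumptions: the reader is moved into the iterator, I/O errors other than end of input are surfaced as `Err` items, and each chunk is an owned `Vec<u8>` (a borrowed `&[u8]` cannot outlive a buffer allocated inside the closure). `read_chunks_iter` is only the hypothetical name from the question.

```rust
use std::io::{self, Cursor, ErrorKind, Read};

// Generator-like iterator built from a closure: each call to the closure
// produces at most one chunk, and returning `None` ends the iteration.
fn read_chunks_iter<R: Read>(
    mut reader: R,
    chunk_size: usize,
) -> impl Iterator<Item = io::Result<Vec<u8>>> {
    std::iter::from_fn(move || {
        let mut buf = vec![0u8; chunk_size];
        match reader.read_exact(&mut buf) {
            Ok(()) => Some(Ok(buf)), // the "yield buf" step
            // Like the loop above, stop at EOF; a trailing partial chunk
            // is dropped because read_exact leaves its contents unspecified.
            Err(e) if e.kind() == ErrorKind::UnexpectedEof => None,
            Err(e) => Some(Err(e)),
        }
    })
}

fn main() {
    let chunks: Vec<Vec<u8>> = read_chunks_iter(Cursor::new(b"abcdefghij".to_vec()), 4)
        .map(|r| r.expect("in-memory reads do not fail"))
        .collect();
    assert_eq!(chunks, vec![b"abcd".to_vec(), b"efgh".to_vec()]);
}
```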

Solution:

A simple iterator like this isn’t too difficult since each iteration of the loop produces a single item. We just need to make a type for it and stick the contents of the loop into Iterator::next.

use std::io::{self, Read, ErrorKind};

pub struct ToChunks<R> {
    reader: R,
    chunk_size: usize,
}

impl<R: Read> Iterator for ToChunks<R> {
    type Item = io::Result<Vec<u8>>;
    
    fn next(&mut self) -> Option<Self::Item> {
        let mut buffer = vec![0u8; self.chunk_size];
        match self.reader.read_exact(&mut buffer) {
            Ok(()) => Some(Ok(buffer)),
            // EOF before the buffer was full: stop; a trailing partial chunk is dropped
            Err(e) if e.kind() == ErrorKind::UnexpectedEof => None,
            Err(e) => Some(Err(e)),
        }
    }
}
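One behaviour worth knowing: `read_exact` returns `UnexpectedEof` when fewer than `chunk_size` bytes remain, and its documentation leaves the buffer contents unspecified in that case, so `ToChunks` silently drops the last partial chunk (the `chunks` version in the question keeps it). If those trailing bytes matter, one possible variant is sketched below, assuming a zero-length read means end of input. The type `ToChunksWithTail` and its `done` flag are hypothetical; the sketch uses `Read::by_ref` and `Read::take` so that `read_to_end` stops after at most `chunk_size` bytes.

```rust
use std::io::{self, Cursor, Read};

// Hypothetical variant of ToChunks that also yields a final, shorter chunk.
pub struct ToChunksWithTail<R> {
    reader: R,
    chunk_size: usize,
    done: bool,
}

impl<R: Read> Iterator for ToChunksWithTail<R> {
    type Item = io::Result<Vec<u8>>;

    fn next(&mut self) -> Option<Self::Item> {
        if self.done {
            return None;
        }
        let mut buffer = Vec::with_capacity(self.chunk_size);
        // `take` caps this read at chunk_size bytes; `read_to_end` keeps
        // reading until that cap or EOF is reached.
        match self
            .reader
            .by_ref()
            .take(self.chunk_size as u64)
            .read_to_end(&mut buffer)
        {
            Ok(0) => {
                self.done = true; // nothing left to read
                None
            }
            Ok(n) => {
                // A short chunk means EOF was reached; stop after yielding it.
                if n < self.chunk_size {
                    self.done = true;
                }
                Some(Ok(buffer))
            }
            Err(e) => {
                self.done = true; // stop after the first error
                Some(Err(e))
            }
        }
    }
}

fn main() {
    let iter = ToChunksWithTail {
        reader: Cursor::new(b"abcdefghij".to_vec()),
        chunk_size: 4,
        done: false,
    };
    let chunks: Vec<Vec<u8>> = iter.map(|r| r.unwrap()).collect();
    assert_eq!(chunks, vec![b"abcd".to_vec(), b"efgh".to_vec(), b"ij".to_vec()]);
}
```

Setting `done` after an error or a short read also means the iterator ends instead of retrying a reader that keeps failing.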

After that, we can put together a small extension trait so that any reader can call it as a method, and we are ready to go.


pub trait IterChunks {
    type Output;
    
    fn iter_chunks(self, len: usize) -> Self::Output;
}

impl<R: Read> IterChunks for R {
    type Output = ToChunks<R>;
    
    fn iter_chunks(self, len: usize) -> Self::Output {
        ToChunks {
            reader: self,
            chunk_size: len,
        }
    }
}

Now it should be as simple as calling iter_chunks(len).

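use std::fs::File;
use std::io::BufReader;

fn main() -> std::io::Result<()> {
    let file = BufReader::new(File::open("large_file.txt")?);
    for chunk in file.iter_chunks(1024) {
        println!("{:?}", chunk?); // `?` stops at the first I/O error
    }
    Ok(())
}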
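Because `ToChunks` is an ordinary `Iterator`, the adapters the question asks about work on it directly, just as they do on `chunk_iter`. A sketch that repeats the answer's two definitions so it compiles on its own, with an in-memory `Cursor` standing in for the file. `map_while(Result::ok)` is one way to stop at the first I/O error; without it, a reader that keeps failing would make `ToChunks` return errors indefinitely. The byte-sum computation is only an illustrative stand-in for real per-chunk work.

```rust
use std::io::{self, Cursor, ErrorKind, Read};

// Same ToChunks / IterChunks as in the answer, repeated so this compiles alone.
pub struct ToChunks<R> {
    reader: R,
    chunk_size: usize,
}

impl<R: Read> Iterator for ToChunks<R> {
    type Item = io::Result<Vec<u8>>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut buffer = vec![0u8; self.chunk_size];
        match self.reader.read_exact(&mut buffer) {
            Ok(()) => Some(Ok(buffer)),
            Err(e) if e.kind() == ErrorKind::UnexpectedEof => None,
            Err(e) => Some(Err(e)),
        }
    }
}

pub trait IterChunks {
    type Output;
    fn iter_chunks(self, len: usize) -> Self::Output;
}

impl<R: Read> IterChunks for R {
    type Output = ToChunks<R>;
    fn iter_chunks(self, len: usize) -> Self::Output {
        ToChunks { reader: self, chunk_size: len }
    }
}

fn main() {
    let reader = Cursor::new(b"aabbccdd".to_vec());
    // Pair each chunk with its index and sum its bytes, as with any iterator.
    let sums: Vec<(usize, u32)> = reader
        .iter_chunks(2)
        .map_while(Result::ok) // stop at the first I/O error
        .enumerate()
        .map(|(i, chunk)| (i, chunk.iter().map(|&b| u32::from(b)).sum::<u32>()))
        .collect();
    assert_eq!(sums, vec![(0, 194), (1, 196), (2, 198), (3, 200)]);
    println!("{:?}", sums); // prints [(0, 194), (1, 196), (2, 198), (3, 200)]
}
```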

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading