I am trying to create an iterator for reading chunks of size chunk_size from a file. I have the following solution:
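fn read_chunks<R: std::io::Read>(file: &mut R, chunk_size: usize) {
    // Reads the whole file into memory; Read::bytes() issues one read call per byte,
    // and flatten() silently drops any I/O errors.
    let file_bytes: Vec<u8> = file.bytes().flatten().collect();
    let chunk_iter = file_bytes.chunks(chunk_size);
    // rest of code omitted ...
}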
However, this is too slow for large files. The following is much faster:
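fn read_chunks<R: std::io::Read>(file: &mut R, chunk_size: usize) -> std::io::Result<()> {
    loop {
        let mut buf = vec![0u8; chunk_size];
        match file.read_exact(&mut buf) {
            Ok(()) => {
                // code for handling each chunk omitted
            }
            // End of input: fewer than chunk_size bytes were left.
            Err(e) if e.kind() == std::io::ErrorKind::UnexpectedEof => {
                break;
            }
            // Any other I/O error is passed to the caller.
            Err(e) => return Err(e),
        }
    }
    Ok(())
}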
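One behavior of this read_exact loop worth knowing: when fewer than chunk_size bytes remain, read_exact returns an UnexpectedEof error and those trailing bytes are never handed to the chunk handler (the standard library also makes no promise about the buffer contents in that case). The sketch below runs the same loop over an in-memory &[u8], which implements Read, and counts the chunks it produces; count_full_chunks is a hypothetical helper name used only for illustration.

```rust
use std::io::{self, ErrorKind, Read};

// Hypothetical helper: the read_exact loop from above, counting the chunks it yields.
fn count_full_chunks<R: Read>(reader: &mut R, chunk_size: usize) -> io::Result<usize> {
    let mut count = 0;
    loop {
        let mut buf = vec![0u8; chunk_size];
        match reader.read_exact(&mut buf) {
            Ok(()) => count += 1,
            // Fewer than chunk_size bytes were left, so read_exact gives up here.
            Err(e) if e.kind() == ErrorKind::UnexpectedEof => break,
            Err(e) => return Err(e),
        }
    }
    Ok(count)
}

fn main() -> io::Result<()> {
    let data = [1u8, 2, 3, 4, 5, 6, 7, 8, 9, 10];
    // A byte slice stands in for the file.
    let mut reader = &data[..];
    let n = count_full_chunks(&mut reader, 4)?;
    // Two full chunks of 4 bytes; the last 2 bytes (9 and 10) are never yielded.
    println!("{} chunks", n);
    Ok(())
}
```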
But how do I return this as an iterator? I want something like chunk_iter that I can combine with other iterators.
Intuitively, I want to write something like this (similar to a generator in Python):
fn read_chunks_iter<R: std::io::Read>(file: &mut R, chunk_size: usize)
    -> impl Iterator<Item = Vec<u8>>
{
    loop {
        let mut buf = vec![0u8; chunk_size];
        match file.read_exact(&mut buf) {
            Ok(()) => {
                yield buf; // pseudocode: ordinary Rust functions cannot `yield`
            }
            Err(e) if e.kind() == std::io::ErrorKind::UnexpectedEof => {
                break;
            }
        }
    }
}
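Stable Rust has no yield, but std::iter::from_fn comes close to this generator: it turns a closure returning Option<Item> into an iterator, and a move closure can own the reader. The following is a minimal sketch, assuming owned Vec<u8> chunks wrapped in io::Result so that I/O errors are not silently lost; read_chunks_from_fn is a hypothetical name.

```rust
use std::io::{self, ErrorKind, Read};

// Hypothetical generator-style version built on std::iter::from_fn.
// The closure owns the reader and runs once per call to next().
fn read_chunks_from_fn<R: Read>(
    mut reader: R,
    chunk_size: usize,
) -> impl Iterator<Item = io::Result<Vec<u8>>> {
    std::iter::from_fn(move || {
        let mut buf = vec![0u8; chunk_size];
        match reader.read_exact(&mut buf) {
            Ok(()) => Some(Ok(buf)),                                 // the "yield buf" case
            Err(e) if e.kind() == ErrorKind::UnexpectedEof => None, // the "break" case
            Err(e) => Some(Err(e)),                                  // surface other I/O errors
        }
    })
}

fn main() -> io::Result<()> {
    let data: &[u8] = b"abcdefgh";
    // Works with ordinary adapters; collecting into io::Result stops at the first error.
    let lens: Vec<usize> = read_chunks_from_fn(data, 3)
        .map(|chunk| chunk.map(|c| c.len()))
        .collect::<io::Result<_>>()?;
    println!("{:?}", lens); // [3, 3]: the trailing "gh" is shorter than 3 and is dropped
    Ok(())
}
```

Returning an owned Vec<u8> per item sidesteps the problem with the &'a [u8] idea: a plain Iterator cannot hand out slices that borrow from a buffer it reuses on the next call.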
Solution:
Writing a simple iterator like this isn't too difficult, since each pass through the loop produces exactly one item. We just need a type that holds the loop's state (the reader and the chunk size) and to move the body of the loop into Iterator::next.
use std::io::{self, ErrorKind, Read};
// Iterator over fixed-size chunks read from `reader`.
pub struct ToChunks<R> {
    reader: R,
    chunk_size: usize,
}
impl<R: Read> Iterator for ToChunks<R> {
    // Each chunk is an io::Result so that read errors reach the caller.
    type Item = io::Result<Vec<u8>>;
    fn next(&mut self) -> Option<Self::Item> {
        let mut buffer = vec![0u8; self.chunk_size];
        match self.reader.read_exact(&mut buffer) {
            Ok(()) => Some(Ok(buffer)),
            // Fewer than chunk_size bytes left: stop (a short final chunk is dropped).
            Err(e) if e.kind() == ErrorKind::UnexpectedEof => None,
            Err(e) => Some(Err(e)),
        }
    }
}
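Like the read_exact loop in the question, ToChunks drops a final chunk that is shorter than chunk_size. If that tail should be kept (an assumption about the desired behavior, not something the question asks for), one option is to read each chunk with by_ref().take(n).read_to_end(..), which stops at end of input instead of failing. A minimal sketch with a hypothetical ToChunksWithTail type:

```rust
use std::io::{self, Read};

// Hypothetical variant of ToChunks that also yields a short final chunk.
pub struct ToChunksWithTail<R> {
    reader: R,
    chunk_size: usize,
}

impl<R: Read> Iterator for ToChunksWithTail<R> {
    type Item = io::Result<Vec<u8>>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut buffer = Vec::with_capacity(self.chunk_size);
        // take() caps this read at chunk_size bytes; read_to_end stops early at EOF.
        match self
            .reader
            .by_ref()
            .take(self.chunk_size as u64)
            .read_to_end(&mut buffer)
        {
            Ok(0) => None,             // nothing left to read
            Ok(_) => Some(Ok(buffer)), // a full chunk, or a shorter last one
            Err(e) => Some(Err(e)),
        }
    }
}

fn main() {
    let data: &[u8] = &[1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
    let chunks = ToChunksWithTail { reader: data, chunk_size: 4 };
    for chunk in chunks {
        // Prints [1, 2, 3, 4], then [5, 6, 7, 8], then [9, 10].
        println!("{:?}", chunk.unwrap());
    }
}
```

A side effect of this design is that a chunk_size of 0 ends iteration immediately, whereas the read_exact version would keep yielding empty chunks forever.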
Next, we add a small extension trait so that any reader can be turned into a chunk iterator with a single method call.
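// Extension trait: adds .iter_chunks(len) to every reader.
pub trait IterChunks {
    type Output;
    fn iter_chunks(self, len: usize) -> Self::Output;
}
// Blanket implementation for every type that implements Read.
impl<R: Read> IterChunks for R {
    type Output = ToChunks<R>;
    fn iter_chunks(self, len: usize) -> Self::Output {
        ToChunks {
            reader: self,
            chunk_size: len,
        }
    }
}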
Now it is as simple as calling iter_chunks(len) on any reader:
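fn main() -> std::io::Result<()> {
    let file = std::io::BufReader::new(std::fs::File::open("large_file.txt")?);
    for chunk in file.iter_chunks(1024) {
        println!("{:?}", chunk?); // each item is an io::Result<Vec<u8>>
    }
    Ok(())
}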
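Because the blanket impl covers every R: Read, the trait also works on in-memory readers such as &[u8], and on &mut references to a reader (the standard library implements Read for &mut R), so the iterator can borrow the reader instead of consuming it. The sketch below repeats ToChunks and IterChunks so it compiles on its own, then combines iter_chunks with take and collect and keeps reading from the same reader afterwards. The input bytes are arbitrary example data.

```rust
use std::io::{self, ErrorKind, Read};

// ToChunks and IterChunks as defined above, repeated so this sketch is self-contained.
pub struct ToChunks<R> { reader: R, chunk_size: usize }

impl<R: Read> Iterator for ToChunks<R> {
    type Item = io::Result<Vec<u8>>;
    fn next(&mut self) -> Option<Self::Item> {
        let mut buffer = vec![0u8; self.chunk_size];
        match self.reader.read_exact(&mut buffer) {
            Ok(()) => Some(Ok(buffer)),
            Err(e) if e.kind() == ErrorKind::UnexpectedEof => None,
            Err(e) => Some(Err(e)),
        }
    }
}

pub trait IterChunks {
    type Output;
    fn iter_chunks(self, len: usize) -> Self::Output;
}

impl<R: Read> IterChunks for R {
    type Output = ToChunks<R>;
    fn iter_chunks(self, len: usize) -> Self::Output {
        ToChunks { reader: self, chunk_size: len }
    }
}

fn main() -> io::Result<()> {
    let mut reader: &[u8] = b"aabbccdd";
    // (&mut reader) is itself a Read, so only a borrow is moved into the iterator.
    // take(2) stops after two chunks without reading any further.
    let first_two: Vec<Vec<u8>> = (&mut reader)
        .iter_chunks(2)
        .take(2)
        .collect::<io::Result<_>>()?;
    println!("{:?}", first_two); // [[97, 97], [98, 98]]
    // The original reader picks up where the iterator stopped.
    let mut rest = Vec::new();
    reader.read_to_end(&mut rest)?;
    println!("{:?}", rest); // [99, 99, 100, 100]
    Ok(())
}
```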