Reading large file in bytes by chunks with dynamic buffer size

I’m trying to read a large file by chunks and save them in an ArrayList of bytes.

My code, in short, looks like this:

public ArrayList<byte[]> packets = new ArrayList<>();
FileInputStream fis = new FileInputStream("random_text.txt");
byte[] buffer = new byte[512];
while (fis.read(buffer) > 0){
  packets.add(buffer);
}
fis.close();

But it has a behavior that I don’t know how to solve. For example, if a file contains only the words "hello world", the chunk does not need to be 512 bytes long. In fact, I want each chunk to be at most 512 bytes, not necessarily exactly that size.

Solution:

First of all, what you are doing is probably a bad idea. Storing a file’s contents in memory like this is liable to waste heap space, and it can lead to an OutOfMemoryError or require an excessively large heap if the input files are large enough.
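As a rough illustration of the alternative, each chunk can be processed as soon as it is read, so only one buffer is held in memory at a time. The sketch below is an assumption about what "processing" might mean: it computes a CRC32 checksum, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.CRC32;

final class StreamingChecksum {
    // Consumes the stream chunk by chunk; nothing is stored beyond one buffer.
    static long checksum(InputStream in, int chunkSize) throws IOException {
        CRC32 crc = new CRC32();
        byte[] buffer = new byte[chunkSize];
        int len;
        while ((len = in.read(buffer)) > 0) {
            crc.update(buffer, 0, len); // use only the bytes actually read
        }
        return crc.getValue();
    }
}
```

Memory use here depends on the chunk size, not on the size of the file.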

The second problem is that your code is wrong. You are repeatedly reading the data into the same byte array. Each time you do, it overwrites what was there before. So you will end up with a list containing lots of references to a single byte array, which holds just the last chunk of data that you read (possibly followed by leftover bytes from the previous read).
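A small demonstration of that effect, using an in-memory stream and a 2-byte buffer so the result is easy to see. The class and method names are hypothetical; the loop is the one from the question.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class SharedBufferDemo {
    // Reproduces the question's loop: the same array is added on every pass.
    static List<byte[]> readWithoutCopy(InputStream in, int chunkSize) throws IOException {
        List<byte[]> packets = new ArrayList<>();
        byte[] buffer = new byte[chunkSize];
        while (in.read(buffer) > 0) {
            packets.add(buffer); // adds a reference, not a snapshot of the contents
        }
        return packets;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "abcde".getBytes(StandardCharsets.US_ASCII);
        List<byte[]> packets = readWithoutCopy(new ByteArrayInputStream(data), 2);
        System.out.println(packets.size());                   // 3
        System.out.println(packets.get(0) == packets.get(2)); // true
        System.out.println(new String(packets.get(0), StandardCharsets.US_ASCII)); // ed
    }
}
```

Every entry prints as "ed": the final read stored only "e", and the "d" is left over from the read before it.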

To solve the problem that you asked about, you will need to copy the chunk that you read to a new (smaller) byte array.

Something like this:

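import java.io.FileInputStream;
import java.util.ArrayList;
import java.util.Arrays;

public ArrayList<byte[]> packets = new ArrayList<>();
// try-with-resources closes the stream even if read() throws
try (FileInputStream fis = new FileInputStream("random_text.txt")) {
    byte[] buffer = new byte[512];
    int len;
    // read() returns the number of bytes read, or -1 at end of file
    while ((len = fis.read(buffer)) > 0) {
        // copy only the bytes actually read into a new, right-sized array
        packets.add(Arrays.copyOf(buffer, len));
    }
}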

Note that this also deals with the second problem I mentioned, and it fixes a potential resource leak by using try-with-resources to close the input stream.
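To check the behavior against the "hello world" case from the question, the same loop can be wrapped in a small helper and run on an in-memory stream. This is only a sketch: the class name `ChunkReader` and the method `readChunks` are hypothetical, and it takes any `InputStream` rather than a file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ChunkReader {
    // Returns the stream's contents as chunks of at most maxChunkSize bytes.
    static List<byte[]> readChunks(InputStream in, int maxChunkSize) throws IOException {
        List<byte[]> packets = new ArrayList<>();
        byte[] buffer = new byte[maxChunkSize];
        int len;
        while ((len = in.read(buffer)) > 0) {
            packets.add(Arrays.copyOf(buffer, len)); // independent copy of len bytes
        }
        return packets;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello world".getBytes(StandardCharsets.UTF_8);
        List<byte[]> packets = readChunks(new ByteArrayInputStream(data), 512);
        System.out.println(packets.size());        // 1
        System.out.println(packets.get(0).length); // 11
    }
}
```

One caveat: `read(byte[])` is allowed to return fewer bytes than the buffer holds even before the end of the stream, so a chunk in the middle may be shorter than the maximum. If full chunks matter, `InputStream.readNBytes(byte[], int, int)` (Java 9 and later) keeps reading until the requested length is filled or the stream ends.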


A final issue: If this is really a text file that you are reading, you probably should be using a Reader to read it, and char[] or String to hold it.
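A minimal sketch of the character-based variant, assuming the chunks are kept as `String` values of at most `maxChars` chars each. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

final class TextChunkReader {
    // Returns the text as strings of at most maxChars chars (UTF-16 code units).
    static List<String> readChunks(Reader reader, int maxChars) throws IOException {
        List<String> chunks = new ArrayList<>();
        char[] buffer = new char[maxChars];
        int len;
        while ((len = reader.read(buffer)) > 0) {
            chunks.add(new String(buffer, 0, len)); // String copies the chars it needs
        }
        return chunks;
    }
}
```

For a file, the reader could come from `Files.newBufferedReader(Paths.get("random_text.txt"), StandardCharsets.UTF_8)` inside a try-with-resources block. The UTF-8 charset is an assumption about the file's encoding.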

But even if you do that, there are some awkward edge cases if your text contains Unicode code points outside the Basic Multilingual Plane (plane 0), for example emojis. Such code points are represented as a surrogate pair, and the pair can be split across a chunk boundary. Reading and storing the text as lines would avoid that.
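Reading by lines might look like the sketch below. The class and method names are hypothetical, and the caller is assumed to own and close the underlying reader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;

final class LineReader {
    // Returns the text as a list of lines, with line terminators removed.
    static List<String> readLines(Reader reader) throws IOException {
        List<String> lines = new ArrayList<>();
        BufferedReader br = new BufferedReader(reader); // not closed here; the caller owns reader
        String line;
        while ((line = br.readLine()) != null) {
            lines.add(line); // a surrogate pair is never split across two lines
        }
        return lines;
    }
}
```

The trade-off is that a chunk is now as long as its line, so a file with one very long line is still read into memory as a single string.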
