How to prevent filling up memory when hashing large files with xxhash?

I’m trying to calculate the xxhash of video files using the following code:

import xxhash

def get_hash(file):
    with open(file, 'rb') as input_file:
        # Reads the entire file into memory before hashing
        return xxhash.xxh3_64(input_file.read()).hexdigest()

Some of the files are larger than the amount of RAM on the machine. When hashing those files, the memory fills up, followed by swap filling up, at which point the process gets killed by the OS (I assume).
What is the correct way to handle these situations? Thank you!


Solution:

Instead of hashing the entire file in one go, read it in chunks and update the hash as you go. Once a chunk has been hashed, it can be discarded, so memory use stays bounded by the chunk size.

import xxhash
from functools import partial

def get_hash(file):
    CHUNK_SIZE = 2 ** 20  # 1 MiB; pick whatever your memory comfortably allows
    with open(file, 'rb') as input_file:
        x = xxhash.xxh3_64()
        # iter() keeps calling read(CHUNK_SIZE) until it returns b'' at EOF
        for chunk in iter(partial(input_file.read, CHUNK_SIZE), b''):
            x.update(chunk)
        return x.hexdigest()
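
On Python 3.8+ the same chunked read can also be written with an assignment expression instead of iter/partial. A minimal sketch; the 1 MiB default chunk size, the function name, and the example file name are just placeholders:

import xxhash

def get_hash_chunked(file, chunk_size=2 ** 20):  # hypothetical variant; 1 MiB chunks
    x = xxhash.xxh3_64()
    with open(file, 'rb') as input_file:
        # read() returns b'' at EOF, which ends the loop
        while chunk := input_file.read(chunk_size):
            x.update(chunk)
    return x.hexdigest()

# Example usage (file name is illustrative):
# print(get_hash_chunked('large_video.mp4'))

Either way, only one chunk is held in memory at a time, so files larger than RAM hash without swapping.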
 