The code in the StackOverflow question titled "Calculate MD5 checksum for a file" provides a simple way to compute a checksum for a file. It works, but through testing I have found that changing the file's metadata also changes the checksum. That makes sense, since a copy with different metadata is technically a different file. The code:
using (var md5 = MD5.Create())
{
    using (var stream = File.OpenRead(filename))
    {
        return md5.ComputeHash(stream);
    }
}
As images pass through various systems here at work, image metadata is added and changed for various reasons, which means a file checksum cannot be used to find duplicates among our systems.
What I need is a way to generate a checksum based on an image itself, not an image file.
My attempt to solve this problem has resulted in this code:
using (var md5 = MD5.Create())
{
    using (var stream = new MemoryStream())
    {
        using (Image image = Image.FromFile(fileName))
        {
            image.Save(stream, image.RawFormat);
            var hash = md5.ComputeHash(stream);
            var convertedHash = BitConverter.ToString(hash).Replace("-", String.Empty).ToLowerInvariant();
            return convertedHash;
        }
    }
}
This seems straightforward to me and the code runs without error, but no matter which image I feed into it, I get the same checksum, so something is wrong. I can't determine why this is happening. Why does this generate the same checksum for every image? What am I doing wrong or missing? Any input is greatly appreciated.
(Edit: to be clear, I know that the image data itself must be exactly the same to generate the same checksum; this is what I need. i.e., I am not looking to find similar, or very similar images, etc.)
>Solution :
Your issue is that after writing to the stream, you do not reset the stream’s position before reading it. ComputeHash reads from the current position, which is at the end of the stream after Save, so it hashes zero bytes and you get the same hash every single time.
You can easily verify this by hashing a new, empty stream:
var hash = md5.ComputeHash(new MemoryStream());
var convertedHash = BitConverter.ToString(hash).Replace("-", String.Empty).ToLowerInvariant();
return convertedHash;
which for me returns d41d8cd98f00b204e9800998ecf8427e (the MD5 hash of empty input)
and is identical to what I get when loading files without rewinding the stream.
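The same effect can be reproduced without any image involved. The following is a minimal sketch, not code from the question: `StreamPositionDemo` and `HashAfterWrite` are hypothetical names, and the byte arrays simply stand in for two different images written into a MemoryStream.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;
using System.Text;

public static class StreamPositionDemo
{
    // Formats a hash as lowercase hex without dashes, as in the question.
    public static string ToHex(byte[] hash) =>
        BitConverter.ToString(hash).Replace("-", string.Empty).ToLowerInvariant();

    // Writes data to a MemoryStream and hashes the stream, optionally rewinding first.
    public static string HashAfterWrite(byte[] data, bool rewind)
    {
        using (var md5 = MD5.Create())
        using (var stream = new MemoryStream())
        {
            stream.Write(data, 0, data.Length); // Position is now at the end of the stream
            if (rewind)
            {
                stream.Position = 0; // go back to the start before reading
            }
            // ComputeHash reads from the current Position to the end of the stream.
            return ToHex(md5.ComputeHash(stream));
        }
    }

    public static void Main()
    {
        byte[] first = Encoding.UTF8.GetBytes("stand-in for the first image");
        byte[] second = Encoding.UTF8.GetBytes("stand-in for the second image");

        // Without rewinding, both calls hash zero bytes and print
        // d41d8cd98f00b204e9800998ecf8427e.
        Console.WriteLine(HashAfterWrite(first, rewind: false));
        Console.WriteLine(HashAfterWrite(second, rewind: false));

        // With rewinding, the two inputs produce two different hashes.
        Console.WriteLine(HashAfterWrite(first, rewind: true));
        Console.WriteLine(HashAfterWrite(second, rewind: true));
    }
}
```

Only the first two printed values are fixed in advance, because they are the hash of empty input; the last two depend on the bytes written.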
If you reset the position, you’ll get different hashes per file as expected:
public string GetHashFromImage(string fileName)
{
    using (var md5 = MD5.Create())
    {
        using (var stream = new MemoryStream())
        {
            using (Image image = Image.FromFile(fileName))
            {
                image.Save(stream, image.RawFormat);
                stream.Position = 0;
                var hash = md5.ComputeHash(stream);
                var convertedHash = BitConverter.ToString(hash).Replace("-", String.Empty).ToLowerInvariant();
                return convertedHash;
            }
        }
    }
}
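One thing worth testing against your own files: re-saving with image.RawFormat may still carry some metadata into the stream, so two copies that differ only in metadata might still hash differently. If that turns out to be the case, a possible alternative is to hash the decoded pixel data instead of the re-encoded bytes. The following is only a sketch under that assumption; `PixelHash` and `GetHashFromPixels` are hypothetical names, and it relies on System.Drawing's Bitmap.LockBits to read the pixels in one fixed format.

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Runtime.InteropServices;
using System.Security.Cryptography;

public static class PixelHash
{
    // Hypothetical helper: hashes the decoded pixels rather than the encoded file bytes.
    public static string GetHashFromPixels(string fileName)
    {
        using (var md5 = MD5.Create())
        using (var bitmap = new Bitmap(fileName))
        {
            var rect = new Rectangle(0, 0, bitmap.Width, bitmap.Height);
            // Request one fixed pixel format so the byte layout does not depend on the source format.
            BitmapData data = bitmap.LockBits(rect, ImageLockMode.ReadOnly, PixelFormat.Format32bppArgb);
            try
            {
                int rowBytes = bitmap.Width * 4; // 4 bytes per pixel in 32bppArgb
                var pixels = new byte[rowBytes * bitmap.Height];
                // Copy row by row so padding bytes and stride direction do not affect the result.
                for (int y = 0; y < bitmap.Height; y++)
                {
                    IntPtr row = IntPtr.Add(data.Scan0, y * data.Stride);
                    Marshal.Copy(row, pixels, y * rowBytes, rowBytes);
                }
                byte[] hash = md5.ComputeHash(pixels);
                return BitConverter.ToString(hash).Replace("-", string.Empty).ToLowerInvariant();
            }
            finally
            {
                bitmap.UnlockBits(data);
            }
        }
    }
}
```

Note that this sketch hashes only the pixel bytes, not the dimensions, so a 2x8 and a 4x4 image with the same byte sequence would collide; mixing the width and height into the hashed data would avoid that.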