I want to understand .NET 6 StreamReader properties

September 22, 2022

I need to calculate MD5 for a file.

private string GetMD5(string file)
{
  using var md5 = MD5.Create();
  using var stream = new StreamReader(file);
  return (BitConverter.ToString(md5.ComputeHash(stream.BaseStream)).Replace("-", string.Empty)).ToLower();
}

private string GetMD5_V2(string file)
{
  using var md5 = MD5.Create();
  using var stream = new StreamReader(file);
  _ = stream.EndOfStream; // the only difference from GetMD5
  return (BitConverter.ToString(md5.ComputeHash(stream.BaseStream)).Replace("-", string.Empty)).ToLower();
}

private void Test()
{
  var fichier = "myFile.txt";
  var md5_1 = GetMD5(fichier);
  var md5_2 = GetMD5_V2(fichier);
}

When I run this code, md5_1 and md5_2 are different. I don't understand why reading the stream.EndOfStream property changes the result computed from stream.BaseStream.

>Solution :

Querying the EndOfStream property of a freshly initialized StreamReader reads some bytes from the underlying stream. See the source code of this property’s getter:

public bool EndOfStream
{
    get
    {
        ThrowIfDisposed();
        CheckAsyncTaskInProgress();
 
        if (_charPos < _charLen)
        {
            return false;
        }
 
        // This may block on pipes!
        int numRead = ReadBuffer();
        return numRead == 0;
    }
}

On a freshly instantiated StreamReader, the value of both _charPos and _charLen is zero, leading to the EndOfStream getter invoking ReadBuffer().
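The effect can be observed directly by checking BaseStream.Position before and after the property is read. The following is a minimal sketch, not part of the original answer; the scratch file name is hypothetical and the file is created only for the demonstration.

```csharp
using System;
using System.IO;

// Hypothetical scratch file, created only for this demonstration.
string demoPath = Path.Combine(Path.GetTempPath(), "endofstream-demo.txt");
File.WriteAllText(demoPath, "hello world");

long positionBefore;
long positionAfter;
bool atEnd;

using (var reader = new StreamReader(demoPath))
{
    positionBefore = reader.BaseStream.Position; // nothing has been read yet
    atEnd = reader.EndOfStream;                  // getter calls ReadBuffer()
    positionAfter = reader.BaseStream.Position;  // bytes were consumed by the getter
}

Console.WriteLine($"before: {positionBefore}, after: {positionAfter}, EndOfStream: {atEnd}");
File.Delete(demoPath);
```

The position should be zero before the property is read and greater than zero afterwards, even though no Read method was called explicitly.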

ReadBuffer() reads from the underlying stream and thereby advances that stream's position. The MD5 instance then consumes only the bytes remaining after the advanced position, which yields a different hash than one computed over the entire stream. If the file is small enough to fit into the reader's buffer in a single read, nothing remains, and the result is the MD5 of empty input.
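Two possible ways around this, sketched here under assumptions rather than taken from the original answer: hash a plain FileStream so that no StreamReader is involved, or rewind BaseStream before hashing. The helper names GetMD5FromFile and GetMD5AfterPeek and the scratch file are hypothetical, and the rewind variant assumes the underlying stream is seekable, which holds for regular files.

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Variant 1: hash the file through a plain FileStream. No StreamReader
// is involved, so nothing reads ahead before ComputeHash sees the stream.
static string GetMD5FromFile(string file)
{
    using var md5 = MD5.Create();
    using var stream = File.OpenRead(file);
    return Convert.ToHexString(md5.ComputeHash(stream)).ToLowerInvariant();
}

// Variant 2: if the StreamReader has already been touched, rewind its
// BaseStream first. Assumes the underlying stream supports seeking.
static string GetMD5AfterPeek(string file)
{
    using var md5 = MD5.Create();
    using var reader = new StreamReader(file);
    _ = reader.EndOfStream;                       // buffers bytes and moves Position
    reader.BaseStream.Seek(0, SeekOrigin.Begin);  // back to the first byte
    return Convert.ToHexString(md5.ComputeHash(reader.BaseStream)).ToLowerInvariant();
}

// Hypothetical scratch file used only to compare the two variants.
string hashDemoPath = Path.Combine(Path.GetTempPath(), "md5-demo.txt");
File.WriteAllText(hashDemoPath, "hello world");

string direct = GetMD5FromFile(hashDemoPath);
string rewound = GetMD5AfterPeek(hashDemoPath);
Console.WriteLine($"hashes match: {direct == rewound}");

File.Delete(hashDemoPath);
```

The first variant is likely the simpler choice when only the hash is needed, since a StreamReader decodes text and is not required for reading raw bytes. Rewinding only affects the stream position; the reader's internal buffer still holds the bytes it already read, so mixing further reader calls with direct BaseStream access can remain confusing.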