
Neither ASCII nor UTF-8 decodes French characters correctly. What should I do?

I have the following function:

private void ReceivedData(byte[] data)
{
    string info = Encoding.ASCII.GetString(data);
    // ... (rest of the method)
}

When I use this, every é character in the data is replaced by a question mark (?).
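Why a question mark: ASCII only defines byte values 0 to 127, so a byte such as 233 (0xE9) has no ASCII meaning, and `Encoding.ASCII` substitutes its default replacement character `?`. A minimal sketch reproducing this; the four-byte `Sample` array and the `AsciiDemo` class are hypothetical stand-ins, not the actual received data:

```csharp
using System;
using System.Text;

public static class AsciiDemo
{
    // Hypothetical sample: '_', two bytes with value 233, then 'n'.
    public static readonly byte[] Sample = { 95, 233, 233, 110 };

    // ASCII decoding: every byte above 127 is replaced by the default fallback "?".
    public static string DecodeAscii(byte[] data) => Encoding.ASCII.GetString(data);

    public static void Main()
    {
        Console.WriteLine(DecodeAscii(Sample)); // prints "_??n"
    }
}
```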

For reference, in Visual Studio's Watch window the é character is found at data[27] and data[28].


Also: when I type ALT+0233 on my computer, I get the é character.

When I replace the ASCII encoding with UTF-8 (as suggested on some websites and in some answers on this site), I get strange characters that show up as question marks in a diamond (��):

private void ReceivedData(byte[] data)
{
    string info = Encoding.UTF8.GetString(data);
    // ... (rest of the method)
}
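UTF-8 fails for a different reason: it encodes é as the two bytes 0xC3 0xA9, whereas a single byte 0xE9 (233) is the lead byte of a three-byte UTF-8 sequence. When the following bytes are not valid continuation bytes, the decoder emits the Unicode replacement character U+FFFD (rendered as �) instead. A minimal sketch using the same hypothetical sample bytes (the `Utf8Demo` class is illustrative only):

```csharp
using System;
using System.Text;

public static class Utf8Demo
{
    // Hypothetical sample: '_', two bytes with value 233, then 'n'.
    public static readonly byte[] Sample = { 95, 233, 233, 110 };

    // Each 233 starts a sequence that never completes, so it decodes to U+FFFD.
    public static string DecodeUtf8(byte[] data) => Encoding.UTF8.GetString(data);

    // How é really looks in UTF-8: two bytes.
    public static byte[] EncodeUtf8(string text) => Encoding.UTF8.GetBytes(text);

    public static void Main()
    {
        Console.WriteLine(DecodeUtf8(Sample));                          // string is "_\uFFFD\uFFFDn"
        Console.WriteLine(BitConverter.ToString(EncodeUtf8("\u00E9"))); // prints "C3-A9"
    }
}
```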

Which encoding should I use to decode French characters correctly?

Thanks in advance

Solution:

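This looks like Windows-1252 encoding (the Windows code page for Western European languages, which covers Latin letters with diacritics). In Windows-1252, byte 233 (0xE9) is é, which matches your ALT+0233 observation: Alt codes with a leading zero use the Windows ANSI code page, and that is 1252 on Western European systems. To decode with it: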

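using System;
using System.Text;

// Needed on .NET Core / .NET 5+ only, where code pages such as 1252 are not
// available by default (.NET Framework has them built in, so omit this line there)
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

byte[] data = {
  95, 233, 233, 110
};

// Decode the bytes as Windows-1252, where 233 (0xE9) is é
var result = Encoding.GetEncoding(1252).GetString(data);

Console.WriteLine(result);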

Output:

_één
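Applied to the question's handler, the fix is only a different `Encoding` instance. A minimal sketch, assuming the sender really uses Windows-1252 and the project targets .NET Core 3.0 or later (older targets and .NET Framework need the System.Text.Encoding.CodePages package for `CodePagesEncodingProvider`). The `Receiver` class and `LastMessage` property are hypothetical, and the method is made public here only so the result can be inspected:

```csharp
using System;
using System.Text;

public class Receiver
{
    // Looked up once; the code-pages provider must be registered before
    // GetEncoding(1252) works on .NET Core / .NET 5+.
    private static readonly Encoding Windows1252 = CreateWindows1252();

    private static Encoding CreateWindows1252()
    {
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1252);
    }

    // Hypothetical: keeps the last decoded text so callers can check it.
    public string LastMessage { get; private set; } = "";

    public void ReceivedData(byte[] data)
    {
        string info = Windows1252.GetString(data);
        LastMessage = info;
    }

    public static void Main()
    {
        var receiver = new Receiver();
        receiver.ReceivedData(new byte[] { 95, 233, 233, 110 });
        Console.WriteLine(receiver.LastMessage); // _één
    }
}
```

Because Windows-1252 is a single-byte encoding, each byte maps to exactly one character, so data split across several `ReceivedData` calls cannot cut a character in half (unlike UTF-8, where a multi-byte sequence can straddle two chunks).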

Edit: In general, when facing an unknown encoding, you can try decoding the data with every available encoding and inspect the results:

using System;
using System.Linq;
using System.Text;

...

// Enable code pages on .NET Core / .NET 5+ (not needed on .NET Framework)
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

byte[] data = {
  95, 233, 233, 110
};

// Decode the same bytes with every available encoding and keep only
// the candidates whose result contains an é
var report = string.Join(Environment.NewLine, Encoding
  .GetEncodings()
  .OrderBy(info => info.Name, StringComparer.OrdinalIgnoreCase)
  .Select(info => (name: info.Name, text: info.GetEncoding().GetString(data)))
  .Where(pair => pair.text.Contains('é')) // at least one é must be present
  .Select(pair => $"{pair.name,-30} : {pair.text}"));

Console.WriteLine(report);

Output:

iso-8859-1                     : _één
iso-8859-13                    : _één
iso-8859-15                    : _één
iso-8859-2                     : _één
iso-8859-3                     : _één
iso-8859-4                     : _één
iso-8859-9                     : _één
windows-1250                   : _één
windows-1252                   : _één <- the most probable encoding (IMHO)
windows-1254                   : _één
windows-1256                   : _één
windows-1257                   : _één
windows-1258                   : _één
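All of the listed encodings map byte 233 to é, so this sample alone cannot tell them apart; bytes on which they disagree can. For example, 0x80 is € in Windows-1252 but the invisible control character U+0080 in ISO-8859-1, and 0xA4 is € in ISO-8859-15 but ¤ in Windows-1252 and ISO-8859-1. A minimal sketch comparing three of the candidates, assuming .NET Core 3.0 or later with the code-pages provider (the `CandidateComparison` class and `DecodeByte` helper are hypothetical); it only helps if the real data contains such characters:

```csharp
using System;
using System.Text;

public static class CandidateComparison
{
    // Make windows-1252 and iso-8859-15 available on .NET Core / .NET 5+.
    static CandidateComparison() =>
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

    // Decodes a single byte with the named encoding and returns the resulting character.
    public static char DecodeByte(string encodingName, byte value) =>
        Encoding.GetEncoding(encodingName).GetString(new[] { value })[0];

    public static void Main()
    {
        string[] candidates = { "windows-1252", "iso-8859-1", "iso-8859-15" };
        foreach (string name in candidates)
        {
            char at80 = DecodeByte(name, 0x80);
            char atA4 = DecodeByte(name, 0xA4);
            // Print code points, since U+0080 is invisible on most consoles.
            Console.WriteLine($"{name,-15} 0x80 -> U+{(int)at80:X4}   0xA4 -> U+{(int)atA4:X4}");
        }
    }
}
```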