Neither ASCII nor UTF-8 can decode French characters. What should I do?

I have the following function:

private void ReceivedData(byte[] data)
{
    string info = Encoding.ASCII.GetString(data);
    // ... (rest of the method omitted)
}

When I use this and the data contains an é character, that character is replaced by a question mark (?).

For your information, in Visual Studio’s Watch window the character in question shows up in data[27] and data[28].

Also, when I type ALT+0233 on my computer, I get that é character.

When I replace the ASCII encoding with UTF-8 (as suggested on some websites and in some answers on this site), I get weird characters displayed as question marks (��):

private void ReceivedData(byte[] data)
{
    string info = Encoding.UTF8.GetString(data);
    // ... (rest of the method omitted)
}

Which encoding should I use to correctly decode French characters?

Thanks in advance

>Solution :

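This looks like Windows-1252 encoding (a single-byte code page for Latin characters with diacritics), in which byte 233 (0xE9) is é. ASCII only defines values 0 to 127, so byte 233 is decoded as ?, and in UTF-8 a lone 0xE9 byte is an incomplete multi-byte sequence, so it is decoded as the replacement character �.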

using System;
using System.Text;

// On .NET Core / .NET 5+ you have to register the code pages provider
// before code page 1252 becomes available
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

byte[] data = {
  95, 233, 233, 110
};

var result = Encoding.GetEncoding(1252).GetString(data);

Console.Write(result);

Output:

_één
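To see why the two encodings tried in the question fail on these bytes, the three decoders can be put side by side. This is only a minimal sketch: the class name DecodeComparison and its method names are made up for illustration, and it assumes the System.Text.Encoding.CodePages provider is available (as it is on .NET Core 3.0+ and .NET 5+).

```csharp
using System;
using System.Text;

// Illustrative helper (hypothetical names, not part of any library).
public static class DecodeComparison
{
    // ASCII is 7-bit: every byte above 127 is replaced by '?'.
    public static string AsAscii(byte[] data) => Encoding.ASCII.GetString(data);

    // UTF-8: a lone 0xE9 is not a valid sequence, so it is replaced by U+FFFD.
    public static string AsUtf8(byte[] data) => Encoding.UTF8.GetString(data);

    // Windows-1252: each byte maps to exactly one character, 0xE9 being 'é'.
    public static string AsWindows1252(byte[] data)
    {
        // Needed on .NET Core / .NET 5+; calling it more than once is harmless.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1252).GetString(data);
    }
}
```

Calling the three methods with the same byte[] { 95, 233, 233, 110 } should give question marks for ASCII, replacement characters for UTF-8, and the readable text only for Windows-1252.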

Edit: In the general case, when facing an unknown encoding, you can try querying all the available encodings and inspecting the results:

using System;
using System.Linq;
using System.Text;

...

// Enable code pages on .NET Core / .NET 5+
Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

byte[] data = {
  95, 233, 233, 110
};

var report = string.Join(Environment.NewLine, Encoding
  .GetEncodings()
  .OrderBy(encoder => encoder.Name, StringComparer.OrdinalIgnoreCase)
  .Select(encoder => (name: encoder.Name, text: encoder.GetEncoding().GetString(data)))
  .Where(pair => pair.text.Contains('é')) // at least one é must be present
  .Select(pair => $"{pair.name,-30} : {pair.text}"));

Console.Write(report);

Output:

iso-8859-1                     : _één
iso-8859-13                    : _één
iso-8859-15                    : _één
iso-8859-2                     : _één
iso-8859-3                     : _één
iso-8859-4                     : _één
iso-8859-9                     : _één
windows-1250                   : _één
windows-1252                   : _één <- The most probable (IMHO) encoding
windows-1254                   : _één
windows-1256                   : _één
windows-1257                   : _één
windows-1258                   : _één
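If the sender might use either UTF-8 or Windows-1252 (that is an assumption about your data source, not something the question confirms), one possible approach is to try a strict UTF-8 decode first and fall back to code page 1252 only when the bytes are not valid UTF-8. A rough sketch, with FallbackDecoder being a hypothetical name:

```csharp
using System;
using System.Text;

// Illustrative helper (hypothetical name); assumes the input is either
// UTF-8 or Windows-1252 and nothing else.
public static class FallbackDecoder
{
    // UTF-8 decoder that throws on invalid bytes instead of emitting U+FFFD.
    private static readonly Encoding StrictUtf8 =
        new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

    public static string Decode(byte[] data)
    {
        try
        {
            return StrictUtf8.GetString(data);
        }
        catch (DecoderFallbackException)
        {
            // Not valid UTF-8: assume Windows-1252.
            // Registering the provider is needed on .NET Core / .NET 5+.
            Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
            return Encoding.GetEncoding(1252).GetString(data);
        }
    }
}
```

This works reasonably well in practice because text in Windows-1252 that contains accented characters is rarely valid UTF-8 by accident, but it is a heuristic and cannot tell Windows-1252 apart from the other single-byte code pages listed above.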