C# UTF8 Encoding/Decoding issue

I have to read a badly encoded string from a remote service and cannot figure out how to recover the correct value in C# or JavaScript. I can neither change the values in the service nor change the way they are saved in the DB, but I need to display them correctly.

Bad string: Adrián José
Correct string: Adrián José

The error can be undone since the fixed value can be obtained using tools such as https://www.iosart.com/tools/charset-fixer or in Notepad++ by changing the Encoding from ANSI to UTF-8.

So far, I have this client-side solution in JS, but I would rather not use the deprecated escape() function, and I would like to do the fix on the server side.

var badString = "Adrián José";
var fixedString = decodeURIComponent(escape(badString)); // "Adrián José"

I tried to play with the Encoding class in C# (like here), but couldn't find a valid combination.

var badString = "Adrián José";
var origEnco = Encoding.UTF8;
var targetEnco = Encoding.Default;
byte[] utfBytes = origEnco.GetBytes(badString);
byte[] isoBytes = Encoding.Convert(origEnco, targetEnco, utfBytes);
string fixedString = targetEnco.GetString(isoBytes); // "Adrián José"

What am I missing? How do the character set fixer or Notepad++ work?

Solution:

For your provided example, this code works and outputs "Adrián José" as expected:

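using System.Text;
// The bad text looks like UTF-8 bytes that were decoded as Windows-1252.
var currentEncoding = Encoding.GetEncoding("Windows-1252");
var targetEncoding = Encoding.UTF8;
string input = "Adrián José";
// Re-encode as Windows-1252 to recover the original bytes, then decode them as UTF-8.
string output = targetEncoding.GetString(currentEncoding.GetBytes(input)); // "Adrián José"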
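
The conversion works because the bad text is what you get when UTF-8 bytes are decoded with a single-byte code page. As a minimal sketch in JavaScript (the language of the client-side snippet in the question), the corruption can be reproduced like this; corrupt is a hypothetical helper, and it uses Latin-1, which agrees with Windows-1252 for the bytes in this example.

```javascript
// Sketch: reproduce the corruption by reading UTF-8 bytes
// as one character per byte. corrupt is a hypothetical helper name.
function corrupt(text) {
  // "á" (U+00E1) is encoded in UTF-8 as the two bytes 0xC3 0xA1.
  const utf8Bytes = new TextEncoder().encode(text);
  // Treat every byte as its own character, as a Latin-1 reader would:
  // 0xC3 becomes "Ã" and 0xA1 becomes "¡".
  return String.fromCharCode(...utf8Bytes);
}

console.log(corrupt("Adrián José")); // "Adrián José"
```

Running the conversion in the opposite direction (characters back to bytes, then bytes decoded as UTF-8) is what the C# snippet does.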

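If you're using .NET Core/.NET 5+, Windows-1252 is not available by default, so you'll need to install System.Text.Encoding.CodePages from NuGet (if your target framework does not already include it) and add this somewhere in your code before the first call to Encoding.GetEncoding (I usually do it at the top of my Main method):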

Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

While this provides the result you’re interested in, I don’t know if it will work for all instances of your bad text.
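
One way to reduce the risk on other inputs is to repair a string only when every character maps back to a Windows-1252 byte and the resulting bytes are well-formed UTF-8. The JavaScript sketch below illustrates that check under those assumptions; CP1252_EXTRAS and repairIfMojibake are hypothetical names, and it is a heuristic rather than a guarantee.

```javascript
// Sketch: Windows-1252-aware repair that leaves unrepairable text unchanged.
// CP1252_EXTRAS and repairIfMojibake are hypothetical names for illustration.

// Code points that Windows-1252 assigns to bytes 0x80-0x9F (unlike Latin-1),
// stored as [code point, byte] pairs.
const CP1252_EXTRAS = new Map([
  [0x20ac, 0x80], [0x201a, 0x82], [0x0192, 0x83], [0x201e, 0x84],
  [0x2026, 0x85], [0x2020, 0x86], [0x2021, 0x87], [0x02c6, 0x88],
  [0x2030, 0x89], [0x0160, 0x8a], [0x2039, 0x8b], [0x0152, 0x8c],
  [0x017d, 0x8e], [0x2018, 0x91], [0x2019, 0x92], [0x201c, 0x93],
  [0x201d, 0x94], [0x2022, 0x95], [0x2013, 0x96], [0x2014, 0x97],
  [0x02dc, 0x98], [0x2122, 0x99], [0x0161, 0x9a], [0x203a, 0x9b],
  [0x0153, 0x9c], [0x017e, 0x9e], [0x0178, 0x9f],
]);

function repairIfMojibake(text) {
  const bytes = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) {
    const code = text.charCodeAt(i);
    // Code points up to 0xFF are used as the byte value directly;
    // anything else must be one of the Windows-1252 extras.
    const byte = code <= 0xff ? code : CP1252_EXTRAS.get(code);
    if (byte === undefined) {
      return text; // cannot be a mis-decoded byte, so leave the text alone
    }
    bytes[i] = byte;
  }
  try {
    // Throws if the recovered bytes are not well-formed UTF-8.
    return new TextDecoder("utf-8", { fatal: true }).decode(bytes);
  } catch {
    return text; // not valid UTF-8, so the text was probably fine already
  }
}

console.log(repairIfMojibake("Adrián José")); // "Adrián José"
console.log(repairIfMojibake("â‚¬"));           // "€"
console.log(repairIfMojibake("Adrián José"));   // unchanged: "Adrián José"
```

The second call shows a case where a plain Latin-1 mapping fails, because "‚" (U+201A) only exists in Windows-1252. A correctly encoded string whose bytes also happen to form valid UTF-8 would still be changed, so this check reduces false repairs but does not rule them out.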

If you can, I would fix the problem at the source, rather than trying to fix it once you have the incorrectly-encoded string.
