Should I delete the zero bytes in a UTF-16 encoding?

When I get the bytes of a string using Encoding.Unicode, the result contains zero (0) values.

When I run this code:

byte[] value = Encoding.Unicode.GetBytes("Hi");

It gives me this output:

72
0
105
0

I know this is because UTF-16 uses 2 bytes (one 16-bit code unit) per character here, and the 0 is just the second byte. My question is: should I delete the 0s? As far as I know they don’t do anything, and my program has to loop through the array, so the 0s would only make it slower.

Solution:

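No, you shouldn’t delete bytes from encoded text. The zeros are not padding: Encoding.Unicode is little-endian UTF-16, so every character is stored as a 16-bit code unit, low byte first, and for "Hi" the high byte of each unit just happens to be 0. If you remove those bytes, what is left can no longer be decoded as the original text; it is garbage. The high byte is also non-zero for any character above U+00FF, and a character such as U+0100 has a zero low byte, so there is no way to tell "useless" zeros apart from real data.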
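To make this concrete, here is a small sketch (written as C# 9+ top-level statements; the variable names are only for illustration) that strips the zeros from the "Hi" bytes and decodes what is left. UTF-16 pairs the two remaining bytes into a single code unit, so you get one unrelated character instead of "Hi".

```csharp
using System;
using System.Linq;
using System.Text;

// Encoding.Unicode is little-endian UTF-16: each char becomes one 16-bit code unit (2 bytes, low byte first).
byte[] hi = Encoding.Unicode.GetBytes("Hi");          // 72, 0, 105, 0
byte[] stripped = hi.Where(b => b != 0).ToArray();    // 72, 105

// Decoding the stripped bytes pairs 72 (low) and 105 (high) into ONE code unit: 0x6948.
string garbled = Encoding.Unicode.GetString(stripped);
Console.WriteLine($"{garbled.Length} char(s), U+{(int)garbled[0]:X4}"); // 1 char(s), U+6948

// The zeros are not always "extra": U+0100 encodes as 0, 1, so here the LOW byte is the zero.
byte[] aMacron = Encoding.Unicode.GetBytes("\u0100");
Console.WriteLine(string.Join(", ", aMacron));         // 0, 1
```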

If your text is mostly ASCII with only a few non-ASCII characters, you are probably better off with the UTF-8 encoding (Encoding.UTF8) instead of UTF-16.

UTF-8 encodes each ASCII character as a single byte and uses 2 to 4 bytes for non-ASCII characters.
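One way to see the trade-off is to compare Encoding.UTF8.GetByteCount and Encoding.Unicode.GetByteCount on a few strings. This is an illustrative sketch (C# 9+ top-level statements); the sample strings are arbitrary choices, not taken from the question.

```csharp
using System;
using System.Text;

// Arbitrary samples: ASCII, a Latin-1 letter, a BMP symbol, and a character outside the BMP.
// "\U0001F600" (U+1F600) needs a surrogate pair in UTF-16, i.e. 4 bytes.
string[] samples = { "Hi", "\u00E9", "\u20AC", "\U0001F600", "Hello, world" };

foreach (string s in samples)
{
    int utf8 = Encoding.UTF8.GetByteCount(s);      // byte count only, no BOM/preamble
    int utf16 = Encoding.Unicode.GetByteCount(s);
    Console.WriteLine($"{s,-14} UTF-8: {utf8,2} bytes   UTF-16: {utf16,2} bytes");
}
```

For "Hi", UTF-8 needs 2 bytes (exactly 72 and 105, with no zeros) against 4 for UTF-16, while "€" (U+20AC) goes the other way: 3 bytes in UTF-8 and 2 in UTF-16.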


Discover more from Dev solutions
