How do I encode text to utf-16be "correctly"?

I am trying to reproduce the (ABC) example from this site:
https://opensource.adobe.com/dc-acrobat-sdk-docs/acrobatsdk/html2015/index.html#t=Acro12_MasterBook%2Fpdfmark_Basic%2FBookmarks_OUT.htm

For example, the Unicode string for (ABC) is <FEFF004100420043>.

But when I try to reproduce just the ABC, I get:

"ABC".encode(encoding="utf-16be")
Out[29]: b'\x00A\x00B\x00C'

I think I am misunderstanding a larger concept, but I am unsure what to look for.

I need to produce the exact same string, so for the minimal example above I would need: 004100420043. The question therefore is: How do I get from one representation to the other?

Follow-up, given the existing answer by gog:
How do I get from b'\xFE\xFF\x00\x41\x00\x42\x00\x43' to FEFF004100420043?

>Solution :

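It looks like they want the BOM (byte order mark) as well. The utf-16be codec never writes one, and the expected string starts with FEFF, so prepend it yourself: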

import codecs
result = codecs.BOM_UTF16_BE + "ABC".encode(encoding="utf-16be")

which would be

b'\xfe\xff\x00A\x00B\x00C' 

which is the same as

b'\xFE\xFF\x00\x41\x00\x42\x00\x43'
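The two literals look different only because Python's repr prints bytes in the printable ASCII range as characters (0x41 is shown as A). A minimal check, using illustrative variable names that are not part of the original answer:

```python
# Both literals describe the same eight bytes; only the notation differs.
short_form = b'\xfe\xff\x00A\x00B\x00C'
escaped_form = b'\xFE\xFF\x00\x41\x00\x42\x00\x43'

same = short_form == escaped_form   # compare the byte sequences
byte_values = list(short_form)      # integer value of each byte

print(same)
print(byte_values)
```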

To convert that to the hex format, use

result.hex()

followed by .upper() if you need uppercase digits, since hex() returns lowercase ('feff004100420043').
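Putting the steps together, a small sketch might look like the following. The function name to_utf16be_hex and its with_bom parameter are made up for illustration; only codecs.BOM_UTF16_BE, str.encode, bytes.hex and str.upper come from the standard library.

```python
import codecs


def to_utf16be_hex(text: str, with_bom: bool = True) -> str:
    """Return text as uppercase hex digits of its UTF-16BE encoding.

    Hypothetical helper: the name and the with_bom flag are assumptions,
    not part of any library.
    """
    encoded = text.encode("utf-16be")          # big-endian, no BOM added
    if with_bom:
        encoded = codecs.BOM_UTF16_BE + encoded  # prepend b'\xfe\xff'
    return encoded.hex().upper()               # hex() yields lowercase digits


print(to_utf16be_hex("ABC"))                  # FEFF004100420043
print(to_utf16be_hex("ABC", with_bom=False))  # 004100420043
```

If the angle brackets from the Adobe example are needed as well, the result can be wrapped with an f-string such as f"<{to_utf16be_hex('ABC')}>".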
