I am trying to reproduce the (ABC) example from this site:
https://opensource.adobe.com/dc-acrobat-sdk-docs/acrobatsdk/html2015/index.html#t=Acro12_MasterBook%2Fpdfmark_Basic%2FBookmarks_OUT.htm
For example, the Unicode string for (ABC) is <FEFF004100420043>.
But when I try to reproduce just the ABC, I get:
"ABC".encode(encoding="utf-16be")
Out[29]: b'\x00A\x00B\x00C'
I think I am misunderstanding a larger concept, but I am unsure what to look for.
I need to produce the exact same string, so for the minimal example above I would need: 004100420043. The question therefore is: How do I get from one representation to the other?
Given the existing answer by gog:
How do I get from b'\xFE\xFF\x00\x41\x00\x42\x00\x43' to FEFF004100420043?
>Solution :
It looks like they want the BOM (byte order mark) as well, so
import codecs
result = codecs.BOM_UTF16_BE + "ABC".encode(encoding="utf-16be")
which would be
b'\xfe\xff\x00A\x00B\x00C'
which is the same as
b'\xFE\xFF\x00\x41\x00\x42\x00\x43'
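The two literals differ only in how Python displays them: the repr of a bytes object shows bytes in the printable ASCII range as characters, so \x41 appears as A. A quick check, as a sketch (the variable names are just for illustration):

```python
# Both literals denote the same eight bytes. Python's repr shows the
# printable ASCII bytes 0x41, 0x42, 0x43 as the characters A, B, C.
short_form = b'\xfe\xff\x00A\x00B\x00C'
long_form = b'\xFE\xFF\x00\x41\x00\x42\x00\x43'

same = short_form == long_form
print(same)              # True
print(list(short_form))  # [254, 255, 0, 65, 0, 66, 0, 67]
```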
To convert that to the hex format, use
result.hex()
optionally followed by .upper(), since hex() returns lowercase digits
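Putting the steps together, here is a minimal sketch. The function name to_pdfmark_hex is made up for illustration, and wrapping the result in angle brackets is an assumption based on the <FEFF004100420043> form quoted from the Adobe page:

```python
import codecs


def to_pdfmark_hex(text: str) -> str:
    """Hypothetical helper: BOM + UTF-16BE bytes, as uppercase hex in <...>."""
    # Prepend the big-endian byte order mark (FE FF) to the encoded text.
    raw = codecs.BOM_UTF16_BE + text.encode("utf-16be")
    # bytes.hex() yields lowercase digits, so upper() matches the Adobe example.
    return "<" + raw.hex().upper() + ">"


print(to_pdfmark_hex("ABC"))  # <FEFF004100420043>

# Without the BOM, as asked for in the minimal example:
no_bom = "ABC".encode("utf-16be").hex().upper()
print(no_bom)  # 004100420043
```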