Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Python: How to convert string of an encoded Unicode variable to a binary variable

I am building an app to transliterate Myanmar (Burmese) text to International Phonetic Alphabet. I have found that it’s easier to manipulate combining Unicode characters in my dictionaries as binary variables like this

b'\xe1\x80\xad\xe1\x80\xaf\xe1\x80\x84': 'áɪŋ̃',    #'ိုင'

because otherwise they glue to neighboring characters like apostrophes and I can’t get them unstuck.
I am translating UTF-8 Burmese characters to binary hex variables to build my dictionaries. Sometimes I want to convert backwards. I wrote a simple program:

while True:
    binary = input("Enter binary Unicode string or 'exit' to exit: ")
    if binary == 'exit':
        break
    else:
        print(binary.decode('utf-8'))

Obviously this will not work for an input such as b'\xe1\x80\xad', since input() returns a string.
I would like to know the quickest way to convert the string "b'\xe1\x80\xad'" to it’s Unicode form ‘ ိ ‘. My only idea is

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

binary = bin(int(binary, 16))

but that returns

ValueError: invalid literal for int() with base 16: "b'\\xe1\\x80\\xad\\'"

Please help! Thanks.

>Solution :

The issue is that the input string "b’\xe1\x80\xad’" is not a valid hexadecimal string that can be converted to Unicode directly. However, you can use the ast module to safely evaluate the string as a Python literal and get the corresponding bytes object. Here’s an example:

import ast

binary = "b'\\xe1\\x80\\xad'"
bytes_obj = ast.literal_eval(binary)
unicode_str = bytes_obj.decode('utf-8')
print(unicode_str)

This should output the Unicode character ‘ိ’. The ast.literal_eval() function safely evaluates the input string as a Python literal, which in this case is a bytes object. You can then decode the bytes object to get the corresponding Unicode string.

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading