Using Python 3.8, I would like to convert Unicode notation to Python notation:
s = 'U+00A0'
result = s.lower() # output 'u+00a0'
I want to replace u+ with \u:
result = s.lower().replace('u+','\u')
But I get the error:
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-1: truncated \uXXXX escape
How can I convert the notation U+00A0 to \u00a0?
EDIT:
The reason I wanted \u00a0 is so that I could then call the encode method to get b'\xc2\xa0'.
My question: given a string in the notation U+00A0, how can I convert it to the bytes b'\xc2\xa0'?
>Solution :
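You are confusing the representation of something with its value. \u00a0 is only how the character is written in source code; what you need is the character itself, which you can build from the hex digits: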
import re
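# raw string so that \+ reaches the regex engine; IGNORECASE so that 'U+00A0' matches too
re.sub(r"u\+([0-9a-f]{4})", lambda m: chr(int(m.group(1), 16)), s, flags=re.IGNORECASE)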
For u+00a0 the result is displayed as '\xa0',
but that is exactly the same character you get from the literal "\u00a0":
s = "\u00a0"
print(repr(s))
# '\xa0'
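To make the value-versus-representation point concrete, here is a small check (just a sketch using built-ins; the variable names are only for illustration). All three spellings produce the same one-character string, and repr simply chooses the \x form when displaying it:

```python
# Three spellings of the same one-character string (NO-BREAK SPACE, U+00A0).
a = "\u00a0"
b = "\xa0"
c = chr(0x00A0)

same_value = (a == b == c)  # all three compare equal
length = len(a)             # a single code point, not six characters
shown = repr(a)             # repr displays it with the \x escape

print(same_value, length, shown)
# True 1 '\xa0'
```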
Once you have the proper value as a Unicode string, you can encode it to UTF-8:
s = "\xa0"
print(s.encode('utf8'))
# b'\xc2\xa0'
So the final answer is:
import re
s = "u+00a0"
s2 = re.sub(r"u\+([0-9a-f]{4})", lambda m: chr(int(m.group(1), 16)), s)  # raw string avoids the invalid "\+" escape
s_bytes = s2.encode('utf8') # b'\xc2\xa0'
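If you need this in more than one place, it could be wrapped in a small function. This is only a sketch: the name codepoints_to_utf8 is made up here, and it assumes you also want to accept upper-case input such as U+00A0 and code points written with up to six hex digits. Note that the {4,6} quantifier is greedy, so hex-looking characters directly after a four-digit code would be absorbed into it, and chr raises ValueError for values above 0x10FFFF.

```python
import re

# Hypothetical helper; matches "U+" or "u+" followed by 4 to 6 hex digits.
_CODEPOINT = re.compile(r"u\+([0-9a-f]{4,6})", re.IGNORECASE)


def codepoints_to_utf8(text: str) -> bytes:
    """Replace each U+XXXX token with its character, then encode as UTF-8."""
    decoded = _CODEPOINT.sub(lambda m: chr(int(m.group(1), 16)), text)
    return decoded.encode("utf-8")


print(codepoints_to_utf8("U+00A0"))
# b'\xc2\xa0'
```

Text that does not match the pattern is left as it is and simply encoded along with the rest.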