Understanding Unicode in Python


I have a script like this:

#!/usr/bin/python3
# -*- coding: utf-8 -*-
import re
var1 = '1F1EB 1F1F7'
var1 = re.sub(r' ', r'\\U000', var1)
var1 = r'\U000' + var1
var2 = '\U0001F1EB\U0001F1F7'
if var1 == var2:
    print('true')
print(type(var1))
print(var1)
print(type(var2))
print(var2)

Output:
<class 'str'>
\U0001F1EB\U0001F1F7
<class 'str'>
🇫🇷

The input variable is var1, and I need to make it equal to var2. As written, var1 never equals var2. What do I need to do to make them equal?

Expected output:

<class 'str'>
🇫🇷
<class 'str'>
🇫🇷

Solution:

In a string literal, \U0001F1EB is interpreted by the parser and replaced with the corresponding Unicode character when the source code is compiled. In other words, writing '\U0001F1EB\U0001F1F7' and writing '🇫🇷' are exactly the same thing, just alternative spellings of the same string. Programmatically constructing a string that contains the text "\U0001F1EB" is not the same thing: the result is just the ten characters \, U, 0, 0, 0, 1, F, 1, E, B, because escape sequences are not processed again at runtime.
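The difference can be made visible by comparing lengths. This is a small sketch; the variable names literal and built are only illustrative and do not come from the question:

```python
# An escape inside a normal string literal is resolved by the parser:
literal = '\U0001F1EB\U0001F1F7'

# A string assembled at runtime (here via a raw literal) keeps the backslashes:
built = r'\U0001F1EB' + r'\U0001F1F7'

print(len(literal))      # 2 code points (two regional indicator symbols)
print(len(built))        # 20 plain ASCII characters
print(literal == built)  # False
```

The two regional indicator code points are what a renderer displays as a single flag.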

What you want is to turn the hex number 1F1EB into the Unicode character with that code point. You can do this by parsing the hex string into an integer with int(..., 16) and passing the result to chr:

chr(int('1F1EB', 16))

Doing this for both numbers creates the correct Unicode sequence for the French flag:

>>> chr(int('1F1EB', 16)) + chr(int('1F1F7', 16))
'🇫🇷'

So, more programmatically:

>>> var1 = '1F1EB 1F1F7'
>>> ''.join(chr(int(i, 16)) for i in var1.split())
'🇫🇷'
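Applied to the original script, this might look like the sketch below. The helper name hex_codepoints_to_str is hypothetical, and the input is assumed to be whitespace-separated hex code points. The last lines show an alternative, assuming you already have the escaped text from the question's re.sub approach: decoding it with the standard unicode_escape codec.

```python
def hex_codepoints_to_str(hex_codepoints: str) -> str:
    """Convert whitespace-separated hex code points, e.g. '1F1EB 1F1F7', to a str."""
    # int(cp, 16) parses the hex digits; chr() maps the integer to a character.
    return ''.join(chr(int(cp, 16)) for cp in hex_codepoints.split())


var1 = hex_codepoints_to_str('1F1EB 1F1F7')
var2 = '\U0001F1EB\U0001F1F7'
print(var1 == var2)  # True

# Alternative: interpret backslash escapes contained in an ASCII-only string.
escaped = r'\U0001F1EB\U0001F1F7'
decoded = escaped.encode('ascii').decode('unicode_escape')
print(decoded == var2)  # True
```

The chr/int version is usually the simpler choice here, since it skips building escape text in the first place; the codec route is mainly useful when the escaped form is the only input available.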
