Context: I'm trying to convert characters like these:
𝐁𝐔𝐈𝐋𝐃𝐈𝐍𝐆
𝙎𝙥𝙚𝙚𝙙𝙮
𝕋𝕌𝔼𝕊𝔻𝔸𝕐
𝕤𝕡𝕒𝕘𝕙𝕖𝕥𝕥𝕚
I want to convert them to normal Python strings (speedy, building, tuesday, etc.) and save them into a new DataFrame to be exported to a new Excel file. For example, the character 𝕒 (U+1D552) should be converted to a (U+0061). I'm reading each string from an Excel file using read_excel. Should I pass something like encoding="utf-8" to the read_excel function? Or is there a way to replace those characters using re? Or even encode("ascii").decode("utf-8")?
Thank you in advance.
>Solution :
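Using unicodedata, you can normalize Unicode strings. The NFKC form applies compatibility decomposition and then recomposes, which maps styled mathematical letters to their plain counterparts: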
>>> from unicodedata import normalize
>>> test_str = "𝐁𝐔𝐈𝐋𝐃𝐈𝐍𝐆 𝙎𝙥𝙚𝙚𝙙𝙮 𝕋𝕌𝔼𝕊𝔻𝔸𝕐 𝕤𝕡𝕒𝕘𝕙𝕖𝕥𝕥𝕚"
>>> print(normalize('NFKC', test_str))
BUILDING Speedy TUESDAY spaghetti
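Since the strings come from Excel cells, some values may not be strings at all (empty cells, numbers). A minimal sketch of a wrapper that handles this is below; the name to_plain is hypothetical and not part of unicodedata or pandas, and the styled input is written with escapes so the example does not depend on font support.

```python
from unicodedata import normalize


def to_plain(value):
    """Fold compatibility characters with NFKC; return non-strings unchanged.

    Hypothetical helper for illustration only.
    """
    if not isinstance(value, str):
        return value  # e.g. NaN or a number from an empty or numeric cell
    return normalize("NFKC", value)


# U+1D552 is MATHEMATICAL DOUBLE-STRUCK SMALL A
double_struck_a = "\U0001D552"

# "TUESDAY" spelled with double-struck capital letters
styled_word = (
    "\U0001D54B\U0001D54C\U0001D53C\U0001D54A"
    "\U0001D53B\U0001D538\U0001D550"
)

print(to_plain(double_struck_a))  # a
print(to_plain(styled_word))      # TUESDAY
print(to_plain(42))               # 42 (left untouched)
```

With a DataFrame, this could be applied to one column with something like df["name"] = df["name"].map(to_plain), where "name" is an assumed column name. Note that NFKC also folds other compatibility characters (for example, the ligature ﬁ becomes fi and superscript ² becomes 2), so it is worth checking that this is acceptable for the rest of the data.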