Convert "weird" strings to normal Python strings


Context: I'm trying to convert characters like these:

𝐁𝐔𝐈𝐋𝐃𝐈𝐍𝐆
𝙎𝙥𝙚𝙚𝙙𝙮
𝕋𝕌𝔼𝕊𝔻𝔸𝕐
𝕤𝕡𝕒𝕘𝕙𝕖𝕥𝕥𝕚

To normal Python strings (speedy, building, tuesday, etc.) and save them into a new DataFrame to be exported to a new Excel file. For example, the character 𝕒 (U+1D552) should be converted to a (U+0061). I'm reading each string from an Excel file using read_excel. Should I pass some kind of encoding="utf-8" to the read_excel function? Or is there a way to replace those characters using re? Or even encode("ascii").decode("utf-8")?

Thank you in advance

>Solution :

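These characters are already valid Unicode, so this is not an encoding problem. Using unicodedata, you can normalize the strings with the NFKC form, which replaces compatibility characters such as these mathematical letters with their plain equivalents: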

>>> from unicodedata import normalize
>>> test_str = "𝐁𝐔𝐈𝐋𝐃𝐈𝐍𝐆 𝙎𝙥𝙚𝙚𝙙𝙮 𝕋𝕌𝔼𝕊𝔻𝔸𝕐 𝕤𝕡𝕒𝕘𝕙𝕖𝕥𝕥𝕚"
>>> print(normalize('NFKC', test_str))
BUILDING Speedy TUESDAY spaghetti
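
Since the question asks for lowercase results (speedy, building, tuesday), here is a minimal sketch of a helper that wraps the same normalize call and lowercases the result. The name to_plain and its lowercase parameter are made up for illustration; they are not part of unicodedata or pandas. The helper also passes non-string values through unchanged, on the assumption that a column read from Excel may contain empty cells or numbers.

```python
from unicodedata import normalize


def to_plain(text, lowercase=True):
    """Fold compatibility characters (such as mathematical letters) to plain ones.

    Hypothetical helper for illustration only.
    """
    # Leave non-string cells (NaN, numbers, None) untouched.
    if not isinstance(text, str):
        return text
    # NFKC applies compatibility decomposition, then canonical composition.
    plain = normalize("NFKC", text)
    return plain.lower() if lowercase else plain


samples = ["𝐁𝐔𝐈𝐋𝐃𝐈𝐍𝐆", "𝙎𝙥𝙚𝙚𝙙𝙮", "𝕋𝕌𝔼𝕊𝔻𝔸𝕐", "𝕤𝕡𝕒𝕘𝕙𝕖𝕥𝕥𝕚"]
print([to_plain(s) for s in samples])
# ['building', 'speedy', 'tuesday', 'spaghetti']
```

With pandas, something like df["name"].map(to_plain) should apply it to a whole column, where "name" is a placeholder for your actual column name. Keep in mind that NFKC folds every compatibility character, not only fancy letters: ligatures such as ﬁ become fi and superscripts such as ² become 2, so check that this is acceptable for your data.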
