
How to convert utf-8 characters to "normal" characters in string in python3.10?

I have raw data that looks like this:

25023,Zwerg+M%C3%BCtze,0,1,986,3780
25871,red+earth,0,1,38,8349
25931,K4m%21k4z3,90,1,1539,2530

It is saved as a .txt file: https://de205.die-staemme.de/map/player.txt


The "characters" starting with % are unicode, as far as I can tell.

I found the following table about it: https://www.i18nqa.com/debug/utf8-debug.html
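For reference, each `%XX` triple is one byte written in hexadecimal, and the entries in that table are the bytes of UTF-8-encoded characters. The following is a minimal sketch using only the standard library. It decodes the `%C3%BC` from the first line by hand to show that `ü` takes two escaped bytes, while an ASCII character such as `!` (`%21`) takes one.

```python
# "%C3%BC" taken from "Zwerg+M%C3%BCtze": two percent-escaped bytes
escaped = "%C3%BC"

# Drop the "%" signs and read the remaining hex digits as raw bytes
raw_bytes = bytes.fromhex(escaped.replace("%", ""))
print(raw_bytes)  # b'\xc3\xbc'

# Those two bytes are the UTF-8 encoding of a single character
print(raw_bytes.decode("utf-8"))  # ü

# "%21" from "K4m%21k4z3" is a single byte, the ASCII character "!"
print(bytes.fromhex("21").decode("utf-8"))  # !
```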

Here is my code so far:

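import urllib.request

url = "https://de205.die-staemme.de/map/player.txt"
pfad = "./"  # target directory (placeholder)
urllib.request.urlretrieve(url, pfad + "player.txt")

with open(pfad + "player.txt", "r", encoding="utf-8") as f:
    raw = f.read().split("\n")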

Python does not convert the %-sequences. They are read as separate literal characters.

Is there a way to convert these characters without calling .replace like 200 times?

Thank you very much in advance for help and/or useful hints!

Solution:

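The % sequences are URL encoding (percent-encoding), not Unicode escapes: each %XX is one byte in hexadecimal, and here the bytes are UTF-8. Use urllib.parse.unquote to decode the string: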

>>> raw = """25023,Zwerg+M%C3%BCtze,0,1,986,3780
... 25871,red+earth,0,1,38,8349
... 25931,K4m%21k4z3,90,1,1539,2530"""
>>> import urllib.parse
>>> print(urllib.parse.unquote(raw))
25023,Zwerg+Mütze,0,1,986,3780
25871,red+earth,0,1,38,8349
25931,K4m!k4z3,90,1,1539,2530
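Note that `unquote` leaves the `+` signs alone. In this file they most likely stand for spaces (form-style encoding, where a space is written as `+`). If so, `urllib.parse.unquote_plus`, which also turns `+` into a space, is the better fit. That reading of `+` is an assumption, not something the file states.

The sketch below decodes the file line by line and splits each record into its fields. `parse_player_line` and `read_players` are hypothetical helper names. The sketch also assumes the file was saved as in the question's code.

```python
import urllib.parse


def parse_player_line(line: str) -> list[str]:
    """Split one record on commas, then percent-decode each field.

    Splitting *before* decoding matters: an encoded comma (%2C) inside a
    name stays part of that field instead of creating an extra column.
    unquote_plus also turns "+" into a space (assumed form-style encoding).
    """
    return [urllib.parse.unquote_plus(field) for field in line.split(",")]


def read_players(path: str) -> list[list[str]]:
    # Percent-encoded text is plain ASCII, which reads fine as UTF-8
    with open(path, "r", encoding="utf-8") as f:
        # splitlines() drops the trailing newline; "if line" skips blank lines
        return [parse_player_line(line) for line in f.read().splitlines() if line]


sample = "25023,Zwerg+M%C3%BCtze,0,1,986,3780"
print(parse_player_line(sample))
# ['25023', 'Zwerg Mütze', '0', '1', '986', '3780']
```

The numeric fields pass through `unquote_plus` unchanged, so they can be converted with `int()` afterwards if needed.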