
Python – Issues with Unicode String from API Call

I’m using Python to call an API that returns the last name of some soccer players. One of the players has a "ć" in his name.

When I call the endpoint, the name prints with a Unicode escape sequence (\u0107) instead of the "ć" character:

>>> last_name = json.dumps(response["response"][2]["player"]["lastname"])
>>> print(last_name)
"Mitrovi\u0107"
>>> print(type(last_name))
<class 'str'>

If I copy that output and put it in a string literal on its own, like so:


>>> print("Mitrovi\u0107")
Mitrović
>>> print(type("Mitrovi\u0107"))
<class 'str'>

Then it prints just fine?

What is wrong with the API endpoint call and the string that comes from it?

Solution:

Well, you serialise the string with json.dumps() before printing it, and that's why you get a different output.
Compare the following:

>>> print("Mitrović")
Mitrović

and

>>> print(json.dumps("Mitrović"))
"Mitrovi\u0107"

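The second command adds double quotes to the output and, because json.dumps() defaults to ensure_ascii=True, escapes every non-ASCII character as a \uXXXX sequence. The result is a str that contains the literal characters \, u, 0, 1, 0, 7 rather than ć. When you paste "Mitrovi\u0107" into Python source code, on the other hand, the interpreter treats \u0107 as an escape sequence inside the string literal and produces ć, which is why that version prints correctly. So it's likely that response["response"][2]["player"]["lastname"] already contains exactly what you want, and you fooled yourself by wrapping it in json.dumps() before printing.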
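A minimal sketch of the situation in the question, assuming the API response has already been parsed into an ordinary dict. The `response` structure below is a hypothetical stand-in that mirrors the indexing used above, with placeholder entries. It shows that reading the field directly gives the name, while json.dumps() returns a different, longer string:

```python
import json

# Hypothetical stand-in for the parsed API response (placeholder entries);
# the real structure is assumed to match the indexing used in the question.
response = {
    "response": [
        {"player": {"lastname": "Smith"}},
        {"player": {"lastname": "Jones"}},
        {"player": {"lastname": "Mitrović"}},
    ]
}

raw = response["response"][2]["player"]["lastname"]
dumped = json.dumps(raw)

print(raw)     # Mitrović
print(dumped)  # "Mitrovi\u0107"

# The dumped string contains the quote characters and a literal backslash
# followed by "u0107", not the character "ć" itself.
print(len(raw), len(dumped))          # 8 15
print("\\" in dumped, "ć" in dumped)  # True False
```

Under that assumption, dropping the json.dumps() call and printing the field directly should be enough.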

Note: don't confuse Python string literals with the JSON serialisation of strings. They share some features, but they aren't the same (e.g. JSON strings can't be single-quoted), and they serve different purposes: the former are for writing strings in source code, the latter are for encoding data, for example to send it across the network.
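A short illustration of that difference, using only the standard library: ast.literal_eval() parses text as a Python literal, while json.loads() parses it as JSON. Both understand the \u0107 escape, but only Python accepts single quotes:

```python
import ast
import json

# As a Python literal, single quotes are fine and \u0107 becomes "ć".
print(ast.literal_eval("'Mitrovi\\u0107'"))  # Mitrović

# JSON only allows double-quoted strings, so the same text is rejected.
try:
    json.loads("'Mitrovi\\u0107'")
except json.JSONDecodeError as exc:
    print("not valid JSON:", exc.msg)

# Decoding double-quoted JSON text turns the \u0107 escape back into "ć".
print(json.loads('"Mitrovi\\u0107"'))  # Mitrović
```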

Another note: You can avoid most of the escaping with ensure_ascii=False in the json.dumps() call:

>>> print(json.dumps("Mitrović", ensure_ascii=False))
"Mitrović"
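A small sketch comparing the two settings. Both outputs are valid JSON and decode back to the same string. The last step, encoding the unescaped text as UTF-8 before sending it, is an assumption about how the data would be transmitted, not something the API requires:

```python
import json

name = "Mitrović"

# Default: non-ASCII characters are escaped, so the output is pure ASCII.
escaped = json.dumps(name)
# ensure_ascii=False keeps "ć" as-is; the result is still valid JSON.
unescaped = json.dumps(name, ensure_ascii=False)

print(escaped)    # "Mitrovi\u0107"
print(unescaped)  # "Mitrović"

# Both forms decode back to the same Python string.
assert json.loads(escaped) == json.loads(unescaped) == name

# Assumed transport step: encode the unescaped JSON explicitly as UTF-8 bytes.
payload = unescaped.encode("utf-8")
print(payload)  # b'"Mitrovi\xc4\x87"'
```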