I’m using Python to call an API that returns the last name of some soccer players. One of the players has a "ć" in his name.
When I call the endpoint, the name prints out with a Unicode escape sequence in place of the character:
>>> last_name = (json.dumps(response["response"]["player"]["lastname"]))
>>> print(last_name)
"Mitrovi\u0107"
>>> print(type(last_name))
<class 'str'>
If I copy and paste that output into a variable on its own, like so:
>>> print("Mitrovi\u0107")
Mitrović
>>> print(type("Mitrovi\u0107"))
<class 'str'>
Then it prints just fine?
What is wrong with the API endpoint call and the string that comes from it?
Well, you serialise the string with json.dumps() before printing it; that’s why you get a different output.
Compare the following:
>>> print("Mitrović")
Mitrović
>>> print(json.dumps("Mitrović"))
"Mitrovi\u0107"
The second command adds double quotes to the output and escapes non-ASCII characters, because that’s how json.dumps() encodes strings as JSON by default. So it’s possible that response["response"]["player"]["lastname"] contains exactly what you want, but maybe you fooled yourself by wrapping it in json.dumps() before printing.
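To check this, you could print the value directly and compare it with its serialised form. The following is only a sketch: the `response` dict is a hypothetical stand-in for whatever your API client returns after decoding, not the real payload.

```python
import json

# Hypothetical stand-in for the already-decoded API response (an assumption).
response = {"response": {"player": {"lastname": "Mitrović"}}}

# The value is already an ordinary Python str; print it as-is.
last_name = response["response"]["player"]["lastname"]
print(last_name)   # Mitrović

# json.dumps() turns it into JSON text: quoted, with non-ASCII escaped.
encoded = json.dumps(last_name)
print(encoded)     # "Mitrovi\u0107"

# json.loads() reverses the serialisation and recovers the original string.
decoded = json.loads(encoded)
print(decoded == last_name)   # True
```

In other words, json.dumps() is only needed when you want to produce JSON text, not when you want to display a value you have already decoded.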
Note: don’t confuse Python string literals with the JSON serialisation of strings. They share some features, but they aren’t the same (e.g. JSON strings can’t be single-quoted), and they serve different purposes: the first is for writing strings in source code, the second is for encoding data to send across the network.
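A small sketch of that difference, using only the standard json module: the same escape sequence is accepted by the JSON parser when the string is double-quoted, while a single-quoted string, which is fine in Python source, is rejected as JSON.

```python
import json

# A double-quoted JSON string with a \u escape decodes to a Python str.
# (The backslash is doubled here so that the JSON parser, not the Python
# literal, is what interprets the escape.)
decoded = json.loads('"Mitrovi\\u0107"')
print(decoded)

# Single quotes are valid for Python string literals but not for JSON strings.
single_quote_rejected = False
try:
    json.loads("'Mitrović'")
except json.JSONDecodeError as exc:
    single_quote_rejected = True
    print("not valid JSON:", exc.msg)
```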
Another note: You can avoid most of the escaping with ensure_ascii=False in the json.dumps() call:
>>> print(json.dumps("Mitrović", ensure_ascii=False))
"Mitrović"