Encoding issue with requests when getting JSON

I want to get the topology (TopoJSON) for French departments with this code:

import pandas as pd
import requests

# TopoJSON for all French departments from the Highcharts map collection
link_dep = 'https://code.highcharts.com/mapdata/countries/fr/fr-all-all.topo.json'
topo = requests.get(link_dep).json()

# Keep only the polygonal geometries (the departments themselves)
x = topo['objects']['default']['geometries']
xx = [y for y in x if y['type'] in ['Polygon', 'MultiPolygon']]
df = pd.json_normalize(xx)

But for df.loc[26, 'properties.name'] I get ‘Deux-Sčvres’ instead of ‘Deux-Sèvres’. The issue does not appear for characters such as ‘ô’ or ‘é’.

I understand it’s an encoding issue, but I do not know how I can slightly modify my code to get the correct encoding at the first step.


Solution:

The JSON response contains the Unicode character \u010d, which prints as č; for è it would have to be \u00e8. So this is not an encoding issue per se: the data itself is wrong.
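You can confirm that the bad character survives JSON parsing, i.e. that it is in the data rather than introduced by requests’ response decoding. A minimal check, assuming the same URL as in the question:

import requests

URL = "https://code.highcharts.com/mapdata/countries/fr/fr-all-all.topo.json"

resp = requests.get(URL)
resp.raise_for_status()
topo = resp.json()

# List every department name containing U+010D (č); if any show up,
# the wrong character is present in the parsed data itself.
bad = [g["properties"]["name"]
       for g in topo["objects"]["default"]["geometries"]
       if "\u010d" in g.get("properties", {}).get("name", "")]
print(bad)  # e.g. ['Deux-Sčvres']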

You can replace \u010d with \u00e8 in the "name" value as follows:

import requests

# Translation table mapping the wrong character (č) to the right one (è)
T = str.maketrans({"\u010d": "\u00e8"})

URL = "https://code.highcharts.com/mapdata/countries/fr/fr-all-all.topo.json"

with requests.get(URL) as response:
    response.raise_for_status()
    data = response.json()
    for g in data['objects']['default']['geometries']:
        p = g["properties"]
        if (name := p.get("name")) is not None:
            # Fix the name in place and show the result
            p["name"] = name.translate(T)
            print(p["name"])