Encoding issue with requests when getting JSON

I want to get the topology (TopoJSON) for French departments with this code:

import pandas as pd
import requests

# TopoJSON for all French departments from the Highcharts map collection
link_dep = 'https://code.highcharts.com/mapdata/countries/fr/fr-all-all.topo.json'
topo = requests.get(link_dep).json()

# Keep only the polygonal geometries (the departments themselves)
x = topo['objects']['default']['geometries']
xx = [y for y in x if y['type'] in ['Polygon', 'MultiPolygon']]
df = pd.json_normalize(xx)

But for df.loc[26, 'properties.name'] I get ‘Deux-Sčvres’ instead of ‘Deux-Sèvres’. The issue does not appear for characters such as ‘ô’ or ‘é’.

I understand it’s an encoding issue, but I do not know how I can slightly modify my code to get the correct encoding at the first step.


Solution:

The JSON response contains the Unicode character \u010d, which prints as č; for è it would have to be \u00e8. So this is not an encoding issue per se: the data itself is wrong.
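You can confirm that the bad character survives JSON parsing, i.e. that it is in the data rather than introduced by requests’ response decoding. A minimal check, assuming the same URL as in the question:

import requests

URL = "https://code.highcharts.com/mapdata/countries/fr/fr-all-all.topo.json"

resp = requests.get(URL)
resp.raise_for_status()
topo = resp.json()

# List every department name containing U+010D (č); if any show up,
# the wrong character is present in the parsed data itself.
bad = [g["properties"]["name"]
       for g in topo["objects"]["default"]["geometries"]
       if "\u010d" in g.get("properties", {}).get("name", "")]
print(bad)  # e.g. ['Deux-Sčvres']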

You can replace \u010d with \u00e8 in the "name" value as follows:

import requests

# Translation table mapping the wrong character (č) to the right one (è)
T = str.maketrans({"\u010d": "\u00e8"})

URL = "https://code.highcharts.com/mapdata/countries/fr/fr-all-all.topo.json"

with requests.get(URL) as response:
    response.raise_for_status()
    data = response.json()
    for g in data['objects']['default']['geometries']:
        p = g["properties"]
        if (name := p.get("name")) is not None:
            # Fix the name in place and show the result
            p["name"] = name.translate(T)
            print(p["name"])