Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Convert nested dictionary to pandas dataframe in python

All, I have the following nested dictionary (from a JSON API response). I would like to access the individual items by means of a pandas dataframe.

The dictionary looks as follows:

{'pagination': {'limit': 2, 'offset': 0, 'count': 2, 'total': 1474969}, 'data': [{'flight_date': '2022-10-12', 'flight_status': 'active', 'departure': {'airport': 'Tullamarine', 'timezone': 'Australia/Melbourne', 'iata': 'MEL', 'icao': 'YMML', 'terminal': '2', 'gate': '16', 'delay': 20, 'scheduled': '2022-10-12T00:50:00+00:00', 'estimated': '2022-10-12T00:50:00+00:00', 'actual': '2022-10-12T01:09:00+00:00', 'estimated_runway': '2022-10-12T01:09:00+00:00', 'actual_runway': '2022-10-12T01:09:00+00:00'}, 'arrival': {'airport': 'Hong Kong International', 'timezone': 'Asia/Hong_Kong', 'iata': 'HKG', 'icao': 'VHHH', 'terminal': '1', 'gate': None, 'baggage': None, 'delay': None, 'scheduled': '2022-10-12T06:55:00+00:00', 'estimated': '2022-10-12T06:55:00+00:00', 'actual': None, 'estimated_runway': None, 'actual_runway': None}, 'airline': {'name': 'Finnair', 'iata': 'AY', 'icao': 'FIN'}, 'flight': {'number': '5844', 'iata': 'AY5844', 'icao': 'FIN5844', 'codeshared': {'airline_name': 'cathay pacific', 'airline_iata': 'cx', 'airline_icao': 'cpa', 'flight_number': '178', 'flight_iata': 'cx178', 'flight_icao': 'cpa178'}}, 'aircraft': None, 'live': None}, {'flight_date': '2022-10-12', 'flight_status': 'active', 'departure': {'airport': 'Tullamarine', 'timezone': 'Australia/Melbourne', 'iata': 'MEL', 'icao': 'YMML', 'terminal': '2', 'gate': '5', 'delay': 25, 'scheduled': '2022-10-12T00:30:00+00:00', 'estimated': '2022-10-12T00:30:00+00:00', 'actual': '2022-10-12T00:55:00+00:00', 'estimated_runway': '2022-10-12T00:55:00+00:00', 'actual_runway': '2022-10-12T00:55:00+00:00'}, 'arrival': {'airport': 'Kuala Lumpur International Airport (klia)', 'timezone': 'Asia/Kuala_Lumpur', 'iata': 'KUL', 'icao': 'WMKK', 'terminal': '1', 'gate': None, 'baggage': None, 'delay': 3, 'scheduled': '2022-10-12T06:00:00+00:00', 'estimated': '2022-10-12T06:00:00+00:00', 'actual': None, 'estimated_runway': None, 'actual_runway': None}, 'airline': {'name': 'KLM', 'iata': 'KL', 'icao': 'KLM'}, 'flight': {'number': '4109', 'iata': 'KL4109', 'icao': 'KLM4109', 'codeshared': {'airline_name': 'malaysia airlines', 'airline_iata': 'mh', 'airline_icao': 'mas', 'flight_number': '128', 'flight_iata': 'mh128', 'flight_icao': 'mas128'}}, 'aircraft': None, 'live': None}]}

The dictionary is stored under the variable name api_response. I am using the following code to convert to a dataframe as described in https://sparkbyexamples.com/pandas/pandas-convert-json-to-dataframe/

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

My code:

import boto3
import json
from datetime import datetime
import calendar
import random
import time
import requests
import pandas as pd

aircraftdata = ''
params = {
'access_key': 'KEY',
'limit': '2',
'flight_status':'active'
  }


url = "http://api.aviationstack.com/v1/flights"


api_result = requests.get('http://api.aviationstack.com/v1/flights', params)
api_statuscode =  api_result.status_code
api_response = api_result.json()


print (type(api_response)) #dictionary
print (api_response)


df = pd.DataFrame.from_dict(api_response, orient = 'index')

This yields the following error:

AttributeError: 'list' object has no attribute 'items'

I would like to obtain a dataframe with for each flight the live data:


flight_iata, live_latitude, live_longitude
AA1004, 36.2, -106.8

>Solution :

df = pd.json_normalize(api_response["data"])
df = df[df.loc[:, df.columns.str.contains("live", case=False)].columns]

print(df)

                live.updated  live.latitude  live.longitude  live.altitude  live.direction  live.speed_horizontal  live.speed_vertical  live.is_ground
0  2019-12-12T10:00:00+00:00        36.2856        -106.807        8846.82          114.34                894.348                1.188           False

If you want to drop live. from the headers you can:

df.columns = df.columns.str.split(".").str[-1]
print(df)

                     updated  latitude  longitude  altitude  direction  speed_horizontal  speed_vertical  is_ground
0  2019-12-12T10:00:00+00:00   36.2856   -106.807   8846.82     114.34           894.348           1.188      False
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading