how to use pd.json_normalize to retrieve the data I need

I have this list/json in Python,

[{'id': 'TC2-FFA',
  'shortCode': 'TC2-FFA',
  'dataSet': {'datumPrecision': 2,
   'id': 'TC2_37',
   'shortCode': 'TC2_37',
   'shortDescription': 'Clean Continent to US Atlantic coast',
   'displayGroup': 'BCTI',
   'datumUnit': 'Worldscale',
   'data': [{'value': 156.11, 'date': '2023-03-06'}],
   'apiIdentifier': 'RDSX9KRCHQI9TVGID5O7XQGBP1KKBZ0F'},
  'datumUnit': 'WS',
  'datumPrecision': 3,
  'projectionStartOn': '2005-01-04T00:00:00',
  'projectionEndOn': '2023-03-06T00:00:00',
  'apiIdentifier': 'RPSBTGHKN64SV91SV9R3492RCH33D2OH'},
 {'id': 'TC2$-FFA',
  'shortCode': 'TC2$-FFA',
  'dataSet': {'datumPrecision': 2,
   'id': 'TC2_37',
   'shortCode': 'TC2_37',
   'shortDescription': 'Clean Continent to US Atlantic coast',
   'displayGroup': 'BCTI',
   'datumUnit': 'Worldscale',
   'data': [{'value': 156.11, 'date': '2023-03-06'}],
   'apiIdentifier': 'RDSX9KRCHQI9TVGID5O7XQGBP1KKBZ0F'},
  'datumUnit': '$/mt',
  'datumPrecision': 3,
  'projectionStartOn': '2010-05-10T00:00:00',
  'projectionEndOn': '2023-03-06T00:00:00',
  'apiIdentifier': 'RPSH1H9454DYUE7G8CLHVLFPJZ3BVM77'}]

how to use pandas.json_normalize to only retrieve the data ‘shortCode'(or ‘id’, as they seem to have the same value) and ‘data’ under ‘dataSet’ (the path is ‘dataSet’–‘data’–‘value’ and ‘date’)?

This is the desired dataframe

  shortCode  data.value   data.date
0   TC2-FFA      156.11  2023-03-06
1   TC2$-FFA      156.11  2023-03-06

I tried pd.json_normalize(lst_object, record_path=['dataSet', ['shortCode', ['data', 'value'], ['data', 'date']])
but failed

>Solution :

It can be confusing.

You want all values from dataSet.data so they are your records.

You can then use meta to bring in shortCode

pd.json_normalize(
   data, 
   record_path=['dataSet', 'data'], 
   meta=['shortCode']
)
    value        date shortCode
0  156.11  2023-03-06   TC2-FFA
1  156.11  2023-03-06  TC2$-FFA

Leave a Reply