Empty list causing pd.DataFrame() to return no rows

import pandas as pd
pd.DataFrame({'genre': 'Pop',
 'country': 'CA',
 'artist_name': 'Olivia Rodrigo',
 'title_name': 'good 4 u',
 'release_date': '2021-05-13',
 'core_genre': 'Pop',
 'metrics': [],
 'week_id': 202101,
 'top_isrc': 'USUG12101245'})

This returns the column names but an otherwise empty DataFrame, because metrics is an empty list. That is a problem: it would be better if it returned a 1-row DataFrame with an empty list in the metrics column.


Here is an example of the data without missing metrics:

{'genre': 'Pop',
 'country': 'CA',
 'artist_name': 'Olivia Rodrigo',
 'title_name': 'drivers license',
 'release_date': '2021-01-07',
 'core_genre': 'Pop',
 'metrics': [{'name': 'Song w/SES On-Demand',
   'value': [{'name': 'tp', 'value': 1},
    {'name': 'lp', 'value': 0},
    {'name': 'ytd', 'value': 1},
    {'name': 'atd', 'value': 1}]},
  {'name': 'Song w/SES On-Demand Audio',
   'value': [{'name': 'tp', 'value': 0},
    {'name': 'lp', 'value': 0},
    {'name': 'ytd', 'value': 0},
    {'name': 'atd', 'value': 0}]},
  {'name': 'Streaming On-Demand Total',
   'value': [{'name': 'tp', 'value': 414},
    {'name': 'lp', 'value': 0},
    {'name': 'ytd', 'value': 414},
    {'name': 'atd', 'value': 414}]},
  {'name': 'Streaming On-Demand Audio',
   'value': [{'name': 'tp', 'value': 69},
    {'name': 'lp', 'value': 0},
    {'name': 'ytd', 'value': 69},
    {'name': 'atd', 'value': 69}]}],
 'week_id': 202101,
 'top_isrc': 'USUG12004749'}

This is handled quite nicely by pd.DataFrame(), which creates one row for each of the 4 dicts in the metrics list. I assume that, for the same reason pd.DataFrame() returns 4 rows in this second example (4 dicts in the list), it returns 0 rows in the first example (0 dicts in the list). However, losing that row of data is a problem. How can we handle this?
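The assumption about row counts can be checked with a small sketch. The helper name row_count and the trimmed-down record are made up for illustration; only pd.DataFrame() itself comes from the question.

```python
import pandas as pd

def row_count(metrics):
    # Hypothetical helper: build a frame from scalars plus one list-valued key.
    # The scalar values are repeated to match the length of the list.
    frame = pd.DataFrame({"genre": "Pop", "metrics": metrics, "week_id": 202101})
    return len(frame)

# Two dicts in the list should give two rows; an empty list should give none.
print(row_count([{"name": "a"}, {"name": "b"}]))
print(row_count([]))
```

If the two printed counts match the list lengths, the lost row comes from the scalars being repeated zero times rather than from anything specific to metrics.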

Solution:

To get a 1-row DataFrame with an empty list in the metrics column, wrap the empty list in another list:

df = pd.DataFrame({'genre': 'Pop',
 'country': 'CA',
 'artist_name': 'Olivia Rodrigo',
 'title_name': 'good 4 u',
 'release_date': '2021-05-13',
 'core_genre': 'Pop',
 'metrics': [[]],
 'week_id': 202101,
 'top_isrc': 'USUG12101245'})

Gives

  genre country     artist_name title_name release_date core_genre metrics  week_id      top_isrc
0   Pop      CA  Olivia Rodrigo   good 4 u   2021-05-13        Pop      []   202101  USUG12101245

Alternatively, pass a list containing an empty dict, [{}], in which case the metrics cell holds an empty dict instead of an empty list.
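When records arrive one at a time and only some have empty metrics, the wrapping could be applied conditionally. This is a minimal sketch, not part of the original answer: the function name record_to_frame and its list_key parameter are assumptions chosen for illustration.

```python
import pandas as pd

def record_to_frame(record, list_key="metrics"):
    """Build a DataFrame from one record, keeping a row when the list is empty."""
    values = record.get(list_key)
    if isinstance(values, list) and len(values) == 0:
        # Copy the record so the caller's dict is not modified, then wrap the
        # empty list so pandas sees one row value instead of zero row values.
        record = {**record, list_key: [[]]}
    return pd.DataFrame(record)

empty = {"title_name": "good 4 u", "metrics": [], "week_id": 202101}
frame = record_to_frame(empty)
print(len(frame))
```

Records with a non-empty metrics list pass through unchanged, so they still produce one row per dict.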

Comment:

It’s interesting that passing a bare empty list returns no rows at all. I suppose that, from pandas’s point of view, it is hard to distinguish a vector of row values from a single row value that is itself a vector, and the default behaviour is, apparently, to throw the whole row away. Interesting.
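One way to sidestep that ambiguity, offered here as an alternative sketch rather than what the answer proposes, is to pass a list of records so that each dict is always exactly one row, then use DataFrame.explode to expand the metrics list. Note the difference in result: explode keeps the row for an empty list but fills the cell with NaN instead of an empty list. The helper name explode_record is hypothetical.

```python
import pandas as pd

def explode_record(record, list_key="metrics"):
    # A list of dicts is read as one row per dict, so the list stays in one cell.
    one_row = pd.DataFrame([record])
    # explode gives one row per list element; an empty list becomes a NaN row.
    return one_row.explode(list_key, ignore_index=True)

empty = {"title_name": "good 4 u", "metrics": [], "week_id": 202101}
full = {"title_name": "drivers license",
        "metrics": [{"name": "tp"}, {"name": "lp"}],
        "week_id": 202101}

print(len(explode_record(empty)))
print(len(explode_record(full)))
```

Whether NaN or an empty list is the better placeholder depends on how the metrics column is processed downstream.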
