
How to print the index (row number) of rows that fail json.loads in a pandas DataFrame

I am using the code below to load JSON. It works fine for valid JSON strings, but it throws an error for invalid ones.

orgdf['data'].apply(json.loads)


I just need to know which index (row number) holds the invalid record that makes json.loads raise the error.


I know I can do this by iterating over the DataFrame with a for loop, but I am looking for an efficient way, since it contains millions of records.

Any help would be appreciated.
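For reference, the loop-based approach mentioned in the question might look like the sketch below. It assumes a DataFrame named `orgdf` with a `data` column, as in the question; the helper name `find_invalid_rows_loop` and the sample data are hypothetical. `Series.items()` yields each index label together with its value, so the failing labels can be collected directly.

```python
import json

import pandas as pd


def find_invalid_rows_loop(series):
    """Hypothetical helper: return the index labels whose value fails json.loads."""
    bad = []
    for idx, s in series.items():
        try:
            json.loads(s)
        except (ValueError, TypeError):
            # ValueError covers json.JSONDecodeError; TypeError covers non-strings like None
            bad.append(idx)
    return bad


# Hypothetical sample data standing in for the real DataFrame
orgdf = pd.DataFrame({"data": ['{"a": 1}', "{bad json}", "[1, 2]", None]})
print(find_invalid_rows_loop(orgdf["data"]))  # [1, 3]
```

Since `json.loads` has no vectorized counterpart, `Series.apply` also calls it once per row in Python. An apply-based version is therefore mainly tidier than this loop rather than fundamentally faster.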

Solution:

You can write a custom function that wraps the json.loads call in a try/except block, and then call this function inside apply. See also this answer.

import json

def is_valid_json(s):
    try:
        json.loads(s)
    except (ValueError, TypeError):
        # json.JSONDecodeError is a subclass of ValueError;
        # TypeError covers non-string cells such as NaN or None
        return False
    return True

# Mark valid JSON strings
valid = orgdf['data'].apply(is_valid_json)

# Extract the indices of the invalid strings
invalid_indices = valid[~valid].index