I am using the code below to load JSON. It works fine for valid JSON strings, but for an invalid one it throws an error:
orgdf['data'].apply(json.loads)
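For context, here is a minimal reproduction with invented sample data. `apply(json.loads)` aborts at the first malformed string, and the `json.JSONDecodeError` it raises only describes the position inside that one string (`pos`, `lineno`, `colno`), not which DataFrame row it came from.

```python
import json

import pandas as pd

# Invented sample data: the row labelled 11 is not valid JSON (single quotes).
orgdf = pd.DataFrame({"data": ['{"a": 1}', "{'a': 2}", "[1, 2]"]}, index=[10, 11, 12])

error = None
try:
    orgdf["data"].apply(json.loads)
except json.JSONDecodeError as exc:
    # The exception reports the character offset inside the bad string,
    # but nothing in it identifies the row label (11).
    error = exc
    print(type(exc).__name__, exc.pos)
```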
I just need to know which index (row number) holds an invalid record, i.e. for which rows json.loads raises an error.
I know I can do this by iterating over the DataFrame with a for loop, but I am looking for an efficient way, as it contains a million records.
It would be great if someone could help with this.
>Solution :
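You can create a custom function that wraps the json.loads call in a try/except, and then call that function inside apply. This still runs the function once per row (JSON parsing cannot be vectorized), but every row gets checked instead of the whole call stopping at the first error. See also this answer.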
import json

def is_valid_json(s):
    try:
        json.loads(s)
    # json.JSONDecodeError is a subclass of ValueError;
    # TypeError covers non-string values such as None or NaN
    except (ValueError, TypeError):
        return False
    return True

# Mark valid JSON strings
valid = orgdf['data'].apply(is_valid_json)
# Extract the index labels of the invalid strings
invalid_indices = valid[~valid].index
