Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

extract values as columns into dataframe from same dataframe json string

Let have a data frame with json string like below

df = pd.DataFrame([{'sno':1, 'values':"[{'a':1, 'b':2,'c':3}]"},
                   {'sno':2, 'values':"[{'a':4, 'b':5,'c':6}]"}])

Need to get the json string column as new columns in the same dataframe. Tried the below code but, getting the error as "the JSON object must be str, bytes or bytearray, not Series"

df = df.assign(a=lambda x: (json.loads(x['values'])[0]['a']))

When we try to use the lambda it should be getting the values for that row only but json.loads(x[‘values’]) actually returning a series and this is causing the error.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Is there any other ways to get this achieved?

Expected dataframe like below

sno    a     b     c
------------------------
1      1     2     3
2      4     5     6

>Solution :

If I guess correctly, what you want is to extract the value of a considering a dictionary in a list.

I would use ast.literal_eval here:

from ast import literal_eval
df['a'] = df['values'].apply(lambda x: literal_eval(x)[0]['a'])

Or:

df['a'] = [literal_eval(x)[0]['a'] for x in df['values']]

Then it won’t matter whether or not you have valid JSON (double quotes and not single quotes).

Output:

   sno                  values  a
0    1  [{'a':1, 'b':2,'c':3}]  1
1    2  [{'a':4, 'b':5,'c':6}]  4

If you need all columns:

out = df.join(pd.DataFrame([literal_eval(x)[0] for x in df['values']],
                           index=df.index))

Output:

   sno                  values  a  b  c
0    1  [{'a':1, 'b':2,'c':3}]  1  2  3
1    2  [{'a':4, 'b':5,'c':6}]  4  5  6

Or to overwrite df and drop the original values:

df = df.join(pd.DataFrame([literal_eval(x)[0] for x in df.pop('values')],
                          index=df.index))

Output:

   sno  a  b  c
0    1  1  2  3
1    2  4  5  6

Used input:

df = pd.DataFrame({'sno': [1, 2],
                   'values': ["[{'a':1, 'b':2,'c':3}]",
                              "[{'a':4, 'b':5,'c':6}]"]})
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading