
add a key to a pandas dataframe where the column value is json

by MR

July 17, 2022

I have a pandas DataFrame like this:

import json
import pandas as pd

technologies = [
    ("Spark", 22000, '30days', 1000.0),
    ("PySpark", 25000, '50days', 2300.0),
    ("Hadoop", 23000, '55days', 1500.0),
]
df = pd.DataFrame(technologies, columns=['Courses', 'Fee', 'Duration', 'Discount'])
print(df)



   Courses    Fee Duration  Discount
0    Spark  22000   30days    1000.0
1  PySpark  25000   50days    2300.0
2   Hadoop  23000   55days    1500.0

One of the columns also holds a JSON string for each row, built like this:

df['json'] = [json.dumps(x) for x in df.to_dict(orient='records')]

print(df)

   Courses    Fee Duration  Discount  json
0    Spark  22000   30days    1000.0  {"Courses": "Spark", "Fee": 22000, "Duration":...
1  PySpark  25000   50days    2300.0  {"Courses": "PySpark", "Fee": 25000, "Duration...
2   Hadoop  23000   55days    1500.0  {"Courses": "Hadoop", "Fee": 23000, "Duration"...

I want to add a new key to the JSON in the last column, called json.
I tried something like this:

   df.apply(lambda row: json.loads(row['json'])['madeby'] = 'Bae Systems',axis=1)
             ^
SyntaxError: expression cannot contain assignment, perhaps you meant "=="?

But that fails with the error above. Any ideas on how to do this?

Solution:

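A lambda body must be a single expression, and an assignment is a statement, which is why the attempt raises a SyntaxError. Here’s a solution that moves the assignment into a function, which also keeps our lambda from getting too long: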

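def add_key(data: str) -> dict:
    # Parse the JSON string into a dict so a key can be assigned.
    data = json.loads(data)
    data["madeby"] = "Bae systems"
    # Note: this returns a dict, so the column ends up holding dicts, not JSON strings.
    return data

# Apply row by row and overwrite the json column with the result.
df["json"] = df.apply(lambda row: add_key(row["json"]), axis=1)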

   Courses    Fee Duration  Discount  \
0    Spark  22000   30days    1000.0   
1  PySpark  25000   50days    2300.0   
2   Hadoop  23000   55days    1500.0   

                                                                                                      json  
0    {'Courses': 'Spark', 'Fee': 22000, 'Duration': '30days', 'Discount': 1000.0, 'madeby': 'Bae systems'}  
1  {'Courses': 'PySpark', 'Fee': 25000, 'Duration': '50days', 'Discount': 2300.0, 'madeby': 'Bae systems'}  
2   {'Courses': 'Hadoop', 'Fee': 23000, 'Duration': '55days', 'Discount': 1500.0, 'madeby': 'Bae systems'}
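
Note that the output above shows Python dicts (single quotes) in the json column rather than JSON strings. If the column should stay a JSON string, one option is to serialize the dict again with json.dumps before returning it. The following is a minimal sketch under that assumption; the function name add_key_as_json and its key and value parameters are hypothetical and not part of the original answer.

```python
import json

import pandas as pd

# Rebuild the example frame from the question.
technologies = [
    ("Spark", 22000, "30days", 1000.0),
    ("PySpark", 25000, "50days", 2300.0),
    ("Hadoop", 23000, "55days", 1500.0),
]
df = pd.DataFrame(technologies, columns=["Courses", "Fee", "Duration", "Discount"])
df["json"] = [json.dumps(x) for x in df.to_dict(orient="records")]


def add_key_as_json(data: str, key: str = "madeby", value: str = "Bae Systems") -> str:
    """Parse a JSON string, add one key, and serialize it back to a string.

    Hypothetical helper for illustration only.
    """
    parsed = json.loads(data)
    parsed[key] = value
    return json.dumps(parsed)


# Only one column is needed, so Series.apply is enough here.
df["json"] = df["json"].apply(add_key_as_json)
print(df["json"].iloc[0])
```

Because only the json column is read, calling apply on that column directly avoids the row-wise df.apply(..., axis=1), and each cell remains a string that json.loads can parse again later.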
