My input is a Python dictionnary :
d = {
"username": "foo999",
"email": "bar999@example.com",
"address": {
"city": "Faketown",
"state": "Fakeshire"
},
"is_premium_user": True,
}
Using pd.json_normalize(d, record_prefix=False), I’m still getting the prefix like in here : address.city
username email is_premium_user address.city address.state
0 foo999 bar999@example.com True Faketown Fakeshire
My expected output is :
username email is_premium_user city state
0 foo999 bar999@example.com True Faketown Fakeshire
The code below works but seems hacky and just a workaround.. Also, it suppose I know all the nested keys..
df.columns = df.columns.str.replace("(?:address|2nd_key|...)\.", "", regex=True)
>Solution :
I’m quite sure it’s not possible. This avoids ambiguity in case subkeys are identical:
pd.json_normalize({"address": {"city": "Faketown"},
"destination": {"city": "Othertown"}})
Some other workarounds:
df = (pd.json_normalize(d, record_prefix='X')
.rename(columns=lambda x: x.split('.')[-1])
)
Or:
df = pd.json_normalize(d, record_prefix='X')
df.columns = df.columns.str.replace(r'.*\.', '', regex=True)