I have a string with a format like below:
string_1 = '{!Cat1!:{!A!:!xd!# !B!:!yd!# !C!:!zd!# !D!:[!nd!#!nd2!#!nd3!#!nd4!]# !E!:!hd!}#!Cat2!:{!A1!:!xd1!# !B1!:!yd1!# !C1!:!zd1!# !D1!:[!nd1!#!nd21!#!nd31!#!nd41!]# !E1!:!hd1!}}'
I wanted to check if there is a way to convert above string to below dataframe:
string_1_to_df
Col1 Col2 Col3
Cat1 A xd
Cat1 B yd
Cat1 C zd
Cat1 D nd
Cat1 D nd2
Cat1 D nd3
Cat1 D nd4
Cat1 E hd
Cat2 A1 xd1
Cat2 B1 yd1
Cat2 C1 zd1
Cat2 D1 nd1
Cat2 D1 nd21
Cat2 D1 nd31
Cat2 D1 nd41
Cat2 E1 hd1
I have tried with codes
new_String = string_1 .replace('!', '"').replace('#', ',')
new_String = json.loads(new_String )
new_String_df = pd.DataFrame.from_dict(new_String)
But the output I am getting is different and not in a dataframe
>Solution :
The next step after you get your python object out of json.loads
is to flatten your nested data. There are plenty of ways to do this, but being that you are likely not using advanced methods of pandas, dataframes, and even regular old python types, iterating through your object is probably the right level for your homework.
You’ll likely need to nest a few for loops. As an example, to extract the first two columns of your dataframe from this nested output:
#your existing code
new_String = string_1 .replace('!', '"').replace('#', ',')
new_String = json.loads(new_String)
#iterating through the object to make a list of lists:
flattened = []
for col1 in new_String:
for col2 in js[col1]:
flattened.append([col1, col2])
To get that third element, you can iterate once again on js[col1][col2]
. Note that there is an added level of complexity at this layer as you have elements like 'A':'xd'
as well as 'D': ['nd', 'nd2', 'nd3', 'nd4']
so you’ll need to determine if you are working with a dictionary value of type str
or list
and iterate appropriately.
Once done you can push into a dataframe with:
df = pd.DataFrame(flattened, columns=['col1','col2','col3'])