Is there a way to convert string of certain format to dataframe in Python

I have a string with a format like below:

string_1 = '{!Cat1!:{!A!:!xd!# !B!:!yd!# !C!:!zd!# !D!:[!nd!#!nd2!#!nd3!#!nd4!]# !E!:!hd!}#!Cat2!:{!A1!:!xd1!# !B1!:!yd1!# !C1!:!zd1!# !D1!:[!nd1!#!nd21!#!nd31!#!nd41!]# !E1!:!hd1!}}'

I wanted to check if there is a way to convert above string to below dataframe:

Col1 Col2 Col3

Cat1  A    xd
Cat1  B    yd
Cat1  C    zd
Cat1  D    nd
Cat1  D    nd2
Cat1  D    nd3
Cat1  D    nd4
Cat1  E    hd
Cat2  A1   xd1
Cat2  B1   yd1
Cat2  C1   zd1
Cat2  D1   nd1
Cat2  D1   nd21
Cat2  D1   nd31
Cat2  D1   nd41
Cat2  E1   hd1

I have tried with codes

new_String = string_1 .replace('!', '"').replace('#', ',')
new_String = json.loads(new_String )
new_String_df = pd.DataFrame.from_dict(new_String)

But the output I am getting is different and not in a dataframe

>Solution :

The next step after you get your python object out of json.loads is to flatten your nested data. There are plenty of ways to do this, but being that you are likely not using advanced methods of pandas, dataframes, and even regular old python types, iterating through your object is probably the right level for your homework.

You’ll likely need to nest a few for loops. As an example, to extract the first two columns of your dataframe from this nested output:

#your existing code
new_String = string_1 .replace('!', '"').replace('#', ',')
new_String = json.loads(new_String)

#iterating through the object to make a list of lists:
flattened = []
for col1 in new_String:
    for col2 in js[col1]:
        flattened.append([col1, col2])

To get that third element, you can iterate once again on js[col1][col2]. Note that there is an added level of complexity at this layer as you have elements like 'A':'xd' as well as 'D': ['nd', 'nd2', 'nd3', 'nd4'] so you’ll need to determine if you are working with a dictionary value of type str or list and iterate appropriately.

Once done you can push into a dataframe with:

df = pd.DataFrame(flattened, columns=['col1','col2','col3'])

Leave a Reply