Split list in a column to multiple columns

I have two DataFrames as below:

df1.shape = (4,2)

Text Topic
Where is the party tonight? Party
Let’s dance Party
Hello world Other
It is rainy today Weather

df2.shape = (4,2)

0 1
Where is the party tonight? [-0.011570500209927559, -0.010117080062627792,….,0.062448356]
Let’s dance [-0.08268199861049652, -0.0016140303341671824,….,0.02094201]
Hello world [-0.0637684240937233, -0.01590338535606861,….,0.02094201]
It is rainy today [0.06379614025354385, -0.02878064103424549,….,0.056790903]

df2 holds the embedding of each sentence in df1, and each sentence in df1 has a topic associated with it. The embedding is in column 1 of df2, stored as a string representation of a list of 512 positive or negative floats.

My desired output DataFrame is:

df_output.shape = (4,514)

Text Topic Feature_0 Feature_1 …. Feature_511
Where is the party tonight? Party -0.0115705 -0.01011708 …. 0.0624484
Let’s dance Party -0.082681999 -0.00161403 …. 0.020942
Hello world Other -0.063768424 -0.015903385 …. 0.020942
It is rainy today Weather 0.06379614 -0.028780641 …. 0.056790903

How can I get this done? I tried to split the embeddings in df2 into separate columns, but it doesn't work. This is what I have done so far:

df2.join(pd.DataFrame(df2["1"].values.tolist()).add_prefix('feature_'))

It just created a duplicate of column 1 named feature_0. I haven't even reached the stage where I can join df1 and df2.

Solution:

The values in df2["1"] are strings rather than lists, so you could parse each one with ast.literal_eval, build a DataFrame from the parsed lists, and join it to df1:

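import ast
import pandas as pd
# Parse each string into a list of floats, expand to one column per element, then join on the index
out = df1.join(pd.DataFrame(map(ast.literal_eval, df2["1"].tolist())).add_prefix('feature_'))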

Output:

                          Text    Topic  feature_0  feature_1  feature_2
0  Where is the party tonight?    Party  -0.011571  -0.010117   0.062448
1                  Let's dance    Party  -0.082682  -0.001614   0.020942
2                  Hello world    Other  -0.063768  -0.015903   0.020942
3            It is rainy today  Weather   0.063796  -0.028781   0.056791
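
To see why the original attempt produced a single column, and to reproduce the output above end to end, here is a minimal self-contained sketch. It assumes the column labels of df2 are the strings "0" and "1" (as the question's df2["1"] suggests) and uses 3-element lists as stand-ins for the 512-element embeddings; the names naive and features are only illustrative.

```python
import ast

import pandas as pd

# Stand-in data: 3-element lists replace the real 512-element embeddings.
df1 = pd.DataFrame({
    "Text": ["Where is the party tonight?", "Let's dance",
             "Hello world", "It is rainy today"],
    "Topic": ["Party", "Party", "Other", "Weather"],
})
df2 = pd.DataFrame({
    "0": df1["Text"],
    # Assumption: each embedding is stored as a string, not as a list.
    "1": [
        "[-0.011570500209927559, -0.010117080062627792, 0.062448356]",
        "[-0.08268199861049652, -0.0016140303341671824, 0.02094201]",
        "[-0.0637684240937233, -0.01590338535606861, 0.02094201]",
        "[0.06379614025354385, -0.02878064103424549, 0.056790903]",
    ],
})

# The original attempt: every cell is a str, so tolist() returns a flat
# list of strings and the resulting DataFrame has just one column.
naive = pd.DataFrame(df2["1"].values.tolist())
print(naive.shape)

# Parse each string into a real list of floats before expanding it.
features = pd.DataFrame(
    df2["1"].map(ast.literal_eval).tolist(),
    index=df2.index,  # keep df2's index so the join below lines up
).add_prefix("feature_")

# join aligns on the index, so df1 and df2 must share the same row order.
out = df1.join(features)
print(out)
```

Because join matches rows by index, this only works if both DataFrames list the sentences in the same order. If that is not guaranteed, one option is to keep the text column next to the parsed features and merge on the sentence text instead.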