I have two DataFrames as below:
df1.shape = (4,2)
| Text | Topic |
|---|---|
| Where is the party tonight? | Party |
| Let’s dance | Party |
| Hello world | Other |
| It is rainy today | Weather |
df2.shape = (4,2)
| 0 | 1 |
|---|---|
| Where is the party tonight? | [-0.011570500209927559, -0.010117080062627792,….,0.062448356] |
| Let’s dance | [-0.08268199861049652, -0.0016140303341671824,….,0.02094201] |
| Hello world | [-0.0637684240937233, -0.01590338535606861,….,0.02094201] |
| It is rainy today | [0.06379614025354385, -0.02878064103424549,….,0.056790903] |
Basically, df2 holds the embedding of each sentence in df1, and each sentence in df1 has a topic associated with it. The embedding is in column ‘1’ of df2, stored as a string representation of a list of 512 positive or negative floats.
My desired output DataFrame is:
df_output.shape = (4,514)
| Text | Topic | Feature_0 | Feature_1 | …. | Feature_511 |
|---|---|---|---|---|---|
| Where is the party tonight? | Party | -0.0115705 | -0.01011708 | …. | 0.0624484 |
| Let’s dance | Party | -0.082681999 | -0.00161403 | …. | 0.020942 |
| Hello world | Other | -0.063768424 | -0.01590338535606861 | …. | 0.020942 |
| It is rainy today | Weather | 0.06379614 | -0.028780641 | …. | 0.056790903 |
How can I get this done? I tried to split the embeddings in df2 into separate columns, but it doesn’t work. This is what I have done so far:
df2.join(pd.DataFrame(df2["1"].values.tolist()).add_prefix('feature_'))
It just created a duplicate of column 1 named feature_0. I haven’t even reached the stage where I can join df1 and df2.
>Solution :
Since each embedding is stored as a string, you could map ast.literal_eval over the items in df2["1"] to parse them into lists, build a DataFrame from the result, and join it to df1:
import ast
import pandas as pd
out = df1.join(pd.DataFrame(map(ast.literal_eval, df2["1"].tolist())).add_prefix('feature_'))
Output:
Text Topic feature_0 feature_1 feature_2
0 Where is the party tonight? Party -0.011571 -0.010117 0.062448
1 Let's dance Party -0.082682 -0.001614 0.020942
2 Hello world Other -0.063768 -0.015903 0.020942
3 It is rainy today Weather 0.063796 -0.028781 0.056791
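For reference, here is a self-contained sketch that reproduces both the failed attempt and the fix. It assumes the embeddings are stored as strings (as described in the question) and uses shortened 3-element vectors built from the values shown above instead of the full 512; the variable names `attempt`, `parsed` and `features` are just illustrative.

```python
import ast

import pandas as pd

df1 = pd.DataFrame({
    "Text": ["Where is the party tonight?", "Let's dance",
             "Hello world", "It is rainy today"],
    "Topic": ["Party", "Party", "Other", "Weather"],
})

# Assumption: column "1" holds each embedding as a *string*, not a list.
# The vectors are shortened to 3 values for illustration.
df2 = pd.DataFrame({
    "0": df1["Text"],
    "1": [
        "[-0.011570500209927559, -0.010117080062627792, 0.062448356]",
        "[-0.08268199861049652, -0.0016140303341671824, 0.02094201]",
        "[-0.0637684240937233, -0.01590338535606861, 0.02094201]",
        "[0.06379614025354385, -0.02878064103424549, 0.056790903]",
    ],
})

# The attempt from the question: each cell is still a single string,
# so the new DataFrame has only one column (a copy of column "1").
attempt = pd.DataFrame(df2["1"].values.tolist()).add_prefix("feature_")
print(attempt.shape)

# Parse each string into a Python list of floats first.
parsed = df2["1"].map(ast.literal_eval)

# One column per list element, aligned on df2's index.
features = pd.DataFrame(parsed.tolist(), index=df2.index).add_prefix("feature_")

# Index-based join; this relies on df1 and df2 having rows in the same order.
out = df1.join(features)
print(out)
```

Note that `join` aligns on the index, so this only gives the right result if the rows of df1 and df2 are in the same order. If that is not guaranteed, merging on the sentence text (column "0" of df2) would probably be the safer option.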