How can I convert a string column holding a comma-separated list of labels into the one-hot format shown below?
This is what I have:
df = pd.DataFrame([["a",1],["b","1, 2"],["c","1,3,4"]], columns =['id', 'label'])
This is what I want:
pd.DataFrame([["a",1,0,0,0],["b",1,1,0,0],["c",1,0,1,1]], columns =['id', '1', '2', '3', '4'])
I can do this with a for loop but the execution time is horrendous.
>Solution :
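Use .str.get_dummies(). Cast the column to strings and strip the spaces first, so that the bare integer 1 in row "a" and the "1, 2" in row "b" land in the same columns (otherwise "1, 2" produces a separate " 2" column with a leading space):
df = pd.concat([df.drop('label', axis=1), df['label'].astype(str).str.replace(' ', '').str.get_dummies(',')], axis=1)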
Output:
>>> df
  id  1  2  3  4
0  a  1  0  0  0
1  b  1  1  0  0
2  c  1  0  1  1
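
For reference, the same idea can be sketched as a self-contained script using the sample data from the question. The helper name expand_labels and its parameters are made up for illustration and are not part of pandas; the sketch assumes the labels themselves never contain whitespace that needs to be preserved.

```python
import pandas as pd


def expand_labels(df, column="label", sep=","):
    """Hypothetical helper: one-hot encode a delimited label column."""
    # Cast to str so a bare integer such as 1 is treated like "1",
    # then drop all whitespace so "1, 2" yields "2" rather than " 2".
    cleaned = df[column].astype(str).str.replace(r"\s+", "", regex=True)
    # One indicator column per distinct label found in the column.
    dummies = cleaned.str.get_dummies(sep)
    # Keep the remaining columns and append the indicator columns.
    return pd.concat([df.drop(columns=column), dummies], axis=1)


df = pd.DataFrame(
    [["a", 1], ["b", "1, 2"], ["c", "1,3,4"]],
    columns=["id", "label"],
)
result = expand_labels(df)
print(result)
```

Note that get_dummies names and orders the new columns as strings, so with more labels "10" would sort before "2"; reorder the columns afterwards if numeric order matters.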