I have a pandas dataframe with 10 columns and 100 rows, where the first column contains a shopping list description. All other columns are filled with NAN values and the column name is a type of fruit or vegetable.
In addition to this, I have 100 lists with random food items in them, like so List1 = ['apple','banana','pear'].
The list is limited to a maximum of 9 items, but sometimes a list can be empty too.
I would now like to iterate over my 100 lists and fill in a 1, if a string in the list matches a column header.
List2 = ['smoothie']
The final dataframe should look like this:
Description | apple | banana | pear| grape | smoothie |
List1 1 1 1 0 0
List2 0 0 0 0 1
>Solution :
Use MultiLabelBinarizer with DataFrame.join and DataFrame.reindex for same values and same order like in original df:
List1 = ['apple','banana','pear']
List2 = []
L = [List1, List2]
from sklearn.preprocessing import MultiLabelBinarizer
mlb = MultiLabelBinarizer()
df1 = (pd.DataFrame(mlb.fit_transform(L),columns=mlb.classes_, index=df.index)
.reindex(df.columns[1:], fill_value=0, axis=1))
print (df1)
apple banana pear grape smoothie
0 1 1 1 0 0
1 0 0 0 0 0
df = df[['Description']].join(df1)
print (df)
Description apple banana pear grape smoothie
0 List1 1 1 1 0 0
1 List2 0 0 0 0 0