add one point to pandas row if string from list matches column name

January 31, 2022

I have a pandas dataframe with 10 columns and 100 rows, where the first column contains a shopping list description. All other columns are filled with NAN values and the column name is a type of fruit or vegetable.

In addition to this, I have 100 lists with random food items in them, like so List1 = ['apple','banana','pear'].

The list is limited to a maximum of 9 items, but sometimes a list can be empty too.

I would now like to iterate over my 100 lists and fill in a 1, if a string in the list matches a column header.

List2 = ['smoothie']

The final dataframe should look like this:

Description | apple | banana | pear| grape | smoothie |
List1           1       1        1     0        0
List2           0       0        0     0        1

>Solution :

Use MultiLabelBinarizer with DataFrame.join and DataFrame.reindex for same values and same order like in original df:

List1 = ['apple','banana','pear']
List2 = []

L = [List1, List2]

from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
df1 = (pd.DataFrame(mlb.fit_transform(L),columns=mlb.classes_, index=df.index)
         .reindex(df.columns[1:], fill_value=0, axis=1))
print (df1)
   apple  banana  pear  grape  smoothie
0      1       1     1      0         0
1      0       0     0      0         0

df = df[['Description']].join(df1)
print (df)
  Description  apple  banana  pear  grape  smoothie
0       List1      1       1     1      0         0
1       List2      0       0     0      0         0