How do I iterate through a df column (where each row is a list), looking for elements in a different list?

February 4, 2022

I would love your advice on the best code to complete the following task:

menu_items = ['acres',
 'adobo',
 'affogato',
 'agua',
 'aioli',
 'akaushi',
 'alaskan',
 'almonds',
 'ambriza',
 'american',
 'angolotti',
 'antiguas',
 'apple',
 'apples',
 'arancini',
 'arroz',
 'artichoke',
 'arugula',
 'asada',
 'asado',
 'asparagus',
 'atlantic',
 'avocado',
 'avokatsu',
 'award',
 'baby',
 'back',
 'backyard',
 'bacon',
 'baked',
 'bakes',
 'balls',
 'balsalmic',
 'tomato']

df['lemmatized'] = [['beautiful', 'location', 'steak', 'seafood', 'par', 'expectation', 'restaurant', 'caliber', 'saute', 'spinach', 'amazing', 'service', 'however', 'well', 'average', 'expectation', 'high', 'end', 'restaurant', 'sticking', 'eddie', 'v', 'trulucks', 'future'],['almonds','pleasant', 'surprise', 'came', 'last', 'week', 'storm', 'tired', 'crabby', 'air', 'travel', 'food', 'breath', 'fresh', 'air', 'area', 'saturated', 'nothing', 'special', 'run', 'mill', 'tex', 'mex', 'ambriza', 'waitress', 'exceptional', 'blew', 'u', 'away', 'said', 'th', 'table', 'ever', 'new', 'server', 'super', 'friendly', 'get', 'anymore', 'looking', 'forward', 'digging', 'menu', 'fun'],['ordered', 'corn', 'cob', 'really', 'good', 'fact', 'served', 'basil', 'sauce', 'added', 'different', 'taste', 'altogether', 'corn', 'juicy', 'well', 'smoked', 'offer', 'pizza', 'hand', 'tossed', 'took', 'arrive', 'would', 'say', 'min', 'tomato', 'basil', 'okay']]

I have a list called menu_items that contains single-word menu item names.

Next, in a dataframe I have a column of reviews where each row is one review and each review is broken down into a list of single words.

What I am trying to do is iterate through the lemmatized column and check whether each word in each review appears in the menu_items list.

If words in a review match words in menu_items, I want to store the number of matches for that review in a column called df['Match'].

Here is what I have tried:

for item in df['lemmatized']:
    for element in item:
        if element in menu_items:
            df['Match'] += 1
        else:
           df['Match'] = 0

This produced zero matches even though I have visually confirmed that there are matches.

Solution:

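Your loop updates the whole Match column at once rather than one row at a time, and the else branch resets the entire column to 0 whenever a word is not in menu_items, so the final non-matching word wipes out every count. A better approach would be to use set intersection (assuming you’re trying to count unique matches, i.e., you’re not interested in how many times "apple" is mentioned in a review, only that it is mentioned at all).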

This should get you what you want, again, assuming you want to count unique matches and assuming your lemmatized column values are indeed lists of strings:

menu_set = set(menu_items)  # build the lookup set once
df["Match"] = df["lemmatized"].apply(lambda r: len(set(r) & menu_set))
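
As a quick check, here is a minimal self-contained sketch of the set-intersection approach. The shortened menu_items list and the three short reviews are trimmed-down stand-ins for the data in the question, chosen only for illustration.

```python
import pandas as pd

# Trimmed-down stand-ins for the question's data (illustrative only)
menu_items = ["almonds", "ambriza", "apple", "tomato"]
df = pd.DataFrame({
    "lemmatized": [
        ["beautiful", "location", "steak", "seafood"],
        ["almonds", "pleasant", "surprise", "ambriza", "waitress"],
        ["corn", "cob", "tomato", "basil", "tomato"],
    ]
})

menu_set = set(menu_items)  # build once so each row reuses the same set

# Count distinct menu words per review; a repeated word counts once
df["Match"] = df["lemmatized"].apply(lambda words: len(set(words) & menu_set))

print(df["Match"].tolist())  # [0, 2, 1]
```

The third review mentions "tomato" twice but still scores 1, which is the "unique matches" behaviour described above.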

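If your lemmatized column values are actually strings that look like lists (for example "['corn', 'tomato']", as can happen after a round trip through a CSV file), you’ll need to parse them into real lists first. A sketch, assuming each value is a valid Python list literal:

import ast

def get_matches(row):
    # Parse the string form back into a list, then keep the unique words
    words = set(ast.literal_eval(row))
    return len(words & set(menu_items))


df["Match"] = df["lemmatized"].apply(get_matches)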
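
If repeated mentions should count as well (the set-based answer deliberately counts each menu word once per review), one possible variation is to sum a membership test over every word in the list. This is only a sketch: the helper name count_all_matches and the sample data are hypothetical, not part of the original question.

```python
import pandas as pd

# Illustrative stand-ins for the question's data
menu_items = ["almonds", "ambriza", "apple", "tomato"]
menu_set = set(menu_items)

df = pd.DataFrame({
    "lemmatized": [
        ["beautiful", "location", "steak"],
        ["almonds", "pleasant", "ambriza"],
        ["corn", "tomato", "basil", "tomato"],
    ]
})

def count_all_matches(words):
    # Add one for every word that is a menu item, repeats included
    return sum(word in menu_set for word in words)

df["Match"] = df["lemmatized"].apply(count_all_matches)

print(df["Match"].tolist())  # [0, 2, 2]
```

Here the last review scores 2 because "tomato" appears twice, whereas the set-intersection version would report 1.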