How do I iterate through a df column (where each row is a list), looking for elements in a different list?

I would love your advice on the best code to complete the following task:

menu_items = ['acres',
 'adobo',
 'affogato',
 'agua',
 'aioli',
 'akaushi',
 'alaskan',
 'almonds',
 'ambriza',
 'american',
 'angolotti',
 'antiguas',
 'apple',
 'apples',
 'arancini',
 'arroz',
 'artichoke',
 'arugula',
 'asada',
 'asado',
 'asparagus',
 'atlantic',
 'avocado',
 'avokatsu',
 'award',
 'baby',
 'back',
 'backyard',
 'bacon',
 'baked',
 'bakes',
 'balls',
 'balsalmic',
 'tomato']

df['lemmatized'] = [['beautiful', 'location', 'steak', 'seafood', 'par', 'expectation', 'restaurant', 'caliber', 'saute', 'spinach', 'amazing', 'service', 'however', 'well', 'average', 'expectation', 'high', 'end', 'restaurant', 'sticking', 'eddie', 'v', 'trulucks', 'future'],['almonds','pleasant', 'surprise', 'came', 'last', 'week', 'storm', 'tired', 'crabby', 'air', 'travel', 'food', 'breath', 'fresh', 'air', 'area', 'saturated', 'nothing', 'special', 'run', 'mill', 'tex', 'mex', 'ambriza', 'waitress', 'exceptional', 'blew', 'u', 'away', 'said', 'th', 'table', 'ever', 'new', 'server', 'super', 'friendly', 'get', 'anymore', 'looking', 'forward', 'digging', 'menu', 'fun'],['ordered', 'corn', 'cob', 'really', 'good', 'fact', 'served', 'basil', 'sauce', 'added', 'different', 'taste', 'altogether', 'corn', 'juicy', 'well', 'smoked', 'offer', 'pizza', 'hand', 'tossed', 'took', 'arrive', 'would', 'say', 'min', 'tomato', 'basil', 'okay']]

I have a variable called menu_items, which is a list of single-word menu item names.

Next, in a dataframe I have a column of reviews where each row is one review and each review is broken down into a list of single words.

What I am trying to do is add code that will iterate through the lemmatized column and search each word in each list for the presence of any words found in the menu_items list.

If a word in a review's word list matches a word in the menu_items list, I want to store the number of matches for that review in a column called df['Match'].

Here is what I have tried:

for item in df['lemmatized']:
    for element in item:
        if element in menu_items:
            df['Match'] += 1
        else:
           df['Match'] = 0

This produced zero matches even though I have visually confirmed that there are matches.

>Solution :

A better approach would be to use set intersection (assuming you’re trying to count unique matches, i.e., you’re not interested in how many times "apple" is mentioned in a review, only that it is mentioned, period).

This should get you what you want, again, assuming you want to count unique matches and assuming your lemmatized column values are indeed lists of strings:

menu_set = set(menu_items)  # build the lookup set once instead of once per row
df["Match"] = df["lemmatized"].apply(lambda words: len(set(words) & menu_set))
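
As for why the loop in the question ends up at zero: `df['Match'] += 1` and `df['Match'] = 0` both act on the whole column, not on the current row, so every non-matching word resets all rows to 0 (and this assumes `df['Match']` already existed, otherwise the `+=` would fail). Because the last word of the last review is not a menu item, the column finishes at 0. Here is a minimal, self-contained sketch on trimmed sample data that computes the count per row instead. The `Match_total` column name is my own invention for illustration; it counts every occurrence, repeats included, which seems to be what the loop was aiming for.

```python
import pandas as pd

# Small sample, trimmed from the question's data for illustration.
menu_items = ["almonds", "ambriza", "apple", "tomato"]
df = pd.DataFrame({
    "lemmatized": [
        ["beautiful", "location", "steak", "seafood"],
        ["almonds", "pleasant", "surprise", "ambriza", "almonds"],
        ["corn", "tomato", "basil", "okay"],
    ]
})

menu_set = set(menu_items)  # set membership tests are fast

# Unique matches per review (set intersection).
df["Match"] = df["lemmatized"].apply(lambda words: len(set(words) & menu_set))

# Total occurrences per review, counting repeats.
# "Match_total" is a hypothetical column name, not from the question.
df["Match_total"] = df["lemmatized"].apply(
    lambda words: sum(word in menu_set for word in words)
)

print(df[["Match", "Match_total"]])
```

In the second review "almonds" appears twice, so the two columns differ there: 2 unique matches versus 3 total occurrences.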

If your lemmatized column values are not actually lists but strings that look like lists (for example "['corn', 'cob']"), you’ll need to parse them into words first:

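def get_matches(row):
    # Split the string form of the list, then strip the spaces and quotes left around each word
    words = {w.strip(" '\"") for w in row.strip("[]").split(",")}
    return len(words & set(menu_items))


df["Match"] = df["lemmatized"].apply(get_matches)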
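
If the strings happen to be the text of Python lists (which is what you typically get after writing a list column to CSV and reading it back), another option is to convert them to real lists once with `ast.literal_eval` and then reuse the set-intersection approach. This is only a sketch under that assumption about the cell format; the sample data below is made up for illustration.

```python
import ast
import pandas as pd

menu_items = ["almonds", "ambriza", "tomato"]

# Assumed format: each cell holds the text of a Python list.
df = pd.DataFrame({
    "lemmatized": [
        "['beautiful', 'location', 'steak']",
        "['almonds', 'pleasant', 'ambriza']",
    ]
})

# Turn each string back into a real list of words.
df["lemmatized"] = df["lemmatized"].apply(ast.literal_eval)

menu_set = set(menu_items)
df["Match"] = df["lemmatized"].apply(lambda words: len(set(words) & menu_set))
print(df["Match"].tolist())  # [0, 2]
```

`ast.literal_eval` only evaluates Python literals, so it is a safer choice than `eval` for this kind of conversion, but it will raise an error on cells that are not valid list literals.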