I tried the method suggested in Lookup a Dataframe column with list and return list matching the row of another column? to look up the column value,Col1 that has the elements in the query list matching Col2, but faced with
KeyError: "None of [Index(['watermelon', 'grape'], dtype='object', name='Col1')] are in the [index]"
Code used
df = pd.DataFrame({'Col1': ['Jack', 'Mary', 'Andrew'],
'Col2': [['apple', 'banana', 'papaya', 'tomato'],['berry', 'juice', 'watermelon'], ['cabbage', 'grape']]})
query = ['watermelon', 'grape']
q = df.set_index('Col1').loc[query]
Expecting
Col1
0 Mary
1 Andrew
>Solution :
Suggested method won’t work for your query, because you want the opposite.
loc is a function to query an index, not a column. Ie. the query can be something within this list ['Jack', 'Mary', 'Andrew'], which is why you are facing an error.
Also, unlike the dataframe from the suggested method, your column contains nested list:
Col1 Col2
0 Jack [apple, banana, papaya, tomato]
1 Mary [berry, juice, watermelon]
2 Andrew [cabbage, grape]
As far as I understand, you want to find those Col1 values where Col2 matches one of the queries. We can create a new Pandas Series, by applying a function that will search if the fruit is in the query.
df_ = df.set_index('Col1')['Col2'].apply(lambda fruits: any([f in query for f in fruits]))
Output:
print(df_[df_])
Col1
Mary True
Andrew True
Name: Col2, dtype: bool
Let me know if this was your intention.