I want to find the first occurrence of a substring in a dataframe’s column. I was looking for a concise way to do this, so I tried using argmax.
Take for instance the following dataframe:
import pandas as pd
#Create dataframe
data = {
    'Name': ['Tom', 'Dick', 'Harry'],
    'Mood': ['Grumbly', 'Very Happy', 'Happy'],
    'Speed': [20, 18, 19]
}
df = pd.DataFrame(data)
Let’s say I want the first occurrence of a ‘Happy’ mood type (this would be ‘Very Happy’, at index 1). The cell values are set up so that ‘Happy’ is a suitable substring for this search.
I can get an exact string match for Happy.
# Returns index: 2
(df.Mood.values == 'Happy').argmax()
But this does not achieve my goal. I tried approaches such as the one below, but they fail.
# Obviously not an appropriate use of __contains__
(df.Mood.values.__contains__('Happy')).argmax()
My current work-around:
# For-loop that gets me the first row in which a happy mood type occurs.
for index, row in df.iterrows():
    check_str = row['Mood']
    if 'Happy' in check_str:
        first_happy = row
        break
else:
    # The loop finished without a break, so no row matched.
    first_happy = None
Is there a concise way of getting argmax (or similar) to do this?
My work-around is adequate; I’m just curious and want to improve.
>Solution :
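You can do that with str.contains, which builds a boolean mask of the rows containing the substring, followed by idxmax, which returns the index label of the first True value: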
df.Mood.str.contains("Happy").idxmax()
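One caveat: when no row matches, the mask is all False and idxmax still returns the first index label, so the result can be misleading. Below is a minimal sketch of a guard for that case. The helper name first_match_label is hypothetical (it is not part of pandas), and the regex=False and na=False arguments are assumptions about the desired behaviour: a literal substring match, with missing values counted as non-matches.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'Dick', 'Harry'],
    'Mood': ['Grumbly', 'Very Happy', 'Happy'],
    'Speed': [20, 18, 19],
})

def first_match_label(series, substring):
    """Hypothetical helper: index label of the first value containing
    `substring`, or None when nothing matches."""
    # regex=False matches the substring literally;
    # na=False treats missing values as non-matches (both are assumptions).
    mask = series.str.contains(substring, regex=False, na=False)
    # idxmax returns the first label even if every entry is False,
    # so check that at least one match exists first.
    return mask.idxmax() if mask.any() else None

first_label = first_match_label(df['Mood'], 'Happy')
no_label = first_match_label(df['Mood'], 'Sad')

# Recover the full row, mirroring first_happy in the question's loop.
first_happy = df.loc[first_label] if first_label is not None else None
```

Note that idxmax returns an index label rather than a position. With the default RangeIndex used here the two coincide, but on a dataframe with a custom index the label should be used with df.loc, not df.iloc.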