Find first occurrence of a substring in a dataframe column

January 18, 2022

I want to find the first occurrence of a substring in a dataframe’s column. I was looking for a concise way of doing this, so I tried to use argmax.

Take for instance the following dataframe:

import pandas as pd

# Create dataframe
data = {
    'Name': ['Tom', 'Dick', 'Harry'],
    'Mood': ['Grumbly', 'Very Happy', 'Happy'],
    'Speed': [20, 18, 19],
}
df = pd.DataFrame(data)

Let’s say I want the first occurrence of a ‘Happy’ mood type (this would be ‘Very Happy’, at index 1). The cells are set up so that ‘Happy’ is a suitable substring for this search.

I can get an exact string match for Happy.

# Returns index: 2
(df.Mood.values == 'Happy').argmax()

But this does not achieve my goal. I tried approaches such as the one below, but they fail.

# Obviously not an appropriate use of __contains__
(df.Mood.values.__contains__('Happy')).argmax()

My current work-around:

# For loop that gets the first row in which a happy mood type occurs.
first_happy = None  # Stays None if no row matches (or the dataframe is empty)
for index, row in df.iterrows():
    if 'Happy' in row['Mood']:
        first_happy = row
        break

Is there a concise way of getting argmax (or similar) to do this?

My workaround is adequate; I’m just interested and want to improve.
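
For reference, here is a minimal sketch of the kind of vectorized approach being asked about. It assumes pandas' `Series.str.contains` is acceptable for the substring test, and the names `mask` and `first_pos` are introduced here purely for illustration. One thing to watch for: argmax returns 0 both when the first element matches and when nothing matches, so the sketch guards with `any()`.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'Dick', 'Harry'],
    'Mood': ['Grumbly', 'Very Happy', 'Happy'],
    'Speed': [20, 18, 19],
})

# Boolean mask: True where 'Happy' appears anywhere in the Mood string.
# regex=False treats 'Happy' as a literal substring; na=False counts
# missing values as non-matches.
mask = df['Mood'].str.contains('Happy', regex=False, na=False).to_numpy()

# argmax gives the position of the first True, but it also gives 0 when
# there is no True at all, so check any() first.
first_pos = int(mask.argmax()) if mask.any() else None

# Positional lookup of the matching row (None if nothing matched).
first_happy = df.iloc[first_pos] if first_pos is not None else None
```

Note that `first_pos` is a position rather than an index label, which is why the row is fetched with `iloc`.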