str.contains not working when there is not a space between the word and special character

I have a dataframe that contains the titles of movies and TV series.

I want to classify each row as Movie or Series based on specific keywords. However, when a bracket sits directly next to a keyword, with no space between them, the keyword is not picked up by the str.contains() function, so I need a workaround.

This is my dataframe:

import pandas as pd
import numpy as np

watched_df = pd.DataFrame([['Love Death Robots (Episode 1)'], 
                   ['James Bond'],
                   ['How I met your Mother (Avnsitt 3)'], 
                   ['random name'],
                   ['Random movie 3 Episode 8383893']], 
                  columns=['Title'])
watched_df.head()

To add the column that classifies the titles as TV series or Movies I have the following code.

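# Strip the opening bracket literally (regex=False) so the spaced pattern below can match
watched_df["temporary_brackets_removed_title"] = watched_df['Title'].str.replace('(', '', regex=False)
watched_df["Film_Type"] = np.where(watched_df.temporary_brackets_removed_title.astype(str).str.contains(pat = 'Episode | Avnsitt', case = False), 'Series', 'Movie')
# axis must be passed by keyword in recent pandas versions
watched_df = watched_df.drop('temporary_brackets_removed_title', axis=1)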
watched_df.head()

Is there a simpler way to solve this without having to add and drop a column?

Is there a str.contains-like function that does not require an exact match but only checks whether the string contains the given word, similar to LIKE in SQL?

>Solution :

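str.contains already does a substring match. The spaces in the pattern 'Episode | Avnsitt' are literal, so it only matches "Episode" followed by a space or "Avnsitt" preceded by a space, which is why "(Avnsitt" is missed. Drop the spaces, call str.contains directly on the Title column, and map the results: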

watched_df['Film_Type'] = watched_df['Title'].str.contains(r'(?:Episode|Avnsitt)').map({True: 'Series', False: 'Movie'})

Output:

>>> watched_df
                               Title Film_Type
0      Love Death Robots (Episode 1)    Series
1                         James Bond     Movie
2  How I met your Mother (Avnsitt 3)    Series
3                        random name     Movie
4     Random movie 3 Episode 8383893    Series
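
To see why the pattern from the question misses a row, here is a minimal sketch that compares the spaced and unspaced patterns on the same titles. The variable names spaced, plain and film_type are only illustrative, and case=False is carried over from the question rather than from the answer above.

```python
import pandas as pd

watched_df = pd.DataFrame(
    {"Title": [
        "Love Death Robots (Episode 1)",
        "James Bond",
        "How I met your Mother (Avnsitt 3)",
        "random name",
        "Random movie 3 Episode 8383893",
    ]}
)

# Spaces in a regex are literal: this matches "Episode " or " Avnsitt" only,
# so "(Avnsitt 3)" is missed because a bracket, not a space, precedes the word.
spaced = watched_df["Title"].str.contains("Episode | Avnsitt", case=False)

# Without the spaces, each keyword matches wherever it appears in the title.
plain = watched_df["Title"].str.contains("Episode|Avnsitt", case=False)

film_type = plain.map({True: "Series", False: "Movie"})

print(spaced.tolist())  # [True, False, False, False, True]
print(plain.tolist())   # [True, False, True, False, True]
```

If the keywords themselves could contain regex metacharacters, passing each one through re.escape before joining them with "|" would be a reasonable precaution.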