Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Create new column using str.contains and based on if-else condition

I have a list of names ‘pattern’ that I wish to match with strings in column ‘url_text’. If there is a match i.e. True the name should be printed in a new column ‘pol_names_block’ and if False leave the row empty.

pattern = '|'.join(pol_names_list) 

print(pattern)
'Jon Kyl|Doug Jones|Tim Kaine|Lindsey Graham|Cory Booker|Kamala Harris|Orrin Hatch|Bernie Sanders|Thom Tillis|Jerry Moran|Shelly Moore Capito|Maggie Hassan|Tom Carper|Martin Heinrich|Steve Daines|Pat Toomey|Todd Young|Bill Nelson|John Barrasso|Chris Murphy|Mike Rounds|Mike Crapo|John Thune|John. McCain|Susan Collins|Patty Murray|Dianne Feinstein|Claire McCaskill|Lamar Alexander|Jack Reed|Chuck Grassley|Catherine Masto|Pat Roberts|Ben Cardin|Dean Heller|Ron Wyden|Dick Durbin|Jeanne Shaheen|Tammy Duckworth|Sheldon Whitehouse|Tom Cotton|Sherrod Brown|Bob Corker|Tom Udall|Mitch McConnell|James Lankford|Ted Cruz|Mike Enzi|Gary Peters|Jeff Flake|Johnny Isakson|Jim Inhofe|Lindsey Graham|Marco Rubio|Angus King|Kirsten Gillibrand|Bob Casey|Chris Van Hollen|Thad Cochran|Richard Burr|Rob Portman|Jon Tester|Bob Menendez|John Boozman|Mazie Hirono|Joe Manchin|Deb Fischer|Michael Bennet|Debbie Stabenow|Ben Sasse|Brian Schatz|Jim Risch|Mike Lee|Elizabeth Warren|Richard Blumenthal|David Perdue|Al Franken|Bill Cassidy|Cory Gardner|Lisa Murkowski|Maria Cantwell|Tammy Baldwin|Joe Donnelly|Roger Wicker|Amy Klobuchar|Joel Heitkamp|Joni Ernst|Chris Coons|Mark Warner|John Cornyn|Ron Johnson|Patrick Leahy|Chuck Schumer|John Kennedy|Jeff Merkley|Roy Blunt|Richard Shelby|John Hoeven|Rand Paul|Dan Sullivan|Tim Scott|Ed Markey'

I am using the following code df['url_text'].str.contains(pattern) which results in True in case a name in ‘pattern’ is present in a row in column ‘url_text’ and False otherwise. With that I have tried the following code:

df['pol_name_block'] = df.apply(
    lambda row: pol_names_list if df['url_text'].str.contains(pattern) in row['url_text'] else ' ',
    axis=1
)

I get the error:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

TypeError: 'in <string>' requires string as left operand, not Series

>Solution :

From this toy Dataframe :

>>> import pandas as pd
>>> from io import StringIO

>>> df = pd.read_csv(StringIO("""
... id,url_text
... 1,Tim Kaine
... 2,Tim Kain
... 3,Tim
... 4,Lindsey Graham.com
... """), sep=',')
>>> df
    id  url_text
0   1   Tim Kaine
1   2   Tim Kain
2   3   Tim
3   4   Lindsey Graham.com

From pol_names_list, we build patterns by formating it like so :

patterns = '(%s)' % '|'.join(pol_names_list)

Then, we can use the extract method to assign the value to the column pol_name_block to get the expected result :

df['pol_name_block'] = df['url_text'].str.extract(patterns)

Output :

    id  url_text            pol_name_block
0   1   Tim Kaine           Tim Kaine
1   2   Tim Kain            NaN
2   3   Tim                 NaN
3   4   Lindsey Graham.com  Lindsey Graham
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading