Check columns for groups of strings, replace with 1 if they exist, 0 if they do not – python, pandas, logical operators

I’m trying to search for a set of strings in a column in a pandas dataframe and replace with 1 if the strings exist and 0 if they do not.

Per the example below, this works fine on the first pass:

import numpy as np
import pandas as pd

df = pd.DataFrame({'ID':[1,2,3,4], 'Event':['1 Day', '2 Days','3 Days','4 Days']})
df['Event'] = np.where(df['Event'].str.contains('3 Days|4 Days'),1,df['Event'])

df

but when I try to apply the opposite logic and replace the instances where the strings do not exist:

df = pd.DataFrame({'ID':[1,2,3,4], 'Event':['1 Day', '2 Days','3 Days','4 Days']})
df['Event'] = np.where(df['Event'].str.contains('3 Days|4 Days'),1,df['Event'])
df['Event'] = np.where(~df['Event'].str.contains('3 Days|4 Days'),0,df['Event'])  

df

I get this error – TypeError: bad operand type for unary ~: 'float'

I tried using logical operators so the actions would occur simultaneously:

df = pd.DataFrame({'ID':[1,2,3,4], 'Event':['1 Day', '2 Days','3 Days','4 Days']})
df['Event'] = np.where(df['Event'].str.contains('3 Days|4 Days'),1,df['Event']) & np.where(~df['Event'].str.contains('3 Days|4 Days'),0,df['Event'])  

df

but received this error… TypeError: unsupported operand type(s) for &: 'str' and 'int'

What I’m ultimately trying to achieve is a df in which every cell where the strings exist is replaced with 1 and every cell where they do not is replaced with 0, so I can analyze it. Like so:

ID  Event
1   0  
2   0
3   1
4   1

Solution:

After this line:

df['Event'] = np.where(df['Event'].str.contains('3 Days|4 Days'),1, df['Event'])

df['Event'] holds the integer 1 in the rows that matched, and those cells are no longer strings. So the second time you check (inside np.where):

df['Event'].str.contains('3 Days|4 Days')

it returns:

0    False
1    False
2      NaN
3      NaN
Name: Event, dtype: object

The .str accessor returns NaN for the non-string cells, and NaN is a float, so applying ~ to this result raises the TypeError.
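To make the failure concrete, here is a small sketch that reproduces the mixed-type column. It also tries na=False, an existing parameter of str.contains that the original answer does not use; the names pattern, raw_mask, safe_mask and second_pass are illustrative only. Note that silencing the error this way does not rescue the two-pass approach: the cells already set to 1 no longer contain the search strings, so the second pass would overwrite them with 0.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4],
                   'Event': ['1 Day', '2 Days', '3 Days', '4 Days']})
pattern = '3 Days|4 Days'  # illustrative name, not from the original post

# First pass: matching rows become the integer 1, the rest stay strings.
df['Event'] = np.where(df['Event'].str.contains(pattern), 1, df['Event'])

# Second check: non-string cells yield NaN, so this mask is not purely boolean
# and applying ~ to it raises the TypeError from the question.
raw_mask = df['Event'].str.contains(pattern)

# na=False maps those NaN results to False, giving a mask that ~ can invert.
safe_mask = df['Event'].str.contains(pattern, na=False)

# The error is gone, but every row now counts as "no match", including the
# rows that were already replaced with 1, so they would all become 0.
second_pass = np.where(~safe_mask, 0, df['Event'])
```

This is why a single pass over the original strings is the safer route.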

To get the desired outcome, use np.where once, selecting 1 where the condition is True and 0 otherwise:

df['Event'] = np.where(df['Event'].str.contains('3 Days|4 Days'), 1, 0)

Output:

   ID  Event
0   1      0
1   2      0
2   3      1
3   4      1
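As a variation that is not part of the original answer, and assuming the column holds only strings as in the example, the boolean result of str.contains can be cast to integers directly, without np.where:

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4],
                   'Event': ['1 Day', '2 Days', '3 Days', '4 Days']})

# On an all-string column str.contains returns booleans;
# astype(int) maps True to 1 and False to 0.
df['Event'] = df['Event'].str.contains('3 Days|4 Days').astype(int)
print(df)
```

If the column may contain missing or non-string values, passing na=False to str.contains before the cast avoids NaN in the mask.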