I’m trying to search for a set of strings in a column in a pandas dataframe and replace with 1 if the strings exist and 0 if they do not.
Per the example below, this works fine on the first pass:
import numpy as np
import pandas as pd
df = pd.DataFrame({'ID':[1,2,3,4], 'Event':['1 Day', '2 Days','3 Days','4 Days']})
df['Event'] = np.where(df['Event'].str.contains('3 Days|4 Days'),1,df['Event'])
df
but when I try and apply the opposite logic and replace the instances where the strings do not exist:
df = pd.DataFrame({'ID':[1,2,3,4], 'Event':['1 Day', '2 Days','3 Days','4 Days']})
df['Event'] = np.where(df['Event'].str.contains('3 Days|4 Days'),1,df['Event'])
df['Event'] = np.where(~df['Event'].str.contains('3 Days|4 Days'),0,df['Event'])
df
I get this error – TypeError: bad operand type for unary ~: 'float'
I tried using logical operators so the actions would occur simultaneously:
df = pd.DataFrame({'ID':[1,2,3,4], 'Event':['1 Day', '2 Days','3 Days','4 Days']})
df['Event'] = np.where(df['Event'].str.contains('3 Days|4 Days'),1,df['Event']) & np.where(~df['Event'].str.contains('3 Days|4 Days'),0,df['Event'])
df
but received this error… TypeError: unsupported operand type(s) for &: 'str' and 'int'
What I’m ultimately trying to achieve is a df that replaces all the cells where the strings exist with 1 and the instances where those strings do not exist with 0s so I can analyze. Like so:
ID Event
1 0
2 0
3 1
4 1
>Solution :
After this line:
df['Event'] = np.where(df['Event'].str.contains('3 Days|4 Days'),1, df['Event'])
df['Event'] holds the integer 1 in the matched rows, and an integer is not a string, so the second time you call str.contains (inside np.where):
df['Event'].str.contains('3 Days|4 Days')
it returns:
0 False
1 False
2 NaN
3 NaN
Name: Event, dtype: object
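The rows that now hold the integer 1 produce NaN, which is a float. The ~ operator cannot invert a float, so applying ~ to this object-dtype result raises TypeError: bad operand type for unary ~: 'float'.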
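To see where the NaN comes from, here is a minimal sketch that rebuilds the column as it looks after the first np.where call. The names s, result and error_name are only for illustration, and the exact warning behaviour may vary with your Python and pandas versions:

```python
import pandas as pd

# The 'Event' column after the first np.where: two strings, two integers
s = pd.Series(['1 Day', '2 Days', 1, 1], name='Event')

# str.contains only understands strings; the integer rows come back as NaN
result = s.str.contains('3 Days|4 Days')
nan_rows = result.isna().tolist()

# Inverting the object-dtype result fails once it reaches a NaN (a float)
error_name = None
try:
    ~result
except TypeError as exc:
    error_name = type(exc).__name__

print(nan_rows)
print(error_name)
```

Passing na=False to str.contains would silence the error, but the rows already replaced by 1 would then count as "not matching" and be overwritten with 0, so it does not fix the underlying logic.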
To get the desired outcome, simply use np.where once where you select 1 if True, 0 otherwise:
df['Event'] = np.where(df['Event'].str.contains('3 Days|4 Days'), 1, 0)
Output:
ID Event
0 1 0
1 2 0
2 3 1
3 4 1
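As an alternative sketch (not part of the answer above), the boolean mask can be computed once on the original strings and cast straight to integers, which avoids np.where altogether. The variable name mask is an illustrative choice:

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 2, 3, 4],
                   'Event': ['1 Day', '2 Days', '3 Days', '4 Days']})

# Build the mask while the column still contains only strings
mask = df['Event'].str.contains('3 Days|4 Days')

# True/False cast to 1/0
df['Event'] = mask.astype(int)

print(df)
```

Because the mask is taken before the column is overwritten, there is no second pass over mixed string and integer values.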