Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas/Numpy Multiple if statement with and/or operators

I have quite a complex If statement that I would like to add as a column in my pandas dataframe. In the past I’ve always used numpy.select for this type of problem, however I wouldn’t know how to achieve that with a multi-line if statement.

I was able to get this in Excel:

=IF(sum1=3,IF(AND(col1=col2,col2=col3),0,1),IF(sum1=2,IF(OR(col1=col2,col2=col3,col1=col3),0,1),IF(sum1=1,0,1)))

and write it in Python just as a regular multi-line ‘if statement’, just want to find out if there is a far cleaner way of presenting this.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

if df['sum1'] == 3:
  if df['col1'] == df['col2'] and df['col2'] == df['col3']:
    df['verify_col'] = 0
  else:
    df['verify_col'] = 1
elif df['sum1'] == 2:
  if df['col1'] == df['col2'] or df['col2'] == df['col3'] or df['col1'] == df['col3']:
    df['verify_col'] = 0
  else:
    df['verify_col'] = 1
elif df['sum1'] == 1:
  df['verify_col'] = 0
else:
  df['verify_col'] = 1

Here’s some sample data:

df = pd.DataFrame({
    'col1': ['BMW', 'Mercedes Benz', 'Lamborghini', 'Ferrari', null],
    'col2': ['BMW', 'Mercedes Benz', null, null, 'Tesla'],
    'col3': ['BMW', 'Mercedes', 'Lamborghini', null, 'Tesla_'],
    'sum1': [3, 3, 2, 1, 2]
})

I want a column which has the following results:

'verify_col': [0, 1, 0, 0, 1]

It basically checks whether the columns match for those that have values in them and assigns a 1 or a 0 for each row. 1 meaning they are different, 0 meaning zero difference.

>Solution :

Use numpy.where with chain mask with | for bitwise OR – if no match any conditions is created 1:

m1 = (df['sum1'] == 3)
m2 = (df['col1'] == df['col2']) & (df['col2'] == df['col3'])
m3 = (df['sum1'] == 2)
m4 = (df['col1'] == df['col2']) | (df['col2'] == df['col3']) | (df['col1'] == df['col3'])
m5 = df['sum1'] == 1

df['verify_col'] = np.where((m1 & m2) | (m3 & m4) | m5, 0, 1)

If need None if no match any conditions:

df['verify_col'] = np.select([(m1 & m2) | (m3 & m4) | m5,
                              (m1 & ~m2) | (m3 & ~m4) | ~m5], 
                             [0,1], default=None)


print (df)
            col1           col2         col3  sum1  verify_col
0            BMW            BMW          BMW     3           0
1  Mercedes Benz  Mercedes Benz     Mercedes     3           1
2    Lamborghini            NaN  Lamborghini     2           0
3        Ferrari            NaN          NaN     1           0
4            NaN          Tesla       Tesla_     2           1
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading