I have a list of strings that I want to filter:
A = ['enc_1', 'enc_2', 'enc_lag', 'lag_1', 'lag_2', 'price', 'price_std']
If I need strings that contain EITHER ‘enc’ OR ‘lag’, I can do the following:
[_ for _ in A if ('enc' in _) or ('lag' in _)]
Output: ['enc_1', 'enc_2', 'enc_lag', 'lag_1', 'lag_2']
Everything is fine. However, if I need strings that contain NEITHER ‘enc’ NOR ‘lag’, a seemingly obvious solution doesn’t work:
[_ for _ in A if ('enc' not in _) or ('lag' not in _)]
Output: ['enc_1', 'enc_2', 'lag_1', 'lag_2', 'price', 'price_std']
Judging by the result, I would expect an expression with AND to produce this output (‘enc_lag’ is removed), but for whatever reason OR produces it instead. I am starting to seriously question my understanding of the OR and AND operators… Any help is appreciated!
>Solution :
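What you actually want here is AND, not OR. If the element must contain neither 'enc' nor 'lag', then it must not contain 'enc' AND must not contain 'lag'. With OR, a string passes as soon as either substring is missing, so only strings that contain both (here, 'enc_lag') are filtered out.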
[_ for _ in A if ('enc' not in _) and ('lag' not in _)]
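To see where the two versions diverge, it may help to print both sub-conditions for every string. This is only an illustrative sketch; the names `no_enc`, `no_lag`, `or_result` and `and_result` are made up for this example and are not part of the question.

```python
A = ['enc_1', 'enc_2', 'enc_lag', 'lag_1', 'lag_2', 'price', 'price_std']

# Show each sub-condition and how OR / AND combine them per element.
for s in A:
    no_enc = 'enc' not in s   # True when 'enc' is absent
    no_lag = 'lag' not in s   # True when 'lag' is absent
    print(s, no_enc, no_lag, no_enc or no_lag, no_enc and no_lag)

# OR keeps a string if at least one substring is missing.
or_result = [s for s in A if ('enc' not in s) or ('lag' not in s)]

# AND keeps a string only if both substrings are missing.
and_result = [s for s in A if ('enc' not in s) and ('lag' not in s)]

print(or_result)   # ['enc_1', 'enc_2', 'lag_1', 'lag_2', 'price', 'price_std']
print(and_result)  # ['price', 'price_std']
```

Only 'enc_lag' has both sub-conditions False, which is why it is the only string the OR version drops.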
Alternatively, by De Morgan’s law, (not X) and (not Y) is equivalent to not (X or Y), so we have:
[_ for _ in A if not (('enc' in _) or ('lag' in _))]
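As a quick sanity check, the snippet below confirms that both forms give the same result on the list from the question. It also sketches one possible generalization using the built-in `any()`, in case the list of substrings grows. The tuple name `unwanted` and the result variable names are assumptions made for this example.

```python
A = ['enc_1', 'enc_2', 'enc_lag', 'lag_1', 'lag_2', 'price', 'price_std']

# Hypothetical name: the substrings a string must NOT contain.
unwanted = ('enc', 'lag')

# The two equivalent forms from the answer.
and_form = [s for s in A if ('enc' not in s) and ('lag' not in s)]
de_morgan_form = [s for s in A if not (('enc' in s) or ('lag' in s))]

# Generalized form: any() plays the role of a chained OR over all substrings.
any_form = [s for s in A if not any(sub in s for sub in unwanted)]

assert and_form == de_morgan_form == any_form
print(any_form)  # ['price', 'price_std']
```

The `any()` version reads as "keep the string if none of the unwanted substrings occur in it", and adding a third substring only requires extending `unwanted`.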