Is it possible to find the 0th index-position of a 2D numpy array (not) containing a given vaule?
What I want, and expect
I have a 2D numpy array containing integers. My goal is to find the index of the array(s) that do not contain a given value (using numpy functions). Here is an example of such an array, named ortho_disc:
>>> ortho_disc
Out: [[1 1 1 0 0 0 0 0 0]
[1 0 1 1 0 0 0 0 0]
[0 0 0 0 0 0 2 2 0]]
If I wish to find the arrays not containing 2, I would expect an output of [0, 1], as the first and second array of ortho_disc does not contain the value 2.
What I have tried
I have looked into np.argwhere, np.nonzero, np.isin and np.where without expected results. My best attempt using np.where was the following:
>>> np.where(2 not in ortho_disc, [True]*3, [False]*3)
Out: [False False False]
But it does not return the expected [True, True, False]. This is especially weird after we look at the output ortho_disc‘s arrays evaluated by themselves:
>>> 2 not in ortho_disc[0]
Out: True
>>> 2 not in ortho_disc[1]
Out:True
>>> 2 not in ortho_disc[2]
Out: False
Using argwhere
Using np.argwhere, all I get is an empty array (not the expected [0, 1]):
>>> np.argwhere(2 not in ortho_disc)
Out: []
I suspect this is because numpy first flattens ortho_disc, then checks the truth-value of 2 not in ortho_disc?
The same empty array is returned using np.nonzero(2 not in ortho_disc).
My code
import numpy as np
ortho_disc = np.array([[1, 1, 1, 0, 0, 0, 0, 0, 0],
[1, 0, 1, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 2, 2, 0,]])
polymer = 2
print(f'>>> ortho_disc \nOut:\n{ortho_disc}\n')
print(f'>>> {polymer} not in {ortho_disc[0]} \nOut: {polymer not in ortho_disc[0]}\n')
print(f'>>> {polymer} not in {ortho_disc[1]} \nOut: {polymer not in ortho_disc[1]}\n')
print(f'>>> {polymer} not in {ortho_disc[2]} \nOut: {polymer not in ortho_disc[2]}\n\n')
breakpoint = np.argwhere(polymer not in ortho_disc)
print(f'>>>np.argwhere({polymer} not in ortho_disc) \nOut: {breakpoint}\n\n\n')
Output:
>>> ortho_disc
Out:
[[1 1 1 0 0 0 0 0 0]
[1 0 1 1 0 0 0 0 0]
[0 0 0 0 0 0 2 2 0]]
>>> 2 not in [1 1 1 0 0 0 0 0 0]
Out: True
>>> 2 not in [1 0 1 1 0 0 0 0 0]
Out: True
>>> 2 not in [0 0 0 0 0 0 2 2 0]
Out: False
>>>np.argwhere(2 not in ortho_disc)
Out: []
Expected output
From the bottom two lines:
breakpoint = np.argwhere(polymer not in ortho_disc)
print(f'>>>np.argwhere({polymer} not in ortho_disc) \nOut: {breakpoint}\n\n\n')
I excpect the following output:
>>>np.argwhere(2 not in ortho_disc)
Out: [0, 1]
Summary
I would really love feedback on how to solve this issue, as I have been scratching my head over what seems to be an easy problem for ages. And as I mentioned it is important to avoid the obvious ‘easy-way-out’ loop over ortho_disc, preferably using numpy.
Thanks in advance!
>Solution :
In [13]: ortho_disc
Out[13]:
array([[1, 1, 1, 0, 0, 0, 0, 0, 0],
[1, 0, 1, 1, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 2, 2, 0]])
In [14]: polymer = 2
In [15]: (ortho_disc != polymer).all(axis=1).nonzero()[0]
Out[15]: array([0, 1])
Breaking it down: ortho_disc != polymer is an array of bools:
In [16]: ortho_disc != polymer
Out[16]:
array([[ True, True, True, True, True, True, True, True, True],
[ True, True, True, True, True, True, True, True, True],
[ True, True, True, True, True, True, False, False, True]])
We want the rows that are all True; for that, we can apply the all() method along axis 1 (i.e. along the rows):
In [17]: (ortho_disc != polymer).all(axis=1)
Out[17]: array([ True, True, False])
That’s the boolean mask for the rows that do not contain polymer.
Use nonzero() to find the indices of the values that are not 0 (True is considered nonzero, False is 0):
In [19]: (ortho_disc != polymer).all(axis=1).nonzero()
Out[19]: (array([0, 1]),)
Note that nonzero() returned a tuple with length 1; in general, it returns a tuple with the same length as the number of dimensions of the array. Here the input array is 1-d. Pull out the desired result from the tuple by indexing with [0]:
In [20]: (ortho_disc != polymer).all(axis=1).nonzero()[0]
Out[20]: array([0, 1])