Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

most efficient way to index a numpy array with a number of 1d boolean arrays

Assume that I have a numpy array A with n dimensions, which might be very large, and assume that I have k 1-dimensional boolean masks M1, ..., Mk

I would like to extract from A an n-dimensional array B which contains all the elements of A located at indices where the "outer-AND" of all the masks is True.

..but I would like to do this without first forming the (possibly very large) "outer-AND" of all the masks, and without having to extract the specified elements from each axis one axis at a time hence creating (possibly many) intermediate copies in the process.

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

The example below demonstrates the two ways of extracting the elements from A just described above:

from functools import reduce
import numpy as np


m = 100

for _ in range(m):
    n = np.random.randint(0, 10)
    k = np.random.randint(0, n + 1)

    A_shape = tuple(np.random.randint(0, 10, n))

    A = np.random.uniform(-1, 1, A_shape)
    M_lst = [np.random.randint(0, 2, dim).astype(bool) for dim in A_shape]

    # creating shape of B:
    B_shape = tuple(map(np.count_nonzero, M_lst)) + A_shape[len(M_lst):]
    # size of B:
    B_size = np.prod(B_shape)

    # --- USING "OUTER-AND" OF ALL MASKS --- #
    # creating "outer-AND" of all masks:
    M = reduce(np.bitwise_and, (np.expand_dims(M, tuple(np.r_[:i, i+1:n])) for i, M in enumerate(M_lst)), True)
    # extracting elements from A and reshaping to the correct shape:
    B1 = A[M].reshape(B_shape)
    # checking that the correct number of elements was extracted
    assert B1.size == B_size
    # THE PROBLEM WITH THIS METHOD IS THE POSSIBLY VERY LARGE OUTER-AND OF ALL THE MASKS!

    # --- USING ONE MASK AT A TIME --- #
    B2 = A
    for i, M in enumerate(M_lst):
        B2 = B2[tuple(slice(None) for _ in range(i)) + (M,)]
    assert B2.size == np.prod(B_shape)
    assert B2.shape == B_shape
    # THE PROBLEM WITH THIS METHOD IS THE POSSIBLY LARGE NUMBER OF POSSIBLY LARGE INTERMEDIATE COPIES!

    assert np.all(B1 == B2)

    # EDIT 1:
    # USING np.ix_ AS SUGGESTED BY Chrysophylaxs
    i = np.ix_(*M_lst)
    B3 = A[i]
    assert B3.shape == B_shape
    assert B3.size == B_size
    assert np.prod(list(map(np.size, i))) == B_size

print(f'All three methods worked all {m} times')

Is there a smarter (more efficient) way to do this, possibly using an existing numpy function?.

EDIT 1: I added the solution suggested by Chrysophylaxs
EDIT 2: Accepted answer from Chrysophylaxs as the index used has the same size as the resulting B which is exactly what I wanted.

>Solution :

IIUC, you’re looking for np.ix_; an example:

import numpy as np

arr = np.arange(60).reshape(3, 4, 5)

x = [True, False, True]
y = [False, True, True, False]
z = [False, True, False, True, False]

out = arr[np.ix_(x, y, z)]

out:

array([[[ 6,  8],
        [11, 13]],

       [[46, 48],
        [51, 53]]])
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading