Fast boolean interaction matrix with NumPy

January 23, 2024

I have two integer vectors: a very long document (1E5 to 1E7 elements) and a rather short query (typically 5-8 elements). I want to build a 2D boolean matrix whose entry (i, j) is 1 where document[i] == query[j] and 0 otherwise. For example:

Document \ Query	5	2	4
2	0	1	0
8	0	0	0
3	0	0	0
4	0	0	1
…

Is there a fast way to do this with NumPy, that is, without a Python loop? (Using Pandas is not an option here.)

Solution:

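You can take advantage of NumPy broadcasting. Adding a new axis turns document into a column of shape (N, 1); comparing it with query of shape (M,) yields an (N, M) boolean matrix: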

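import numpy as np

# Example data: 10 random integers in [0, 5) and a short query.
# The upper bound is exclusive, so 5 never occurs and the last column is all False.
document = np.random.randint(0, 5, (10,))
query = np.array([2, 3, 5])

# (10, 1) == (3,) broadcasts to a (10, 3) boolean matrix
(document[:, np.newaxis] == query)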

In an interactive session this displays the following (the exact rows depend on the random draw):

array([[False, False, False],
       [False, False, False],
       [False,  True, False],
       [False, False, False],
       [False, False, False],
       [False,  True, False],
       [ True, False, False],
       [False, False, False],
       [False, False, False],
       [ True, False, False]])
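
To connect this back to the table in the question, here is a minimal sketch that uses the four document values and three query values shown there, and casts the boolean mask to integers to get the 1/0 form. The names mask and matrix, and the choice of np.uint8 as the integer type, are assumptions made for illustration only.

```python
import numpy as np

# Values taken from the table in the question; a real document would be far longer.
document = np.array([2, 8, 3, 4])
query = np.array([5, 2, 4])

# (4, 1) compared against (3,) broadcasts to a (4, 3) boolean matrix.
mask = document[:, np.newaxis] == query

# Cast to a small integer type to get 1/0 instead of True/False
# (np.uint8 is an illustrative choice, not a requirement).
matrix = mask.astype(np.uint8)
print(matrix)
# [[0 1 0]
#  [0 0 0]
#  [0 0 0]
#  [0 0 1]]
```

The boolean result stores one byte per element, so a 1E7-element document with an 8-element query needs roughly 80 MB, and astype creates a second array of the same size. If the boolean form is enough for the later steps, skipping the cast avoids that copy.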

I hope this answers your question!