I have two integer vectors: a very long one, `document` (1e5 to 1e7 elements), and a rather short one, `query` (typically 5 to 8 elements). I want to create a 2D boolean matrix whose entry (i, j) is 1 where `document[i] == query[j]` and 0 otherwise. For example:
| Document \ Query | 5 | 2 | 4 |
|---|---|---|---|
| 2 | 0 | 1 | 0 |
| 8 | 0 | 0 | 0 |
| 3 | 0 | 0 | 0 |
| 4 | 0 | 0 | 1 |
| … | … | … | … |
Is there a fast way to do this with NumPy, i.e. without a Python loop? (Using pandas here is not an option.)
Solution:
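You can take advantage of NumPy broadcasting: `document[:, np.newaxis]` has shape `(N, 1)` and `query` has shape `(M,)`, so comparing them with `==` broadcasts to an `(N, M)` boolean array without any Python loop: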
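```python
import numpy as np

document = np.random.randint(0, 5, (10,))  # 10 random integers in [0, 5)
query = np.array([2, 3, 5])

# (10, 1) == (3,) broadcasts to a (10, 3) boolean matrix
document[:, np.newaxis] == query
```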
In an interactive session this displays something like the following (the exact values vary because `document` is random):

```
array([[False, False, False],
       [False, False, False],
       [False,  True, False],
       [False, False, False],
       [False, False, False],
       [False,  True, False],
       [ True, False, False],
       [False, False, False],
       [False, False, False],
       [ True, False, False]])
```
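As a sketch, here is the same idea applied to the first rows of the table in the question (the data is copied from that table). Casting the boolean result with `astype(np.uint8)` gives the 0/1 layout shown there, and `np.equal.outer` is an equivalent way to spell the same comparison. The last lines use a hypothetical 10-million-element `big_document` to show that the result costs one byte per (i, j) entry, which is worth keeping in mind at the upper end of the sizes mentioned in the question.

```python
import numpy as np

# First rows of the example table from the question
document = np.array([2, 8, 3, 4])
query = np.array([5, 2, 4])

# Broadcasting: (N, 1) == (M,) -> (N, M) boolean matrix
matches = document[:, np.newaxis] == query

# Cast to integers to get the 0/1 matrix from the question
as_ints = matches.astype(np.uint8)
print(as_ints)

# np.equal.outer compares every (i, j) pair, giving the same result
assert np.array_equal(matches, np.equal.outer(document, query))

# Hypothetical large input: the bool result uses N * M bytes
big_document = np.arange(10_000_000) % 100
big_matches = big_document[:, np.newaxis] == query
print(big_matches.shape, big_matches.nbytes)  # (10000000, 3) 30000000
```

With 1e7 document elements and an 8-element query, the boolean matrix alone takes about 80 MB, so `bool` (or `uint8`) is a better choice than a wider integer dtype.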
I hope this answers your question!