Number of differing elements for each pair of columns

I have a NumPy array A of shape (n, m) and dtype bool:

array([[ True, False, False],
       [ True,  True,  True],
       [False,  True,  True],
       [False,  True, False]])

I would like to get a result R of shape (m, m) and dtype int:

array([[0, 3, 2],
       [3, 0, 1],
       [2, 1, 0]])

where R[i, j] is the number of rows in which columns i and j differ. For example:

R[0, 0] = (A[:, 0] != A[:, 0]).sum()
R[2, 1] = (A[:, 2] != A[:, 1]).sum()
R[0, 2] = (A[:, 0] != A[:, 2]).sum()
...
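For reference, a plain double loop that applies this definition to every (i, j) pair is sketched below. It is slow for large m but useful for checking a vectorized version; the function name pairwise_diff_loop is hypothetical.

```python
import numpy as np

# Example array from the question
A = np.array([[True, False, False],
              [True, True, True],
              [False, True, True],
              [False, True, False]])

def pairwise_diff_loop(A):
    """Hypothetical reference helper: apply the definition to every (i, j) pair."""
    n, m = A.shape
    R = np.zeros((m, m), dtype=int)
    for i in range(m):
        for j in range(m):
            # Count the rows where column i and column j disagree
            R[i, j] = (A[:, i] != A[:, j]).sum()
    return R

R = pairwise_diff_loop(A)
print(R)
```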

Is there a way to achieve this with NumPy?

Solution:

Yes, this is pretty straightforward with some broadcasting:

R = (A[:, None, :] != A[:, :, None]).sum(axis=0)
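To see why this works, track the shapes: A[:, None, :] has shape (n, 1, m) and A[:, :, None] has shape (n, m, 1), so the comparison broadcasts to (n, m, m), with element [k, i, j] equal to A[k, j] != A[k, i]. Summing over axis 0 counts, for each (i, j), the rows where the two columns disagree. The sketch below annotates this on the question's array. It also shows an alternative that is not part of the answer above: a matrix product that gives the same counts without materializing the n × m × m boolean intermediate, which may matter when n and m are large.

```python
import numpy as np

# Example array from the question
A = np.array([[True, False, False],
              [True, True, True],
              [False, True, True],
              [False, True, False]])

# Broadcasting version from the answer:
# (n, 1, m) != (n, m, 1) -> (n, m, m); element [k, i, j] is A[k, j] != A[k, i].
# Summing over axis 0 counts the differing rows for each column pair.
R_broadcast = (A[:, None, :] != A[:, :, None]).sum(axis=0)

# Alternative sketch (an assumption, not from the answer): use integer matmul.
# D[i, j] counts rows where column i is True and column j is False;
# D.T[i, j] counts rows where column i is False and column j is True.
X = A.astype(np.int64)
D = X.T @ (1 - X)
R_matmul = D + D.T

print(R_broadcast)
print(np.array_equal(R_broadcast, R_matmul))
```

The matrix-product form needs only O(m²) extra memory instead of O(n·m²), at the cost of converting the booleans to integers first.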
