Understanding broadcasting and arithmetic operations on tensors of different dimensions

I’m currently computing various similarity metrics between vectors, such as cosine similarity, Euclidean distance, Mahalanobis distance, etc. Since the vectors can be very large, I need compute time to be minimal.

I’m struggling to understand how to work with vectors of different dimensions (they do, however, share one dimension) in PyTorch.

I have two tensors, A and B, with shapes [867, 768] and [621, 768], respectively.

I am trying to compute the following:

  • For each v_a of the 867 vectors in A,
  • compute v_a - v_b for each of the 621 vectors in B (a naive loop version is sketched below).
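
Concretely, a naive loop version of what I’m after would look something like this (far too slow at these sizes, but it shows the intent; A and B are random stand-ins here):

import torch

A = torch.randn(867, 768)
B = torch.randn(621, 768)

# diffs[i, j] holds A[i] - B[j]
diffs = torch.empty(867, 621, 768)
for i, v_a in enumerate(A):
    for j, v_b in enumerate(B):
        diffs[i, j] = v_a - v_b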

I’m aware that this is achievable under the hood with the likes of SciPy and NumPy, but I’m trying to avoid detaching the tensors and moving them to the CPU, for speed.

Can someone help me understand the logic of the operators required in PyTorch to achieve this?

>Solution :

You can index both input tensors with None (equivalent to calling torch.unsqueeze) to insert a singleton dimension, so that A and B take the shapes (1, 867, 768) and (621, 1, 768), respectively. The subtraction will then automatically broadcast the two tensors to a common shape.

>>> diff = A[None]-B[:,None]
>>> diff.shape
torch.Size([621, 867, 768])

This is the typical approach when implementing batched-pairwise distances.
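
For example, pairwise Euclidean distances follow by reducing diff over its last dimension. A minimal sketch with random stand-ins for A and B (torch.cdist is the built-in alternative, which avoids materializing the full difference tensor):

import torch

A = torch.randn(867, 768)
B = torch.randn(621, 768)

diff = A[None] - B[:, None]    # (621, 867, 768)
dist = diff.norm(dim=-1)       # (621, 867) pairwise Euclidean distances

# Built-in equivalent; note the transposed (867, 621) layout:
assert torch.allclose(torch.cdist(A, B), dist.T, atol=1e-4)

Note that diff alone holds 621 × 867 × 768 ≈ 413M elements (about 1.6 GB in float32), so if you only need the distances, the torch.cdist route, or chunking over one of the batch dimensions, saves a lot of memory.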

More specifically, notice the difference between the two variants, A[None]-B[:,None] and A[:,None]-B[None], where k_a = 867, k_b = 621, and b = 768:

diff = A[None]-B[:,None]    # (1, k_a, b) - (k_b, 1, b) -> (k_b, k_a, b) - (k_b, k_a, b)
diff.shape                  # (k_b, k_a, b)

Compared to:

diff = A[:,None]-B[None]    # (k_a, 1, b) - (1, k_b, b) -> (k_a, k_b, b) - (k_a, k_b, b)
diff.shape                  # (k_a, k_b, b)
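
Side note: for cosine similarity you don’t need the difference tensor at all, since that metric is just a dot product between L2-normalized rows. A sketch under the same shapes, using torch.nn.functional.normalize:

import torch
import torch.nn.functional as F

A = torch.randn(867, 768)
B = torch.randn(621, 768)

# (867, 621) matrix of cosine similarities between every row pair
cos = F.normalize(A, dim=-1) @ F.normalize(B, dim=-1).T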

PyTorch follows the same broadcasting semantics as NumPy; you can read more about broadcasting on the NumPy documentation page.
