I’m currently working on computing various similarity metrics between vectors, such as cosine similarity, Euclidean distance, and Mahalanobis distance. Because the vectors I’m working with can be very large, I need compute time to be minimal.

I’m struggling to understand how to work with tensors of different shapes (they do, however, share one dimension) in PyTorch.

I have two tensors, **A** and **B**, with dimensions `[867, 768]` and `[621, 768]`, respectively.

I am trying to compute the following:

- For each vector **v_a** of the 867 vectors in **A**:
  - compute **v_a - v_b** for each vector **v_b** of the 621 vectors in **B**
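The loop described above can be written literally as a reference implementation, which is handy for checking a vectorised version on small inputs. This is a minimal sketch: `pairwise_diff_loop` is a hypothetical helper name, and its two nested Python loops make it far too slow for the full 867 × 621 case.

```python
import torch

def pairwise_diff_loop(A: torch.Tensor, B: torch.Tensor) -> torch.Tensor:
    """Hypothetical reference helper: out[i, j] = A[i] - B[j].

    A: (k_a, d), B: (k_b, d) -> out: (k_a, k_b, d)
    """
    # Allocate the output with the same dtype and device as A
    out = A.new_empty((A.shape[0], B.shape[0], A.shape[1]))
    for i, v_a in enumerate(A):        # each v_a has shape (d,)
        for j, v_b in enumerate(B):    # each v_b has shape (d,)
            out[i, j] = v_a - v_b
    return out

# Small, arbitrary sizes for illustration
A = torch.randn(5, 3)
B = torch.randn(4, 3)
out = pairwise_diff_loop(A, B)
print(out.shape)  # torch.Size([5, 4, 3])
```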

I’m aware that this is achievable with the likes of SciPy and NumPy, but for speed I’m trying to avoid detaching the tensors and moving them to the CPU.

Can someone help me understand the logic of the operators required in PyTorch to achieve this?

### Solution

You could index both input tensors with `None` to insert (unsqueeze) a new size-1 dimension, so that `A` and `B` have shapes `(1, 867, 768)` and `(621, 1, 768)`, respectively. The subtraction will then automatically broadcast the two tensors to the common shape `(621, 867, 768)`.

```
>>> diff = A[None]-B[:,None]
>>> diff.shape
torch.Size([621, 867, 768])
```
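A self-contained version of the snippet above, as a sketch: small random tensors with arbitrary sizes stand in for the real data, and picking CUDA when it is available is an assumption about the setup. Everything stays on the same device, and the last line checks which pair of vectors each entry of `diff` corresponds to.

```python
import torch

# Assumption: small random tensors stand in for the real (867, 768) and (621, 768) data.
device = "cuda" if torch.cuda.is_available() else "cpu"
A = torch.randn(8, 4, device=device)   # k_a = 8 vectors of size 4
B = torch.randn(6, 4, device=device)   # k_b = 6 vectors of size 4

# A[None] -> (1, 8, 4), B[:, None] -> (6, 1, 4); broadcasting gives (6, 8, 4)
diff = A[None] - B[:, None]
print(diff.shape)  # torch.Size([6, 8, 4])

# diff[j, i] is A[i] - B[j]: the first axis indexes B, the second indexes A
print(torch.allclose(diff[5, 3], A[3] - B[5]))  # True
```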

This is the typical approach when implementing batched pairwise distances.
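As a sketch of how `diff` feeds into metrics mentioned in the question: the Euclidean distance is the L2 norm over the last axis, and `torch.cdist` returns the same matrix directly. The Mahalanobis part assumes a precision (inverse covariance) matrix `S_inv` estimated from the pooled rows of `A` and `B`; that estimation choice, like the small float64 sizes, is purely illustrative.

```python
import torch

torch.manual_seed(0)
# Illustrative sizes; float64 keeps the comparisons below tight
A = torch.randn(8, 4, dtype=torch.float64)   # stands in for the (867, 768) tensor
B = torch.randn(6, 4, dtype=torch.float64)   # stands in for the (621, 768) tensor

diff = A[None] - B[:, None]                  # (k_b, k_a, d)

# Euclidean: L2 norm over the shared last dimension -> (k_b, k_a)
euclid = diff.norm(dim=-1)

# Built-in alternative; argument order (B, A) so the result is also (k_b, k_a)
euclid_cdist = torch.cdist(B, A)

# Mahalanobis with an assumed precision matrix S_inv.
# torch.cov expects variables as rows, hence the transpose.
S = torch.cov(torch.cat([A, B]).T)           # (d, d)
S_inv = torch.linalg.inv(S)
# (diff @ S_inv) * diff summed over d gives diff^T S_inv diff per pair
maha_sq = ((diff @ S_inv) * diff).sum(dim=-1)   # (k_b, k_a)
maha = maha_sq.clamp(min=0).sqrt()           # clamp guards against tiny negative round-off
```

Materialising `diff` is the memory-heavy step: at the shapes in the question it holds 621 × 867 × 768 ≈ 4.1 × 10^8 values, roughly 1.65 GB in float32, so for Euclidean distances on large inputs `torch.cdist`, which does not need the full 3-D difference tensor, may be preferable.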

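More specifically, note the difference between the two variants `A[None]-B[:,None]` and `A[:,None]-B[None]`. In the shape comments, `k_a` and `k_b` are the numbers of vectors in `A` and `B`, and `b` is the shared feature dimension.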

```
diff = A[None]-B[:,None] # (1, k_a, b) - (k_b, 1, b) -> (k_b, k_a, b) - (k_b, k_a, b)
diff.shape # (k_b, k_a, b)
```

Compared to:

```
diff = A[:,None]-B[None] # (k_a, 1, b) - (1, k_b, b) -> (k_a, k_b, b) - (k_a, k_b, b)
diff.shape # (k_a, k_b, b)
```
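Both variants contain exactly the same differences `A[i] - B[j]`; they differ only in whether the `k_a` or the `k_b` axis comes first, so one is the transpose of the other over its first two axes. A quick check on small random tensors (the sizes are arbitrary):

```python
import torch

A = torch.randn(5, 3)   # k_a = 5, b = 3
B = torch.randn(4, 3)   # k_b = 4

diff_ba = A[None] - B[:, None]   # (k_b, k_a, b): diff_ba[j, i] = A[i] - B[j]
diff_ab = A[:, None] - B[None]   # (k_a, k_b, b): diff_ab[i, j] = A[i] - B[j]

# Same values, only the first two axes are swapped
print(torch.equal(diff_ab, diff_ba.transpose(0, 1)))  # True
```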

*PyTorch follows NumPy's broadcasting semantics, so you can read more about broadcasting on the NumPy documentation page.*
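The broadcasting rule can be checked on shapes alone with `torch.broadcast_shapes`, without allocating any data. Shapes are aligned from the trailing dimension; each pair of sizes must be equal or one of them must be 1. A short sketch using the shapes from this question:

```python
import torch

# With the inserted size-1 axes, every dimension pair is compatible
full_shape = torch.broadcast_shapes((1, 867, 768), (621, 1, 768))
print(full_shape)  # torch.Size([621, 867, 768])

# Without them, the trailing 768s match but 867 and 621 clash
try:
    torch.broadcast_shapes((867, 768), (621, 768))
except RuntimeError as err:
    print("not broadcastable:", err)
```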