By what rules are multidimensional tensors (e.g. in PyTorch) combined?
Let’s say I have tensor A with shape (64, 1, 1, 42) and tensor B with shape (1, 42, 42).
What is the result of A & B and how can I determine the resulting shape (if possible) in advance?
Tensors of different shapes are combined through a mechanism called broadcasting. Libraries such as NumPy and PyTorch use it to run element-wise operations on tensors whose shapes differ: a tensor is treated as if it were expanded along its size-1 (or missing leading) dimensions, without its data actually being copied. PyTorch follows NumPy's broadcasting semantics, so the same rules apply in both libraries.
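The no-copy expansion can be observed directly with `Tensor.expand`, which returns a view in which a size-1 dimension is repeated to a larger size. This is only an illustration of the idea; an operation such as `A * B` performs the broadcast internally without you calling `expand` yourself.

```python
import torch

# A (1, 42) tensor "expanded" to (64, 42) without copying any data
small = torch.rand(1, 42)
big = small.expand(64, 42)

print(big.shape)                           # torch.Size([64, 42])
print(big.stride())                        # (0, 1): stride 0 means the single row is reused 64 times
print(big.data_ptr() == small.data_ptr())  # True: both share the same underlying memory
```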
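Two shapes are combined according to these rules:
1. If the tensors have different numbers of dimensions, prepend dimensions of size 1 to the shape of the tensor with fewer dimensions until both shapes have the same length (this aligns the shapes at their trailing dimensions).
2. Compare the sizes dimension by dimension:
   - If the sizes match, or one of them is 1, broadcasting is possible for that dimension.
   - If neither size is 1 and the sizes differ, broadcasting fails with an error.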
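The two rules translate almost directly into code. The sketch below is plain Python; `broadcast_shape` is a hypothetical helper written for this answer, not a library function (PyTorch's own equivalent is `torch.broadcast_shapes`). In each aligned pair it keeps the size that is not 1, rather than taking the maximum, so that a dimension of size 0 broadcast against 1 correctly stays 0.

```python
def broadcast_shape(shape_a, shape_b):
    """Hypothetical helper: compute the broadcast shape of two shapes by hand.

    Raises ValueError if the shapes cannot be broadcast together.
    """
    # Rule 1: prepend 1s to the shorter shape so both have the same length
    ndim = max(len(shape_a), len(shape_b))
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)

    result = []
    # Rule 2: compare the aligned dimensions pairwise
    for dim_a, dim_b in zip(a, b):
        if dim_a == 1:
            result.append(dim_b)       # a is stretched to b's size
        elif dim_b == 1 or dim_a == dim_b:
            result.append(dim_a)       # b is stretched, or sizes already match
        else:
            raise ValueError(f"cannot broadcast {tuple(shape_a)} with {tuple(shape_b)}")
    return tuple(result)


print(broadcast_shape((64, 1, 1, 42), (1, 42, 42)))  # (64, 1, 42, 42)
```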
After successful broadcasting, both tensors behave as if their shape were the element-wise maximum of the two padded shapes.
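`torch.broadcast_tensors` makes this visible: it returns both inputs as expanded views that share the common broadcast shape. A minimal sketch using the shapes from the question:

```python
import torch

A = torch.rand(64, 1, 1, 42)
B = torch.rand(1, 42, 42)

# broadcast_tensors returns expanded views of both inputs with the common shape
A_b, B_b = torch.broadcast_tensors(A, B)
print(A_b.shape, B_b.shape)  # both torch.Size([64, 1, 42, 42])

# Multiplying the explicitly expanded views gives the same values as A * B
print(torch.equal(A * B, A_b * B_b))  # True
```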
In any dimension where one tensor has size 1 and the other has a size greater than 1, the tensor with size 1 behaves as if it had been repeated along that dimension to match the other. For example:
Tensor A has shape (64, 1, 1, 42)
Tensor B has shape (1, 42, 42)
To combine them:
1. Make the number of dimensions equal by prepending a 1 to B's shape:
   Tensor A: (64, 1, 1, 42)
   Tensor B: (1, 1, 42, 42)
2. Compare the dimensions pairwise: 64 vs 1, 1 vs 1, 1 vs 42, 42 vs 42.
   In every pair the sizes are equal or one of them is 1, so broadcasting is possible. Taking the larger size of each pair, the resulting tensor has the shape (64, 1, 42, 42).
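import torch

# Create example tensors
A = torch.rand((64, 1, 1, 42))
B = torch.rand((1, 42, 42))

# Broadcasting operation: any element-wise operator broadcasts the same way,
# but & requires bool or integer tensors, so * is used for these float tensors
result = A * B

# Print the resulting shape
print(result.shape)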
which results in
torch.Size([64, 1, 42, 42])
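To answer the second part of the question directly: the resulting shape can be computed in advance from the shapes alone with `torch.broadcast_shapes` (available in recent PyTorch versions), without allocating any tensors. The sketch below also shows the error raised for incompatible shapes, and applies `&` itself to bool tensors, since `&` broadcasts like any other element-wise operator but does not accept floating-point tensors.

```python
import torch

# Compute the broadcast shape from the shapes alone, without creating any tensors
print(torch.broadcast_shapes((64, 1, 1, 42), (1, 42, 42)))  # torch.Size([64, 1, 42, 42])

# Incompatible shapes raise an error: the last sizes 3 and 4 differ and neither is 1
try:
    torch.broadcast_shapes((2, 3), (2, 4))
except RuntimeError as err:
    print("cannot broadcast:", err)

# The & operator broadcasts the same way, but needs bool or integer tensors
A = torch.rand(64, 1, 1, 42) > 0.5
B = torch.rand(1, 42, 42) > 0.5
print((A & B).shape)  # torch.Size([64, 1, 42, 42])
```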