I have a 4D tensor of shape (2, 1024, 4, 6). I want to use a transposed convolution to upsample its spatial dimensions by a factor of two and reduce the number of channels from 1024 to 512, so that the result is a 4D tensor of shape (2, 512, 8, 12). How can I do that? Also, is a transposed convolution a good idea for reducing the number of channels? For example, I used the following script, but it is not working:
nn.ConvTranspose3d(in_channels=1024, out_channels=512, kernel_size=(1,2,2), stride=(1,3,2), padding=(0,1,1))
>Solution :
It seems you should be using nn.ConvTranspose2d instead of nn.ConvTranspose3d: your input tensor is 4D with layout NCHW, whereas ConvTranspose3d expects a batched input with an extra depth dimension (NCDHW).
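There are different ways of getting this result, but one straightforward approach is to use a kernel size of 2 with a matching stride of 2 and the default padding of 0, so that each input pixel expands into its own non-overlapping 2x2 block of the output: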
>>> import torch
>>> from torch import nn
>>> conv = nn.ConvTranspose2d(1024, 512, kernel_size=2, stride=2)
Here is an inference example:
>>> conv(torch.rand(2, 1024, 4, 6)).shape
torch.Size([2, 512, 8, 12])
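To see why these settings double each spatial dimension, it can help to evaluate the output-size formula from the PyTorch documentation by hand: out = (in - 1) * stride - 2 * padding + dilation * (kernel_size - 1) + output_padding + 1. The sketch below is plain Python and does not need PyTorch. The helper name conv_transpose_out_size is made up for illustration, and the kernel_size=4, padding=1 variant is only an assumed alternative that also doubles the size, not something from the question.

```python
def conv_transpose_out_size(size_in, kernel_size, stride=1, padding=0,
                            output_padding=0, dilation=1):
    """Output length of one spatial dimension of a transposed convolution.

    Hypothetical helper (not part of PyTorch) that evaluates the formula
    given in the nn.ConvTranspose2d documentation.
    """
    return ((size_in - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)


# Settings from the answer: kernel_size=2, stride=2, padding=0.
h_out = conv_transpose_out_size(4, kernel_size=2, stride=2)
w_out = conv_transpose_out_size(6, kernel_size=2, stride=2)
print(h_out, w_out)  # 8 12

# Assumed alternative that also doubles the size: kernel_size=4, stride=2, padding=1.
h_alt = conv_transpose_out_size(4, kernel_size=4, stride=2, padding=1)
w_alt = conv_transpose_out_size(6, kernel_size=4, stride=2, padding=1)
print(h_alt, w_alt)  # 8 12

# Height settings from the question (kernel 2, stride 3, padding 1),
# evaluated as if they were used in a 2D layer: the result is not 2 * 4.
h_question = conv_transpose_out_size(4, kernel_size=2, stride=3, padding=1)
print(h_question)  # 9
```

The number of output channels is set independently by out_channels, so the spatial arithmetic above is the same whether or not the layer also goes from 1024 to 512 channels.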