Sub-pixel convolution (also known as pixel shuffle) is a technique primarily used for image super-resolution and other upsampling tasks in deep learning. Instead of upsampling via interpolation or transposed convolution, it learns to generate a high-resolution image from a low-resolution feature map by reorganizing the channels.
🔧 Core Idea:
Given a low-resolution feature map with shape: (B, C × r², H, W)
It rearranges it to shape: (B, C, H × r, W × r)
Here:
- `B` = batch size
- `C` = number of channels in the output
- `H`, `W` = spatial size of the feature map
- `r` = upscaling factor (e.g., 2, 3, 4)
🧠 Why Use It?
- Avoids checkerboard artifacts common in transposed convolutions.
- Reduces computational cost: instead of working in high-res space, the convolution is done in low-res and then upscaled.
🧮 Mathematical Overview
Suppose you want to upscale by a factor `r`. Instead of upsampling directly, you:
- Use a convolution layer that outputs `C × r²` channels.
- Rearrange (reshape + transpose) those channels into the spatial dimensions.
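The reshape + transpose step can be sketched by hand. The helper below (`pixel_shuffle_manual` is a name chosen here for illustration, not a library function) should agree with PyTorch's built-in `nn.PixelShuffle`:

```python
import torch

def pixel_shuffle_manual(x: torch.Tensor, r: int) -> torch.Tensor:
    """Rearrange (B, C*r^2, H, W) -> (B, C, H*r, W*r) via reshape + permute."""
    b, cr2, h, w = x.shape
    c = cr2 // (r * r)
    # Split the channel axis into (C, r, r) blocks...
    x = x.reshape(b, c, r, r, h, w)
    # ...then interleave each r x r block into the spatial dimensions.
    x = x.permute(0, 1, 4, 2, 5, 3)  # (B, C, H, r, W, r)
    return x.reshape(b, c, h * r, w * r)

x = torch.randn(1, 4, 5, 5)
assert torch.equal(pixel_shuffle_manual(x, 2), torch.nn.PixelShuffle(2)(x))
```

The permute order matters: it pairs each of the `r × r` sub-channels with its target offset inside the upscaled pixel grid.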
Example:
For an upscaling factor `r = 2` and a single output channel (`C = 1`):
- Input: tensor with shape `(B, 4, H, W)`
- Pixel shuffle output: `(B, 1, 2H, 2W)`

Why 4? Because `C × r² = 1 × 2² = 4`.
📦 PyTorch Code Example
```python
import torch
import torch.nn as nn

# Simulate a low-resolution input: B=1, C*r^2=4, H=5, W=5
x = torch.randn(1, 4, 5, 5)

# Rearrange channels into space with upscale factor r=2
pixel_shuffle = nn.PixelShuffle(upscale_factor=2)
output = pixel_shuffle(x)  # output shape: (1, 1, 10, 10)
```
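To see exactly where each channel lands, feed in a one-pixel tensor whose four channels hold the constants 0–3; channel `i*r + j` maps to offset `(i, j)` inside each upscaled `2 × 2` block:

```python
import torch
import torch.nn as nn

# One spatial position, four channels holding the values 0..3.
chans = torch.arange(4.0).reshape(1, 4, 1, 1)
out = nn.PixelShuffle(2)(chans)
print(out[0, 0])
# tensor([[0., 1.],
#         [2., 3.]])
```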
🧱 Block Diagram (Conceptual)
Input Feature Map: (B, C*r^2, H, W)
↓
Convolution (learns features for upsampling)
↓
PixelShuffle (rearranges channels into space)
↓
Output: (B, C, H*r, W*r)
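The diagram above can be written as a small module. This is a minimal upsampling head (the class name and hyperparameters are illustrative, not from a specific paper):

```python
import torch
import torch.nn as nn

class SubPixelUpsample(nn.Module):
    """Convolve in low-res space, then PixelShuffle to upscale by r."""
    def __init__(self, in_ch: int, out_ch: int, r: int):
        super().__init__()
        # The convolution outputs out_ch * r^2 channels at low resolution.
        self.conv = nn.Conv2d(in_ch, out_ch * r * r, kernel_size=3, padding=1)
        self.shuffle = nn.PixelShuffle(r)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.shuffle(self.conv(x))

up = SubPixelUpsample(in_ch=64, out_ch=3, r=2)
y = up(torch.randn(1, 64, 16, 16))
print(y.shape)  # torch.Size([1, 3, 32, 32])
```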
🔁 Used In:
- ESPCN (Efficient Sub-Pixel Convolutional Neural Network), the real-time single-image super-resolution model that introduced the technique (Shi et al., 2016)
🆚 Compared to Transposed Convolution
| Aspect | Sub-pixel Convolution | Transposed Convolution |
|---|---|---|
| Artifacts | Less prone to checkerboard artifacts | Can suffer from checkerboard artifacts |
| Speed | Faster (convolutions run in the low-res domain) | Slower (operates at the upsampled size) |
| Flexibility | Needs careful channel shaping (`C × r²`) | More flexible |
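For a shape comparison, both paths below upscale by 2×; the transposed-convolution hyperparameters (kernel 4, stride 2, padding 1) are one common choice for exact doubling, not the only one:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 16, 16)

# Sub-pixel path: conv in low-res space, then rearrange channels into space.
subpixel = nn.Sequential(nn.Conv2d(64, 3 * 4, kernel_size=3, padding=1),
                         nn.PixelShuffle(2))

# Transposed-convolution path: kernel 4, stride 2, padding 1 doubles H and W.
transposed = nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1)

print(subpixel(x).shape)    # torch.Size([1, 3, 32, 32])
print(transposed(x).shape)  # torch.Size([1, 3, 32, 32])
```

Both produce the same output shape; they differ in where the compute happens and in how the learned kernels overlap in the output.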