Learning Muse by Mehdi Seyfi

Sub-pixel convolution (also known as pixel shuffle) is a technique primarily used for image super-resolution and other upsampling tasks in deep learning. Instead of upsampling via interpolation or transposed convolution, it learns to generate a high-resolution image from a low-resolution feature map by reorganizing the channels.

🔧 Core Idea:

Given a low-resolution feature map with shape: (B, C × r², H, W)

It rearranges it to shape: (B, C, H × r, W × r)

Here:

B = batch size
C = number of channels in output
H, W = spatial size of the feature map
r = upscaling factor (e.g., 2, 3, 4)

🧠 Why Use It?

Less prone to the checkerboard artifacts common in transposed convolutions.
Reduces computational cost: instead of working in high-res space, the convolution is done in low-res and then upscaled.

🧮 Mathematical Overview

Suppose you want to upscale by a factor r. Instead of upsampling directly, you:

Use a convolution layer that outputs C × r² channels.
Rearrange (reshape + transpose) the channels into spatial dimensions.
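
The rearrangement step can be written out by hand. The following is a minimal sketch, assuming PyTorch is available; `manual_pixel_shuffle` is a hypothetical helper name used only for illustration. It splits the channel axis into (C, r, r), moves each factor of r next to its spatial axis, and merges them, then compares the result against PyTorch's built-in `F.pixel_shuffle`.

```python
import torch
import torch.nn.functional as F


def manual_pixel_shuffle(x: torch.Tensor, r: int) -> torch.Tensor:
    """Rearrange (B, C*r^2, H, W) into (B, C, H*r, W*r) with reshape + permute."""
    b, c_r2, h, w = x.shape
    if c_r2 % (r * r) != 0:
        raise ValueError("channel count must be divisible by r^2")
    c = c_r2 // (r * r)
    # Split the channel axis into (C, r, r): output channel, row offset, column offset.
    x = x.reshape(b, c, r, r, h, w)
    # Move each offset next to its spatial axis: (B, C, H, r, W, r).
    x = x.permute(0, 1, 4, 2, 5, 3)
    # Merge (H, r) into H*r and (W, r) into W*r.
    return x.reshape(b, c, h * r, w * r)


x = torch.randn(2, 3 * 2 * 2, 5, 7)  # B=2, C=3, r=2, H=5, W=7
y = manual_pixel_shuffle(x, r=2)
matches_builtin = torch.equal(y, F.pixel_shuffle(x, 2))
print(y.shape, matches_builtin)
```

Because the operation only moves values around, it has no learnable parameters; all the learning happens in the convolution that precedes it.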

Example:

For an upscaling factor r = 2:

Input: tensor with shape (B, 4, H, W)
Pixel shuffle output: (B, 1, 2H, 2W)

Why 4? Because r² = 2² = 4
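
To see where each of the 4 channels ends up, here is a small sketch on a single spatial location (H = W = 1), with the channel values chosen as 0, 1, 2, 3 purely for illustration.

```python
import torch
import torch.nn as nn

# One spatial location (H = W = 1) with 4 channels holding the values 0, 1, 2, 3.
x = torch.arange(4.0).reshape(1, 4, 1, 1)

y = nn.PixelShuffle(upscale_factor=2)(x)

print(y.shape)  # torch.Size([1, 1, 2, 2])
print(y[0, 0])  # tensor([[0., 1.], [2., 3.]])
```

The four channels of one low-resolution pixel fill a 2 × 2 block of the output, row by row.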

📦 PyTorch Code Example

import torch
import torch.nn as nn

# Simulate a low-resolution feature map
x = torch.randn(1, 4, 5, 5)  # B=1, C*r^2=4 (C=1, r=2), H=5, W=5

# Pixel shuffle
pixel_shuffle = nn.PixelShuffle(upscale_factor=2)
output = pixel_shuffle(x)
print(output.shape)  # torch.Size([1, 1, 10, 10])

🧱 Block Diagram (Conceptual)

Input Feature Map: (B, C_in, H, W)
     ↓
Convolution (learns features for upsampling) → (B, C*r^2, H, W)
     ↓
PixelShuffle (rearranges channels into space)
     ↓
Output: (B, C, H*r, W*r)
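
The convolution and PixelShuffle stages in the diagram can be combined into one module. This is a minimal sketch rather than a reference implementation: the class name `SubPixelUpsample`, the 3 × 3 kernel, and the channel counts are assumptions chosen for illustration.

```python
import torch
import torch.nn as nn


class SubPixelUpsample(nn.Module):
    """Hypothetical block: Conv2d in low-res space followed by PixelShuffle."""

    def __init__(self, in_channels: int, out_channels: int, r: int):
        super().__init__()
        # kernel_size=3 with padding=1 keeps H and W unchanged (an assumption of this sketch).
        self.conv = nn.Conv2d(in_channels, out_channels * r * r, kernel_size=3, padding=1)
        self.shuffle = nn.PixelShuffle(r)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (B, in_channels, H, W) -> (B, out_channels*r^2, H, W) -> (B, out_channels, H*r, W*r)
        return self.shuffle(self.conv(x))


block = SubPixelUpsample(in_channels=64, out_channels=3, r=3)
features = torch.randn(1, 64, 16, 16)
upscaled = block(features)
print(upscaled.shape)  # torch.Size([1, 3, 48, 48])
```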

🔁 Used In:

ESPCN (Efficient Sub-Pixel Convolutional Neural Network)
Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network (Shi et al., 2016), the paper that introduced ESPCN

🆚 Compared to Transposed Convolution

Aspect	Sub-pixel Convolution	Transposed Convolution
Artifacts	Less prone to checkerboard	Can suffer from checkerboard
Speed	Faster (low-res domain)	Slower (operates in upsampled size)
Flexibility	Needs careful channel shaping	More flexible
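
As a rough illustration of how the two approaches are set up in PyTorch, the sketch below builds both for a 2× upscale and checks that they produce the same output shape. The channel counts and kernel sizes are assumptions picked for the example, and the snippet only compares shapes; it does not measure speed or artifacts.

```python
import torch
import torch.nn as nn

r, c_in, c_out = 2, 16, 3
x = torch.randn(1, c_in, 8, 8)

# Sub-pixel convolution: convolve at low resolution, then rearrange channels into space.
sub_pixel = nn.Sequential(
    nn.Conv2d(c_in, c_out * r * r, kernel_size=3, padding=1),
    nn.PixelShuffle(r),
)

# Transposed convolution: kernel_size=4, stride=2, padding=1 doubles H and W.
transposed = nn.ConvTranspose2d(c_in, c_out, kernel_size=4, stride=r, padding=1)

y_sub = sub_pixel(x)
y_tr = transposed(x)
print(y_sub.shape, y_tr.shape)  # both torch.Size([1, 3, 16, 16])
```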