[SwinIR: Image Restoration Using Swin Transformer](https://arxiv.org/pdf/2108.10257.pdf)
SwinIR: Image Restoration Using Swin Transformer
Abstract
Image restoration means recovering a high-quality image from a low-quality one (e.g., a downscaled or compressed image).
State-of-the-art methods are based on convolutional neural networks; few works have tried Transformer-based approaches, despite their strong performance elsewhere.
This paper proposes SwinIR, an image restoration model based on the Swin Transformer.
SwinIR has three modules:
shallow feature extraction
deep feature extraction
image reconstruction
Three sets of experiments:
image super-resolution
image denoising
JPEG compression artifact reduction
Experiments show SwinIR outperforms prior methods by 0.14–0.45 dB across tasks, with up to 67% fewer parameters.
Introduction
Image restoration tasks such as image super-resolution (SR), image denoising, and JPEG compression artifact reduction all aim to reconstruct a high-quality image from a low-quality one.
Previous work has been dominated by CNNs; although performance has indeed improved, two problems remain:
the interaction between images and convolution kernels is content-independent
restoring different image regions with the same kernel is suboptimal
with local processing, convolution is inefficient at modeling long-range dependencies
As an alternative to CNNs, Transformers were designed with mechanisms that capture global interactions between contexts.
However, Transformer-based image restorers usually take inputs of a fixed size:
border pixels cannot use neighboring pixels (outside the patch) for restoration
the restored image may show border artifacts
patch overlapping can mitigate this, but at extra computational cost
The Swin Transformer combines the advantage of CNNs in processing large images with the advantage of Transformers, via its shifted window scheme.
Based on the Swin Transformer, this paper proposes SwinIR:
shallow feature extraction
deep feature extraction
image reconstruction
The deep feature extraction module consists of several residual Swin Transformer blocks (RSTB), each containing several Swin Transformer layers and a residual connection.
Compared with CNN-based models, the advantages are:
content-based interaction between image content and attention weights
the shifted window mechanism captures long-range dependencies
better results with fewer parameters
Compared with other models, SwinIR achieves higher PSNR.
Related Work
Image restoration
Vision Transformer
Method
Network Architecture
(Figure: SwinIR architecture, with (a) the residual Swin Transformer block (RSTB) and (b) the Swin Transformer layer (STL); image omitted.)
Shallow feature extraction → deep feature extraction → HQ image reconstruction.
Shallow and deep feature extraction
Given a low-quality input image
$$I_{LQ} \in \mathbb{R}^{H \times W \times C_{in}}$$
where $H$, $W$, and $C_{in}$ are the height, width, and input channel number.
A 3×3 convolutional layer extracts the shallow features:
$$F_0 = H_{SF}(I_{LQ})$$
Deep features are then extracted from $F_0$:
$$F_{DF} = H_{DF}(F_0)$$
$H_{DF}$ is the deep feature extraction module, consisting of $K$ RSTBs followed by one 3×3 convolutional layer.
Ending with a convolutional layer brings the inductive bias of convolution into the Transformer-based network.
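A minimal PyTorch sketch of this pipeline, assuming the paper's setup of a 180-dim embedding and six blocks; the RSTBs are stubbed with identities here (a fuller sketch appears in the RSTB section below), and all names are illustrative rather than the authors' code:

```python
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Shallow (H_SF) and deep (H_DF) feature extraction, simplified."""
    def __init__(self, in_ch=3, embed_dim=180, num_rstb=6):
        super().__init__()
        # H_SF: one 3x3 convolution maps I_LQ to shallow features F0
        self.conv_first = nn.Conv2d(in_ch, embed_dim, 3, padding=1)
        # H_DF: K RSTBs plus one trailing 3x3 convolution (RSTBs stubbed)
        self.rstbs = nn.ModuleList([nn.Identity() for _ in range(num_rstb)])
        self.conv_after_body = nn.Conv2d(embed_dim, embed_dim, 3, padding=1)

    def forward(self, lq):
        f0 = self.conv_first(lq)        # F0 = H_SF(I_LQ)
        x = f0
        for rstb in self.rstbs:
            x = rstb(x)
        f_df = self.conv_after_body(x)  # F_DF = H_DF(F0)
        return f0, f_df
```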
Image reconstruction
The high-quality image $I_{RHQ}$ is reconstructed from the shallow and deep features obtained above:
$$I_{RHQ} = H_{REC}(F_0 + F_{DF})$$
$H_{REC}$ is the reconstruction module.
Shallow features carry low-frequency content, while deep features recover high frequencies.
The long skip connection passes the low frequencies directly to the reconstruction module.
For super-resolution, the reconstruction module upsamples the features with a sub-pixel convolution layer.
For tasks that need no upsampling, such as denoising and artifact reduction, a single convolutional layer reconstructs the output.
These tasks use residual learning: the network reconstructs the residual between LQ and HQ rather than the HQ image directly (the trailing $+\, I_{LQ}$ in the equation below):
$$I_{RHQ} = H_{SwinIR}(I_{LQ}) + I_{LQ}$$
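A sketch of both reconstruction variants in PyTorch; the branch structure and names are assumptions for illustration (the released SwinIR code offers several upsampler choices):

```python
import torch.nn as nn

class Reconstruction(nn.Module):
    """H_REC, simplified: sub-pixel upsampling for SR, plain conv otherwise."""
    def __init__(self, embed_dim=180, out_ch=3, scale=4):
        super().__init__()
        self.scale = scale
        if scale > 1:   # super-resolution: sub-pixel convolution layer
            self.body = nn.Sequential(
                nn.Conv2d(embed_dim, out_ch * scale ** 2, 3, padding=1),
                nn.PixelShuffle(scale),  # rearranges channels into spatial detail
            )
        else:           # denoising / JPEG artifact reduction: single conv
            self.body = nn.Conv2d(embed_dim, out_ch, 3, padding=1)

    def forward(self, f0, f_df, lq):
        out = self.body(f0 + f_df)       # H_REC(F0 + F_DF): long skip connection
        if self.scale == 1:
            out = out + lq               # residual learning: I_RHQ = H(I_LQ) + I_LQ
        return out
```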
Loss function
$$\mathcal{L} = \lVert I_{RHQ} - I_{HQ} \rVert_1$$
The restored image should be as close to the original as possible; the smaller this distance, the better.
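As a tiny runnable example of this pixel loss (plain L1; for the denoising and JPEG tasks the paper uses a Charbonnier variant of it):

```python
import torch

restored = torch.rand(1, 3, 64, 64)      # stand-in for I_RHQ
target   = torch.rand(1, 3, 64, 64)      # stand-in for I_HQ
loss = (restored - target).abs().mean()  # mean L1 distance ||I_RHQ - I_HQ||_1
```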
Residual Swin Transformer Block
See panel (a) of the architecture figure above.
The block's input is added back to its output (residual connection).
This strengthens translational equivariance (wherever an object is shifted within the image, the result should shift with it).
It also aggregates features across different levels.
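A minimal RSTB sketch under these assumptions; the Swin Transformer layers are stubbed, and the real model reshapes between token sequences and feature maps around the convolution, which is omitted here:

```python
import torch.nn as nn

class RSTB(nn.Module):
    """Residual Swin Transformer block, simplified to (B, C, H, W) tensors."""
    def __init__(self, dim=180, num_stl=6):
        super().__init__()
        self.layers = nn.ModuleList([nn.Identity() for _ in range(num_stl)])  # STL stubs
        self.conv = nn.Conv2d(dim, dim, 3, padding=1)  # trailing 3x3 convolution

    def forward(self, x):
        res = x
        for layer in self.layers:
            res = layer(res)
        return x + self.conv(res)  # residual connection around the whole block
```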
Swin Transformer layer
Based on the standard multi-head self-attention of the original Transformer layer, with the addition of local attention and the shifted window mechanism.
See panel (b) of the architecture figure above.
The Swin Transformer first partitions the input into non-overlapping M × M windows, reshaping the H × W × C input to (HW/M²) × M² × C, where HW/M² is the total number of windows (see the sketch below).
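A quick PyTorch sketch of this window-partitioning reshape (shapes and names are illustrative):

```python
import torch

def window_partition(x, m):
    """(B, H, W, C) -> (B*H*W/m^2, m*m, C): non-overlapping m x m windows."""
    b, h, w, c = x.shape
    x = x.view(b, h // m, m, w // m, m, c)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, m * m, c)

feat = torch.rand(1, 64, 64, 180)    # toy H x W x C feature map
windows = window_partition(feat, 8)  # -> (64, 64, 180): HW/M^2 = 64 windows, M^2 = 64 tokens
```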
Standard self-attention is then computed separately within each window.
Each window yields a local feature
$$X \in \mathbb{R}^{M^2 \times C}$$
from which the query, key, and value matrices $Q$, $K$, $V$ are computed:
$$Q = XP_Q, \quad K = XP_K, \quad V = XP_V$$
$P_Q$, $P_K$, $P_V$ are projection matrices shared across windows.
The attention computation itself is the standard formulation and needs little further explanation:
$$\mathrm{Attention}(Q, K, V) = \mathrm{SoftMax}(QK^T/\sqrt{d} + B)V$$
where $B$ is a learnable relative positional encoding.
The MSA output is added back through a residual connection; an MLP (two fully-connected layers with GELU between them) follows, again with a residual connection.
LayerNorm is applied before both the MSA and the MLP:
$$X = \mathrm{MSA}(\mathrm{LN}(X)) + X, \qquad X = \mathrm{MLP}(\mathrm{LN}(X)) + X$$
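A simplified Swin Transformer layer following these equations, sketched with PyTorch's built-in multi-head attention on windowed tokens; the shifted-window masking and the relative position bias $B$ are omitted for brevity, so this is a sketch, not the authors' implementation:

```python
import torch
import torch.nn as nn

class SwinLayer(nn.Module):
    """One STL on windowed tokens of shape (num_windows, M*M, C)."""
    def __init__(self, dim=180, heads=6, mlp_ratio=2.0):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        # Q, K, V are all derived from X via learned projections inside this module
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, int(dim * mlp_ratio)),
            nn.GELU(),
            nn.Linear(int(dim * mlp_ratio), dim),
        )

    def forward(self, x):
        h = self.norm1(x)                # LN before MSA
        h, _ = self.attn(h, h, h)        # window MSA
        x = x + h                        # first residual connection
        x = x + self.mlp(self.norm2(x))  # LN before MLP, second residual
        return x

layer = SwinLayer()
tokens = torch.rand(64, 64, 180)         # e.g. output of window_partition above
out = layer(tokens)                      # same shape: (64, 64, 180)
```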
Experiments
Experimental Setup
Four experimental settings:
classical image SR
real-world image SR
image denoising
JPEG artifact reduction
RSTB number: 6
STL number: 6
window size: 8
(7 for JPEG artifact reduction: a window size of 8 performs markedly worse there, presumably because JPEG itself encodes in 8×8 blocks)
channel number: 180
attention head number: 6
Ablation Study and Discussion
Dataset
Train: DIV2K
Test: Manga109
Impact of channel number, RSTB number and STL number
channel number
RSTB number
STL number
The choice of 180 / 6 / 6 balances performance against model size.
Impact of patch size and training image number; model convergence comparison
Compared against the CNN-based RCAN.
Training patch size
Percentage of used images
Percentages above 100% use extra training images from Flickr2K.
Training iterations
Impact of residual connection and convolution layer in RSTB
On the importance of the convolutional layer at the end of each RSTB:
replacing it with three convolutions (kernel sizes 1, 3, 1) reduces the parameter count, but performance drops.
Results on Image SR
Classical image SR
SwinIR+ denotes self-ensemble: the input is flipped horizontally, vertically, and both ways, and the (un-flipped) outputs are averaged; see the sketch below.
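A sketch of that self-ensemble procedure; `model` stands for any trained LQ→HQ network (a hypothetical stand-in, not the released API):

```python
import torch

def self_ensemble(model, lq):
    """Average predictions over horizontal/vertical/both flips of the input."""
    outs = []
    for dims in ([], [2], [3], [2, 3]):  # identity, v-flip, h-flip, both (N, C, H, W)
        x = torch.flip(lq, dims) if dims else lq
        y = model(x)
        outs.append(torch.flip(y, dims) if dims else y)  # undo the flip
    return torch.stack(outs).mean(dim=0)
```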
Not only are the results better, the parameter count is also lower.
Runtime, however, is middling:
RCAN: 0.2 s
IPT: 4.5 s (a very large network; decent results, but still behind)
SwinIR: 1.1 s
Outputs are sharp and natural.
Lightweight image SR
Compared against small models (with a scaled-down SwinIR as well).
Performance remains strong at a moderate parameter count.
Real-world image SR
The available training data is somewhat limited, yet the results are more natural than other methods'.
With a better dataset the results could improve further.
Results on JPEG Compression Artifact Reduction
Comparable to DRUNet in quality, with roughly one third of the parameters.
Results on Image Denoising
As above: comparable to DRUNet with about one third of the parameters.
No blurriness; the results are sharper.
Conclusion
Proposed SwinIR, an image restoration model based on the Swin Transformer, with three modules:
shallow feature extraction
deep feature extraction
HQ image reconstruction
Deep feature extraction uses RSTBs, each composed of Swin Transformer layers, a convolutional layer, and a residual connection.
Extensive experiments show strong performance across image restoration tasks.
Future work: extending the model to tasks such as deblurring and deraining.
tags:
paper