# Perceptual Loss (Fei-Fei Li)

###### tags: `Super Resolution`

> This is a summary of the relevant information in the paper "Perceptual Losses for Real-Time Style Transfer and Super-Resolution"

- Auto-generated Table of Content

[ToC]

## :memo: Perceptual Losses

Using perceptual loss functions transfers semantic knowledge from the loss network (usually VGG) to the transformation network.

#### Paper Link: https://arxiv.org/pdf/1603.08155.pdf

#### Style Transfer

Style transfer combines the content of one image with the style of another by jointly minimizing a feature reconstruction loss and a style reconstruction loss. Both losses depend on high-level features extracted by the loss network.

### Network Architecture

![](https://i.imgur.com/aK3RbSW.jpg)

The network is composed of two parts:

- Transformation Network ($F_w$): performs the image transformation.
- Loss Network ($\phi$): used to define the different losses.

> The key insight of these methods is that convolutional neural networks pretrained for image classification have already learned to encode the perceptual and semantic information we would like to measure in our loss functions. We therefore make use of a network φ which has been pretrained for image classification as a fixed loss network in order to define our loss functions. Our deep convolutional transformation network is then trained using loss functions that are also deep convolutional networks.

### Transformation Network

This network follows the architectural guidelines proposed by Radford et al.

- No pooling layers; strided and fractionally strided convolutions are used instead for in-network downsampling and upsampling.
- The network body consists of five residual blocks.
- All non-residual convolutional layers are followed by spatial batch normalization and ReLU nonlinearities, with the exception of the output layer, which instead uses a scaled tanh to ensure that the output image has pixels in the range [0, 255].
- Other than the first and last layers, which use 9 × 9 kernels, all convolutional layers use 3 × 3 kernels.

Inputs are color images without any normalization. Patch size: 3 × 256 × 256 (the style transfer setting in the paper).

#### Upsampling and Downsampling

For super-resolution with an upsampling factor of $f$, the network uses several residual blocks followed by $\log_2 f$ convolutional layers with stride 1/2. This differs from earlier approaches that use bicubic interpolation to upsample the low-resolution input before passing it to the network. Rather than relying on a fixed upsampling function, fractionally strided convolution allows the upsampling function to be learned jointly with the rest of the network.

#### Benefits of networks that downsample and then upsample

1. Computational cost. With a naive implementation, a 3 × 3 convolution with $C$ filters on an input of size $C \times H \times W$ requires $9HWC^2$ multiply-adds, which is the same cost as a 3 × 3 convolution with $DC$ filters on an input of shape $DC \times H/D \times W/D$. After downsampling, we can therefore use a larger network for the same computational cost.
2. Effective receptive field. Without downsampling, each additional 3 × 3 convolutional layer increases the effective receptive field size by 2. After downsampling by a factor of $D$, each 3 × 3 convolution instead increases the effective receptive field size by $2D$, giving larger effective receptive fields with the same number of layers.

### Perceptual Loss Functions

The feature reconstruction loss is essentially the squared, normalized L2 distance between the VGG features of the ground truth and those of the image generated by the network.

- Finding an image $\hat{y}$ that minimizes the feature reconstruction loss for early layers tends to produce images that are visually indistinguishable from $y$. As we reconstruct from higher layers, image content and overall spatial structure are preserved, but color, texture, and exact shape are not.

### Experiments

- The perceptual loss is computed at layer relu2_2 of the VGG-16 network.
- Training uses 288 × 288 patches from 10k images from the MS-COCO training set.
- As a post-processing step, histogram matching is performed between the network output and the low-resolution input.

### Results

The model trained with the feature reconstruction loss does a very good job at reconstructing sharp edges and fine details. However, this loss gives rise to a slight cross-hatch pattern visible under magnification, which harms its PSNR and SSIM compared to baseline methods. The pixel loss gives fewer visual artifacts and higher PSNR values, but the perceptual loss does a better job at reconstructing fine details, leading to pleasing visual results.
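
#### Worked formula: feature reconstruction loss

For reference, the feature reconstruction loss summarized under "Perceptual Loss Functions" can be written as follows. The symbols $C_j$, $H_j$, $W_j$ (the channel count, height, and width of the feature map at layer $j$) and $\phi_j$ (the activations of layer $j$ of the loss network) follow the paper's notation and do not appear elsewhere in this note. For the super-resolution experiments above, $j$ would correspond to relu2_2.

$$
\ell_{feat}^{\phi, j}(\hat{y}, y) = \frac{1}{C_j H_j W_j} \left\| \phi_j(\hat{y}) - \phi_j(y) \right\|_2^2
$$

Here $\hat{y} = F_w(x)$ is the output of the transformation network and $y$ is the ground-truth image.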
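
#### Worked example: cost and receptive field after downsampling

The two benefits of downsampling listed earlier are simple arithmetic, so they can be checked numerically. The sketch below is only an illustration: the function names and the sizes `C`, `H`, `W`, `D` are hypothetical, and the receptive field helper is a simplification that counts only the growth contributed by the 3 × 3 layers.

```python
def conv3x3_multiply_adds(channels_in, filters, height, width):
    """Multiply-adds for a naive 3x3 convolution (stride 1, same-size output)."""
    return 9 * channels_in * filters * height * width


def receptive_field(num_layers, downsample_factor=1):
    """Effective receptive field along one axis, measured in input pixels.

    Assumes num_layers 3x3 convolutions applied at 1/downsample_factor of the
    input resolution, starting from a single pixel. The kernels of the
    downsampling layers themselves are ignored (a simplification).
    """
    return 1 + 2 * downsample_factor * num_layers


# Hypothetical sizes chosen only for illustration.
C, H, W, D = 32, 256, 256, 4

# C filters on a C x H x W input: 9 * H * W * C^2 multiply-adds.
full_res = conv3x3_multiply_adds(C, C, H, W)

# D*C filters on a D*C x H/D x W/D input: the D factors cancel out.
downsampled = conv3x3_multiply_adds(D * C, D * C, H // D, W // D)

print(full_res == downsampled)

# Five 3x3 layers grow the receptive field by 2 each at full resolution,
# and by 2*D each after downsampling by a factor of D.
print(receptive_field(5), receptive_field(5, D))
```

With these sizes the two convolutions cost the same, while the downsampled layer has four times as many filters, and the same five layers cover a much wider region of the input.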
