Paper Translation
deeplearning
Normalization
Blocks are classified as follows: original-text blocks are shown on a blue background and translation blocks on a green background. Some technical terms follow the translations of the National Academy for Educational Research.
Original
Translation
Personal annotations; please leave a comment on any part of the translation that reads awkwardly.
In this paper we revisit the fast stylization method introduced in Ulyanov et al. (2016). We show how a small change in the stylization architecture results in a significant qualitative improvement in the generated images. The change is limited to swapping batch normalization with instance normalization, and to applying the latter both at training and testing times. The resulting method can be used to train high-performance architectures for real-time image generation. The code is available at https://github.com/DmitryUlyanov/texture_nets. The full paper can be found at https://arxiv.org/abs/1701.02096.
In this paper, we revisit the fast stylization method proposed by Ulyanov et al. We show how a small change in the stylization architecture leads to a significant qualitative improvement in the generated images. The change is simply to swap batch normalization for instance normalization, and to use instance normalization at both training and test time. The resulting method can be used to train high-performance architectures for real-time image generation. The code and the full paper are available at the links above.
The recent work of Gatys et al. (2016) introduced a method for transferring a style from an image onto another one, as demonstrated in fig. 1. The stylized image simultaneously matches selected statistics of the style image and of the content image. Both style and content statistics are obtained from a deep convolutional network pre-trained for image classification. The style statistics are extracted from shallower layers and averaged across spatial locations whereas the content statistics are extracted from deeper layers and preserve spatial information. In this manner, the style statistics capture the “texture” of the style image whereas the content statistics capture the “structure” of the content image.
A recent work by Gatys et al. proposed a method for transferring the style of one image onto another, as shown in fig. 1. The stylized image simultaneously matches selected statistics of the style image and of the content image. Both the style and content statistics are obtained from a deep convolutional network pre-trained for image classification. The style statistics are extracted from shallower layers and averaged across spatial locations, while the content statistics are extracted from deeper layers and preserve spatial information. In this way, the style statistics capture the "texture" of the style image, while the content statistics capture the "structure" of the content image.
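Annotation: as a concrete example of "style statistics averaged across spatial locations", the Gatys et al. formulation uses Gram matrices of CNN feature maps. Below is a minimal sketch; PyTorch and the toy feature shape are my own illustrative assumptions, not part of the paper.

```python
import torch

def gram_matrix(features: torch.Tensor) -> torch.Tensor:
    """Style statistics in the spirit of Gatys et al.: channel-to-channel
    correlations of a feature map, averaged over all spatial locations."""
    b, c, h, w = features.shape        # batch, channels, height, width
    f = features.reshape(b, c, h * w)  # flatten the spatial dimensions
    # Inner products between channels, normalized by the number of spatial
    # locations; the spatial layout is discarded, which is why these
    # statistics capture "texture" rather than "structure".
    return f @ f.transpose(1, 2) / (h * w)

# Toy usage with a fake feature map from some shallow CNN layer.
feats = torch.randn(1, 64, 32, 32)
print(gram_matrix(feats).shape)  # torch.Size([1, 64, 64])
```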
Although the method of Gatys et al. produces remarkably good results, it is computationally inefficient. The stylized image is, in fact, obtained by iterative optimization until it matches the desired statistics. In practice, it takes several minutes to stylize an image of size 512 × 512. Two recent works, Ulyanov et al. (2016) and Johnson et al. (2016), sought to address this problem by learning equivalent feed-forward generator networks that can generate the stylized image in a single pass. These two methods differ mainly by the details of the generator architecture and produce results of a comparable quality; however, neither achieved as good results as the slower optimization-based method of Gatys et al.
Although the method of Gatys et al. produces remarkably good results, it is computationally inefficient. In fact, the stylized image is obtained by iteratively optimizing it until it matches the desired statistics. In practice, stylizing a single 512×512 image takes several minutes. Two recent works, by Ulyanov et al. and by Johnson et al., sought to address this problem by learning an equivalent feed-forward generator network that produces the stylized image in a single pass. The two methods differ mainly in the details of the generator architecture and produce results of comparable quality; however, neither matches the quality of the slower optimization-based method of Gatys et al.
In this paper we revisit the method for feed-forward stylization of Ulyanov et al. (2016) and show that a small change in the generator architecture leads to much improved results. The results are in fact of comparable quality to the slow optimization-based method of Gatys et al., but can be obtained in real time on standard GPU hardware. The key idea (section 2) is to replace batch normalization layers in the generator architecture with instance normalization layers, and to keep them at test time (as opposed to freezing and simplifying them out, as is done for batch normalization). Intuitively, the normalization process makes it possible to remove instance-specific contrast information from the content image, which simplifies generation. In practice, this results in vastly improved images (section 3).
In this paper, we revisit the feed-forward stylization method of Ulyanov et al. and show that a small change in the generator architecture leads to greatly improved results. The results are in fact comparable in quality to the slow optimization-based method of Gatys et al., yet can be obtained in real time on standard GPU hardware. The key idea (section 2) is to replace the batch normalization layers in the generator architecture with instance normalization layers, and to keep them at test time (rather than freezing them, as is done with batch normalization). Intuitively, the normalization process removes instance-specific contrast information from the content image, which simplifies generation. In practice, this yields vastly improved images (section 3).
The work of Ulyanov et al. (2016) showed that it is possible to learn a generator network $g(x, z)$ that applies the style of a fixed style image $x_0$ to an arbitrary content image $x$; the variable $z$ is a random seed that can be used to obtain sample stylization results.

Ulyanov et al.'s work shows that one can learn a generator network $g(x, z)$ that applies the style of a fixed style image $x_0$ to any content image $x$, with $z$ a random seed for drawing different stylization samples.

The function $g$ is a convolutional neural network learned from examples, where an example is simply a content image $x_t$, $t = 1, \dots, n$, and learning solves the problem
$$
\min_g \frac{1}{n} \sum_{t=1}^{n} \mathcal{L}\bigl(x_0,\, x_t,\, g(x_t, z_t)\bigr),
$$
where $z_t \sim \mathcal{N}(0, 1)$ are i.i.d. samples from a Gaussian distribution, and the loss $\mathcal{L}$ compares the statistics of the style image $x_0$, the content image $x_t$, and the stylized image $g(x_t, z_t)$ through a CNN pre-trained for image classification.

The function $g$ is a convolutional neural network learned from examples; each example is just a content image $x_t$, $t = 1, \dots, n$, and learning minimizes the average loss $\mathcal{L}(x_0, x_t, g(x_t, z_t))$ over the examples, where the $z_t \sim \mathcal{N}(0, 1)$ are i.i.d. Gaussian samples and $\mathcal{L}$ compares feature statistics extracted by a pre-trained CNN.

While the generator network $g(x, z)$ is fast, the authors of Ulyanov et al. (2016) observed that training it for a large number of iterations degrades the quality of the results: the most serious artifacts appear along the image border, caused by the zero padding added before every convolution, and better padding techniques alone do not remove them, whereas instance normalization does (see fig. 3).

Although the generator network $g(x, z)$ is fast, Ulyanov et al. observed that training it for many iterations degrades the results; the worst artifacts sit along the image border and stem from the zero padding added before each convolution, and a better padding technique alone does not fix them, while instance normalization does (see fig. 3).
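Annotation: a minimal sketch of this learning problem, assuming PyTorch. The one-layer `generator`, the way the noise seed is injected, and the placeholder `style_content_loss` are all my own illustrative stand-ins, not the architecture or loss of either paper.

```python
import torch

# Toy stand-in for the generator g(x, z); the real texture network is a
# multi-scale convolutional architecture, not a single convolution.
generator = torch.nn.Conv2d(3, 3, kernel_size=3, padding=1)

def style_content_loss(x0, xt, y):
    # Placeholder for L(x0, xt, g(xt, zt)); the actual loss compares CNN
    # feature statistics of the style, content, and stylized images.
    return torch.nn.functional.mse_loss(y, xt)

x0 = torch.rand(1, 3, 256, 256)  # fixed style image
optimizer = torch.optim.Adam(generator.parameters(), lr=1e-3)

for xt in [torch.rand(1, 3, 256, 256) for _ in range(4)]:  # content images x_t
    zt = torch.randn_like(xt)        # z_t ~ N(0, 1), i.i.d. per example
    y = generator(xt + zt)           # one simple way to inject the seed
    loss = style_content_loss(x0, xt, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```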
Figure 3: Row 1: content image (left), style image (middle) and style transfer using the method of Gatys et al. (right). Row 2: typical stylization results when trained for a large number of iterations using the fast stylization method from Ulyanov et al. (2016): with zero padding (left), with a better padding technique (middle), with zero padding and instance normalization (right).
Figure 3: Row 1: content image (left), style image (middle), and style transfer with the method of Gatys et al. (right). Row 2: typical stylization results after training for many iterations with the fast stylization method of Ulyanov et al.: zero padding (left), a better padding technique (middle), zero padding plus instance normalization (right).
A simple observation is that the result of stylization should not, in general, depend on the contrast of the content image (see fig. 2). In fact, the style loss is designed to transfer elements from a style image to the content image such that the contrast of the stylized image is similar to the contrast of the style image. Thus, the generator network should discard contrast information in the content image. The question is whether contrast normalization can be implemented efficiently by combining standard CNN building blocks or whether, instead, it is best implemented directly in the architecture.
A simple observation is that, in general, the result of stylization should not depend on the contrast of the content image (see fig. 2). In fact, the style loss is designed to transfer elements from the style image to the content image so that the contrast of the stylized image resembles the contrast of the style image; the generator network should therefore discard the contrast information of the content image. The question is whether contrast normalization can be implemented efficiently by combining standard CNN building blocks, or whether it is better implemented directly in the architecture.
Figure 2: The contrast of a stylized image is mostly determined by the contrast of the style image and is almost independent of the content image contrast. The stylization is performed with the method of Gatys et al. (2016).
Figure 2: The contrast of the stylized image is determined mostly by the contrast of the style image and is almost unrelated to the contrast of the content image. Stylization here uses the method of Gatys et al.
The generators used in Ulyanov et al. (2016) and Johnson et al. (2016) use convolution, pooling, upsampling, and batch normalization. In practice, it may be difficult to learn a highly nonlinear contrast normalization function as a combination of such layers. To see why, let $x \in \mathbb{R}^{T \times C \times W \times H}$ be an input tensor containing a batch of $T$ images, and let $x_{tijk}$ denote its $tijk$-th element, where $j$ and $k$ span spatial dimensions, $i$ is the feature channel (color channel if the input is an RGB image), and $t$ is the index of the image in the batch. Then a simple version of contrast normalization is given by
$$
y_{tijk} = \frac{x_{tijk}}{\sum_{l=1}^{W} \sum_{m=1}^{H} x_{tilm}}. \tag{1}
$$
It is unclear how such a function could be implemented as a sequence of ReLU and convolution operators.

The generators of Ulyanov et al. and Johnson et al. are built from convolution, pooling, upsampling, and batch normalization. In practice, a highly nonlinear contrast normalization function is hard to learn as a combination of such layers. To see why, let $x \in \mathbb{R}^{T \times C \times W \times H}$ be an input tensor holding a batch of $T$ images, with $x_{tijk}$ its $tijk$-th element ($j, k$ the spatial positions, $i$ the feature channel, $t$ the index within the batch); a simple version of contrast normalization is then eq. (1) above. It is still unclear how such a function could be realized as a sequence of ReLU and convolution operations.
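Annotation: a minimal NumPy sketch of eq. (1), assuming the $T \times C \times W \times H$ layout above; this is purely illustrative.

```python
import numpy as np

def contrast_normalize(x: np.ndarray) -> np.ndarray:
    """Simple contrast normalization of eq. (1): divide each element by the
    sum over the spatial locations of its own image and channel."""
    denom = x.sum(axis=(2, 3), keepdims=True)  # sum over W and H only
    return x / denom

x = np.random.rand(4, 3, 8, 8)  # a batch of T=4 three-channel images
y = contrast_normalize(x)
print(y.shape)  # (4, 3, 8, 8); each image is normalized by itself alone
```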
On the other hand, the generator network of Ulyanov et al. (2016) does contain normalization layers, and precisely batch normalization ones. The key difference between eq. (1) and batch normalization is that the latter applies the normalization to a whole batch of images instead of single ones:
$$
y_{tijk} = \frac{x_{tijk} - \mu_i}{\sqrt{\sigma_i^2 + \epsilon}}, \qquad
\mu_i = \frac{1}{HWT} \sum_{t=1}^{T} \sum_{l=1}^{W} \sum_{m=1}^{H} x_{tilm}, \qquad
\sigma_i^2 = \frac{1}{HWT} \sum_{t=1}^{T} \sum_{l=1}^{W} \sum_{m=1}^{H} (x_{tilm} - \mu_i)^2. \tag{2}
$$

On the other hand, the generator network of Ulyanov et al. does contain normalization layers, and precisely batch normalization ones. The key difference from eq. (1) is that batch normalization normalizes over the whole batch of images rather than over a single one, as in eq. (2) above.
In order to combine the effects of instance-specific normalization and batch normalization, we propose to replace the latter by the instance normalization (also known as "contrast normalization") layer:
$$
y_{tijk} = \frac{x_{tijk} - \mu_{ti}}{\sqrt{\sigma_{ti}^2 + \epsilon}}, \qquad
\mu_{ti} = \frac{1}{HW} \sum_{l=1}^{W} \sum_{m=1}^{H} x_{tilm}, \qquad
\sigma_{ti}^2 = \frac{1}{HW} \sum_{l=1}^{W} \sum_{m=1}^{H} (x_{tilm} - \mu_{ti})^2. \tag{3}
$$

To combine the effects of instance-specific normalization and batch normalization, we propose replacing the latter with an instance normalization (also known as "contrast normalization") layer, eq. (3) above, whose mean and variance are computed per image and per channel rather than over the whole batch.
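Annotation: the only difference between eqs. (2) and (3) is the set of axes over which the mean and variance are taken. A NumPy sketch, again assuming the $T \times C \times W \times H$ layout:

```python
import numpy as np

def batch_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    # eq. (2): statistics shared across the whole batch (axes T, W, H)
    mu = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def instance_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    # eq. (3): statistics per image and per channel (axes W, H only)
    mu = x.mean(axis=(2, 3), keepdims=True)
    var = x.var(axis=(2, 3), keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

x = np.random.rand(4, 3, 8, 8)
# The two disagree as soon as the batch contains more than one image.
print(np.allclose(batch_norm(x), instance_norm(x)))  # False in general
```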
We replace batch normalization with instance normalization everywhere in the generator network $g$. This prevents instance-specific mean and covariance shift, simplifying the learning process. Unlike batch normalization, furthermore, the instance normalization layer is applied at test time as well.

We replace batch normalization with instance normalization everywhere in the generator network $g$. This avoids instance-specific mean and covariance shift and simplifies learning; moreover, unlike batch normalization, instance normalization is also applied at test time.
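Annotation: in a modern framework this swap is a one-line change. A hedged PyTorch sketch follows; the tiny block below is illustrative and not the generator of either paper. Note that `nn.InstanceNorm2d` with its default `track_running_stats=False` keeps using per-image statistics in `eval()` mode, which matches keeping the layer active at test time.

```python
import torch
from torch import nn

def conv_block(norm_layer):
    # Illustrative conv/norm/ReLU block; swapping nn.BatchNorm2d for
    # nn.InstanceNorm2d is the entire proposed change.
    return nn.Sequential(
        nn.Conv2d(3, 3, kernel_size=3, padding=1),
        norm_layer(3),
        nn.ReLU(inplace=True),
    )

g_bn = conv_block(nn.BatchNorm2d)      # original generator flavor
g_in = conv_block(nn.InstanceNorm2d)   # proposed generator flavor

x = torch.rand(1, 3, 64, 64)
g_in.eval()  # instance statistics are still computed from each input image
with torch.no_grad():
    print(g_in(x).shape)  # torch.Size([1, 3, 64, 64])
```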
In this section, we evaluate the effect of the modification proposed in section 2, replacing batch normalization with instance normalization. We tested both generator architectures described in Ulyanov et al. (2016) and Johnson et al. (2016) in order to see whether the modification applies to different architectures. While we did not have access to the original network by Johnson et al. (2016), we carefully reproduced their model from the description in the paper. Ultimately, we found that both generator networks have similar performance and shortcomings (fig. 5, first row).
In this section, we evaluate the effect of the modification proposed in section 2, replacing batch normalization with instance normalization. To see whether the change carries over to different architectures, we tested both generator architectures proposed by Ulyanov et al. and by Johnson et al. Although we did not have access to the original network of Johnson et al., we carefully reproduced their model from the description in the paper. Ultimately, we found that the two generator networks have similar performance and similar shortcomings (fig. 5, first row).
Figure 5: Qualitative comparison of generators proposed in Ulyanov et al. (2016) (left), Johnson et al. (2016) (right) with batch normalization (first row) and instance normalization (second row). Both architectures benefit from instance normalization.
Next, we replaced batch normalization with instance normalization and retrained the generators using the same hyperparameters. We found that both architectures improved significantly with the use of instance normalization (fig. 5, second row). The quality of both generators is similar, but we found the residual architecture of Johnson et al. (2016) to be somewhat more efficient and easier to use, so we adopted it for the results shown in fig. 4.
We then replaced batch normalization with instance normalization and retrained with the same hyperparameters. Both architectures improved markedly with instance normalization (see fig. 5, second row). The quality of the two generators is comparable, but the residual architecture of Johnson et al. is somewhat more efficient and easier to use, so we adopted it for the results shown in fig. 4.
Figure 4: Stylization examples using the proposed method. First row: style images; second row: original image and its stylized versions.
In this short note, we demonstrate that by replacing batch normalization with instance normalization it is possible to dramatically improve the performance of certain deep neural networks for image generation. The result is suggestive, and we are currently experimenting with similar ideas for image discrimination tasks as well.
In this short note, we demonstrate that replacing batch normalization with instance normalization can dramatically improve the performance of certain deep neural networks for image generation. The result is suggestive, and we are also currently experimenting with similar ideas for image discrimination tasks.
Figure 6: Processing a content image from fig. 4 with Delaunay style at different resolutions: 512 (left) and 1080 (right).