###### tags: `tensorflow` `autoencoder`
# 1. Training an AutoEncoder on MNIST
## Original Code
Adapted from [this iT邦幫忙 article by 莉森揪](https://ithelp.ithome.com.tw/articles/10207148) and the official [Keras tutorial](https://blog.keras.io/building-autoencoders-in-keras.html).
```python=
import tensorflow as tf
import numpy as np
from matplotlib import pyplot as plt
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from tensorflow.keras.datasets import mnist

def load_data():
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train = x_train.astype("float32") / 255
    x_test = x_test.astype("float32") / 255
    x_train = np.reshape(x_train, (len(x_train), 28, 28, 1))
    x_test = np.reshape(x_test, (len(x_test), 28, 28, 1))
    # The autoencoder trains x_train_noisy (input) against x_train (label),
    # unlike a typical fully connected network that trains x_train (input)
    # against y_train (label), so the y data needs no preprocessing here.
    return x_train, x_test

def noise(x_train, x_test):
    noise_factor = 0.5
    x_train_noisy = x_train + noise_factor * np.random.normal(0.0, 1.0, size=x_train.shape)
    x_test_noisy = x_test + noise_factor * np.random.normal(0.0, 1.0, size=x_test.shape)
    x_train_noisy = np.clip(x_train_noisy, 0.0, 1.0)
    x_test_noisy = np.clip(x_test_noisy, 0.0, 1.0)
    return x_train_noisy, x_test_noisy

def train_model():
    input_img = Input(shape=(28, 28, 1))                                  # N*28*28*1
    x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)  # N*28*28*16
    x = MaxPooling2D((2, 2), padding='same')(x)                           # N*14*14*16
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)           # N*14*14*8
    x = MaxPooling2D((2, 2), padding='same')(x)                           # N*7*7*8
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)           # N*7*7*8
    encoded = MaxPooling2D((2, 2), padding='same', name='encoder')(x)     # N*4*4*8
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)     # N*4*4*8
    x = UpSampling2D((2, 2))(x)                                           # N*8*8*8
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)           # N*8*8*8
    x = UpSampling2D((2, 2))(x)                                           # N*16*16*8
    x = Conv2D(16, (3, 3), activation='relu')(x)                          # N*14*14*16
    x = UpSampling2D((2, 2))(x)                                           # N*28*28*16
    decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)  # N*28*28*1
    autoencoder = tf.keras.Model(input_img, decoded)
    autoencoder.compile(optimizer="adam",
                        loss="binary_crossentropy",
                        metrics=["accuracy"])
    autoencoder.fit(x_train_noisy, x_train,
                    epochs=50,
                    batch_size=128,
                    shuffle=True)
    autoencoder.save('autoencoder.h5')
    return autoencoder

def plot():
    # The test arrays are all 10000*28*28*1, so pick a batch index and
    # reshape to (28, 28) to drop the channel dimension.
    input_image = np.reshape(x_test_noisy[5], (28, 28))
    output_image = np.reshape(denoised_images[5], (28, 28))
    stack_image = np.hstack((input_image, output_image))
    plt.imshow(stack_image, cmap="gray")
    plt.show()

# ---------------------------------------------------------------------- #
TRAIN_ON = False
x_train, x_test = load_data()
x_train_noisy, x_test_noisy = noise(x_train, x_test)
if TRAIN_ON:
    autoencoder = train_model()
else:
    autoencoder = tf.keras.models.load_model("autoencoder.h5")
denoised_images = autoencoder.predict(x_test_noisy)
plot()
```
## Walkthrough
* ### load_data()
```python=
def load_data():
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train = x_train.astype("float32") / 255
    x_test = x_test.astype("float32") / 255
    x_train = np.reshape(x_train, (len(x_train), 28, 28, 1))
    x_test = np.reshape(x_test, (len(x_test), 28, 28, 1))
    # The autoencoder trains x_train_noisy (input) against x_train (label),
    # unlike a typical fully connected network that trains x_train (input)
    # against y_train (label), so the y data needs no preprocessing here.
    return x_train, x_test
```
This first loads the MNIST dataset, which consists of four parts:
* **x_train**: shape = (60000, 28, 28)
* **y_train**: shape = (60000,)
* **x_test**: shape = (10000, 28, 28)
* **y_test**: shape = (10000,)

The handwritten digits are 28\*28 images whose grayscale values lie in [0, 255] with dtype="uint8",
so the first step is type conversion and normalization:
```python=
x_train = x_train.astype("float32") / 255
x_test = x_test.astype("float32") / 255
```
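A quick way to confirm the conversion worked (a minimal check, not in the original code):
```python=
# After converting, the pixels should be float32 in [0.0, 1.0].
print(x_train.dtype, x_train.min(), x_train.max())  # float32 0.0 1.0
```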
---
Next, because Keras expects image tensors in channels-last layout, i.e. **batch** \* **height** \* **width** \* **channels**,
the MNIST arrays should be converted from (60000, 28, 28) to (60000, 28, 28, 1).
numpy.reshape handles the dimension conversion:
```python=
x_train = np.reshape(x_train, (len(x_train), 28, 28, 1))
x_test = np.reshape(x_test, (len(x_test), 28, 28, 1))
```
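Equivalently, the channel axis can be added with np.expand_dims (a small variation on the original):
```python=
# Append a trailing channel axis instead of calling reshape explicitly.
x_train = np.expand_dims(x_train, axis=-1)  # (60000, 28, 28) -> (60000, 28, 28, 1)
x_test = np.expand_dims(x_test, axis=-1)    # (10000, 28, 28) -> (10000, 28, 28, 1)
```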
---
At this point you might wonder:
the dataset has four parts, so why is only x preprocessed?
That's because an autoencoder differs from a network ending in fully connected layers:
it takes the noisy images as input
and the clean originals as labels.
With MNIST, that means we use
x_train_noisy -> input
x_train -> label
so the y data is never needed,
and the function simply returns **x_train** and **x_test**.
* ### noise()
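```python=
def noise(x_train, x_test):
    noise_factor = 0.5
    x_train_noisy = x_train + noise_factor * np.random.normal(0.0, 1.0, size=x_train.shape)
    x_test_noisy = x_test + noise_factor * np.random.normal(0.0, 1.0, size=x_test.shape)
    x_train_noisy = np.clip(x_train_noisy, 0.0, 1.0)
    x_test_noisy = np.clip(x_test_noisy, 0.0, 1.0)
    return x_train_noisy, x_test_noisy
```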
There isn't much to explain in this block:
it takes x_train and x_test, adds some random Gaussian noise to each,
then clips the results back into (0, 1).
That's about it.
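One optional tweak (not in the original): seeding NumPy before calling noise() makes the noisy datasets reproducible across runs, which is handy when comparing models:
```python=
# Fix the RNG so every run corrupts the images identically.
np.random.seed(0)  # the seed value 0 is arbitrary, chosen for illustration
x_train_noisy, x_test_noisy = noise(x_train, x_test)
```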
* ### train_model()
```python=
def train_model():
    input_img = Input(shape=(28, 28, 1))                                  # N*28*28*1
    x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)  # N*28*28*16
    x = MaxPooling2D((2, 2), padding='same')(x)                           # N*14*14*16
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)           # N*14*14*8
    x = MaxPooling2D((2, 2), padding='same')(x)                           # N*7*7*8
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)           # N*7*7*8
    encoded = MaxPooling2D((2, 2), padding='same', name='encoder')(x)     # N*4*4*8
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)     # N*4*4*8
    x = UpSampling2D((2, 2))(x)                                           # N*8*8*8
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)           # N*8*8*8
    x = UpSampling2D((2, 2))(x)                                           # N*16*16*8
    x = Conv2D(16, (3, 3), activation='relu')(x)                          # N*14*14*16
    x = UpSampling2D((2, 2))(x)                                           # N*28*28*16
    decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)  # N*28*28*1
    autoencoder = tf.keras.Model(input_img, decoded)
    autoencoder.compile(optimizer="adam",
                        loss="binary_crossentropy",
                        metrics=["accuracy"])
    autoencoder.fit(x_train_noisy, x_train,
                    epochs=50,
                    batch_size=128,
                    shuffle=True)
    autoencoder.save('autoencoder.h5')
    return autoencoder
```

The AutoEncoder's overall architecture is shown in the figure above:
it consists of an Encoder performing downsampling (convolution)
and a Decoder performing upsampling (deconvolution),
so the model necessarily builds out these two parts.
---
First, we use
```python=
# Input
input_img = Input(shape=(28, 28, 1))
```
to define the input shape, without the batch dimension.
An MNIST digit is a 28\*28 grayscale image, so we declare it directly as (28, 28, 1);
the dimension conversion was already handled when the dataset was loaded.
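As a quick check (a minimal sketch), Keras prepends a flexible batch dimension (None) to the declared shape:
```python=
from tensorflow.keras.layers import Input

input_img = Input(shape=(28, 28, 1))
print(input_img.shape)  # (None, 28, 28, 1) -- the batch size stays unspecified
```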
---
```python=
# Encoder
x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img) # N*28*28*16
x = MaxPooling2D((2, 2), padding='same')(x) # N*14*14*16
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x) # N*14*14*8
x = MaxPooling2D((2, 2), padding='same')(x) # N*7*7*8
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x) # N*7*7*8
encoded = MaxPooling2D((2, 2), padding='same', name='encoder')(x) # N*4*4*8
```
Here Conv2D and MaxPooling2D progressively compress the feature size.
Note that every Conv2D uses padding='same'
with the default stride of (1, 1),
so only the MaxPooling layers actually reduce the feature size,
while the number of feature maps evolves as 1 -> 16 -> 16 -> 8 -> 8 -> 8 -> 8.
The size arithmetic is made explicit in the sketch below.
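Here is a small helper (my own sketch, not part of the original code) showing how 'same' pooling and 'valid' convolution change the feature size:
```python=
import math

def same_pool_out(size, pool=2):
    # 'same'-padded pooling with stride == pool size: out = ceil(in / pool)
    return math.ceil(size / pool)

def valid_conv_out(size, kernel=3):
    # 'valid' convolution with stride 1: out = in - kernel + 1
    return size - kernel + 1

print(same_pool_out(28))  # 14
print(same_pool_out(14))  # 7
print(same_pool_out(7))   # 4  <- the odd size 7 rounds up, giving the 4*4 bottleneck
```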
---
```python=
# Decoder
x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded) # N*4*4*8
x = UpSampling2D((2, 2))(x) # N*8*8*8
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x) # N*8*8*8
x = UpSampling2D((2, 2))(x) # N*16*16*8
x = Conv2D(16, (3, 3), activation='relu')(x) # N*14*14*16
x = UpSampling2D((2, 2))(x) # N*28*28*16
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x) # N*28*28*1
```
The Decoder swaps MaxPooling for UpSampling,
step by step restoring the feature size to the original image size (28, 28).
The third Conv2D deserves attention:
`x = Conv2D(16, (3, 3), activation='relu')(x)`
does not specify padding= "same",
meaning it uses the default "valid",
so this convolution shrinks the feature size according to the kernel size,
from 16\*16 down to 14\*14 (kernel size = (3, 3)).
This is of course deliberate, so the output can return to the original 28\*28:
if same padding were used from start to finish,
then since one pooling step earlier changed the size 7 -> 4,
forcing the size back up would give 4 \* 2^3 = 32,
and the dimensions would no longer match.
The detailed shape changes are in the comments beside the code,
and the key arithmetic is checked below.
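A quick numeric check of that argument:
```python=
# With 'same' padding everywhere, three 2x upsamplings from the 4*4
# bottleneck would give 32, which cannot match the 28*28 input.
print(4 * 2**3)          # 32
# The single 'valid' conv trims 16 -> 16 - 3 + 1 = 14, and 14 * 2 = 28.
print((16 - 3 + 1) * 2)  # 28
```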
The last layer's activation is sigmoid,
to keep the output nonlinearly within (0, 1);
after all, we're outputting an image,
and the negative values tanh produces would blow that up.
---
```python=
# compile
autoencoder = tf.keras.Model(input_img, decoded)
autoencoder.compile(optimizer="adam",
                    loss="binary_crossentropy",
                    metrics=["accuracy"])
autoencoder.fit(x_train_noisy, x_train,
                epochs=50,
                batch_size=128,
                shuffle=True)
autoencoder.save('autoencoder.h5')
```
This defines `autoencoder` as a new model
with `input_img` as its input and `decoded` as its output.
During `fit`, the input is `x_train_noisy` and the label is `x_train`,
exactly as described earlier.
Finally, the trained weights are saved out.
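One bonus of giving the bottleneck layer name='encoder': the encoder half can later be pulled out of the saved model on its own. A sketch (the variable name encoder_only is my own):
```python=
import tensorflow as tf

autoencoder = tf.keras.models.load_model("autoencoder.h5")
# Reuse every layer up to the bottleneck as a standalone encoder.
encoder_only = tf.keras.Model(inputs=autoencoder.input,
                              outputs=autoencoder.get_layer("encoder").output)
codes = encoder_only.predict(x_test_noisy)  # shape: (10000, 4, 4, 8)
```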
(2020/04/15)