###### tags: `tensorflow` `autoencoder`

# 1. Training an AutoEncoder on MNIST

## Original Code

Based on [莉森揪's iT 邦幫忙 article](https://ithelp.ithome.com.tw/articles/10207148) and the official [Keras tutorial](https://blog.keras.io/building-autoencoders-in-keras.html).

```python=
import tensorflow as tf
import numpy as np
from matplotlib import pyplot as plt
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from tensorflow.keras.datasets import mnist


def load_data():
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train = x_train.astype("float32") / 255
    x_test = x_test.astype("float32") / 255
    x_train = np.reshape(x_train, (len(x_train), 28, 28, 1))
    x_test = np.reshape(x_test, (len(x_test), 28, 28, 1))
    # The autoencoder trains x_train_noisy (input) against x_train (label),
    # unlike a typical fully connected network that trains x_train (input)
    # against y_train (label), so the y data is left untouched here.
    return x_train, x_test


def noise(x_train, x_test):
    noise_factor = 0.5
    x_train_noisy = x_train + noise_factor * np.random.normal(0.0, 1.0, size=x_train.shape)
    x_test_noisy = x_test + noise_factor * np.random.normal(0.0, 1.0, size=x_test.shape)
    x_train_noisy = np.clip(x_train_noisy, 0.0, 1.0)
    x_test_noisy = np.clip(x_test_noisy, 0.0, 1.0)
    return x_train_noisy, x_test_noisy


def train_model():
    input_img = Input(shape=(28, 28, 1))  # N*28*28*1
    x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)  # N*28*28*16
    x = MaxPooling2D((2, 2), padding='same')(x)  # N*14*14*16
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)  # N*14*14*8
    x = MaxPooling2D((2, 2), padding='same')(x)  # N*7*7*8
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)  # N*7*7*8
    encoded = MaxPooling2D((2, 2), padding='same', name='encoder')(x)  # N*4*4*8

    x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)  # N*4*4*8
    x = UpSampling2D((2, 2))(x)  # N*8*8*8
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)  # N*8*8*8
    x = UpSampling2D((2, 2))(x)  # N*16*16*8
    x = Conv2D(16, (3, 3), activation='relu')(x)  # N*14*14*16
    x = UpSampling2D((2, 2))(x)  # N*28*28*16
    decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)  # N*28*28*1

    autoencoder = tf.keras.Model(input_img, decoded)
    autoencoder.compile(optimizer="adam",
                        loss="binary_crossentropy",
                        metrics=["accuracy"])
    autoencoder.fit(x_train_noisy, x_train,
                    epochs=50,
                    batch_size=128,
                    shuffle=True
                    )
    autoencoder.save('autoencoder.h5')
    # Return the model so the main block can predict right after training
    return autoencoder


def plot():
    # The arrays are shaped (N, 28, 28, 1), so pick a batch index and
    # reshape to (28, 28) to drop the channel dimension
    input_image = np.reshape(x_test_noisy[5], (28, 28))
    output_image = np.reshape(denoised_images[5], (28, 28))
    stack_image = np.hstack((input_image, output_image))
    plt.imshow(stack_image, cmap="gray")
    plt.show()


# ----------------------------------------------------------------------
TRAIN_ON = False
x_train, x_test = load_data()
x_train_noisy, x_test_noisy = noise(x_train, x_test)
if TRAIN_ON:
    autoencoder = train_model()
else:
    autoencoder = tf.keras.models.load_model("autoencoder.h5")
denoised_images = autoencoder.predict(x_test_noisy)
plot()
```

## Section-by-Section Walkthrough

* ### load_data()

```python=
def load_data():
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train = x_train.astype("float32") / 255
    x_test = x_test.astype("float32") / 255
    x_train = np.reshape(x_train, (len(x_train), 28, 28, 1))
    x_test = np.reshape(x_test, (len(x_test), 28, 28, 1))
    # The autoencoder trains x_train_noisy (input) against x_train (label),
    # unlike a typical fully connected network that trains x_train (input)
    # against y_train (label), so the y data is left untouched here.
    return x_train, x_test
```

This first loads the MNIST dataset, which has four main parts:

* **x_train** shape = (60000, 28, 28)
* **y_train** shape = (60000,)
* **x_test** shape = (10000, 28, 28)
* **y_test** shape = (10000,)
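These shapes are easy to confirm yourself before any preprocessing; a quick self-contained sketch:

```python=
from tensorflow.keras.datasets import mnist

# Print the raw shapes and dtype of the four dataset parts
(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train.shape, y_train.shape)  # (60000, 28, 28) (60000,)
print(x_test.shape, y_test.shape)    # (10000, 28, 28) (10000,)
print(x_train.dtype, x_train.min(), x_train.max())  # uint8 0 255
```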
Since the handwritten digits themselves are 28\*28 images, with grayscale values in (0, 255) and dtype="uint8", the first step is type conversion and normalization:

```python=
x_train = x_train.astype("float32") / 255
x_test = x_test.astype("float32") / 255
```

---

Next, Keras convolution layers expect tensors laid out as **batches** \* **height** \* **width** \* **channels**, so for MNIST the data must go from (60000, 28, 28) to (60000, 28, 28, 1). numpy.reshape handles the dimension conversion:

```python=
x_train = np.reshape(x_train, (len(x_train), 28, 28, 1))
x_test = np.reshape(x_test, (len(x_test), 28, 28, 1))
```

---

At this point you may have noticed that although the dataset has four parts, only the x data is preprocessed. This is because an autoencoder, unlike a network that ends in fully connected layers, is trained with the noisy images as input and the clean originals as labels. For MNIST that means:

x_train_noisy -> input
x_train -> label

So the y information is never needed, and the function simply returns **x_train** and **x_test**.

* ### noise()

There is not much to explain in this block. It takes x_train and x_test, adds some random noise to each, and clips the results back into (0, 1). That's about it.

* ### train_model()

```python=
def train_model():
    input_img = Input(shape=(28, 28, 1))  # N*28*28*1
    x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)  # N*28*28*16
    x = MaxPooling2D((2, 2), padding='same')(x)  # N*14*14*16
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)  # N*14*14*8
    x = MaxPooling2D((2, 2), padding='same')(x)  # N*7*7*8
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)  # N*7*7*8
    encoded = MaxPooling2D((2, 2), padding='same', name='encoder')(x)  # N*4*4*8

    x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)  # N*4*4*8
    x = UpSampling2D((2, 2))(x)  # N*8*8*8
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)  # N*8*8*8
    x = UpSampling2D((2, 2))(x)  # N*16*16*8
    x = Conv2D(16, (3, 3), activation='relu')(x)  # N*14*14*16
    x = UpSampling2D((2, 2))(x)  # N*28*28*16
    decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)  # N*28*28*1

    autoencoder = tf.keras.Model(input_img, decoded)
    autoencoder.compile(optimizer="adam",
                        loss="binary_crossentropy",
                        metrics=["accuracy"])
    autoencoder.fit(x_train_noisy, x_train,
                    epochs=50,
                    batch_size=128,
                    shuffle=True
                    )
    autoencoder.save('autoencoder.h5')
    # Return the model so the main block can predict right after training
    return autoencoder
```

![](https://i.imgur.com/sqdosED.png)

The AutoEncoder architecture is illustrated above: an Encoder that performs downsampling (convolution) and a Decoder that performs upsampling (deconvolution). The model is therefore built in these two parts.

---

First,

```python=
# Input
input_img = Input(shape=(28, 28, 1))
```

defines the input shape, which excludes the batch dimension. An MNIST digit is a 28\*28 grayscale image, so the shape is simply (28, 28, 1); the dimension conversion was already handled when loading the dataset.

---

```python=
# Encoder
x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)  # N*28*28*16
x = MaxPooling2D((2, 2), padding='same')(x)  # N*14*14*16
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)  # N*14*14*8
x = MaxPooling2D((2, 2), padding='same')(x)  # N*7*7*8
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)  # N*7*7*8
encoded = MaxPooling2D((2, 2), padding='same', name='encoder')(x)  # N*4*4*8
```

Here Conv2D and MaxPooling2D progressively condense the feature size. Note that every Conv2D uses padding="same" with the default stride of (1, 1), so only the MaxPooling layers actually reduce the feature size, while the number of feature maps evolves as 1 -> 16 -> 16 -> 8 -> 8 -> 8 -> 8.

---

```python=
# Decoder
x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)  # N*4*4*8
x = UpSampling2D((2, 2))(x)  # N*8*8*8
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)  # N*8*8*8
x = UpSampling2D((2, 2))(x)  # N*16*16*8
x = Conv2D(16, (3, 3), activation='relu')(x)  # N*14*14*16
x = UpSampling2D((2, 2))(x)  # N*28*28*16
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)  # N*28*28*1
```

The Decoder swaps MaxPooling for UpSampling, gradually restoring the feature size to the original image size (28, 28).

Note the third Conv2D here, `x = Conv2D(16, (3, 3), activation='relu')(x)`, which does not specify padding="same" and therefore uses the default "valid". That means this convolution shrinks the feature size according to the kernel size: with a (3, 3) kernel, 16\*16 becomes 14\*14. The shrink is deliberate, because it is what allows the output to return to the original 28\*28: one of the earlier pooling steps changed the size 7 -> 4, and if same padding were used from start to finish, upsampling would inflate it to 4 \* 2^3 = 32, causing a dimension mismatch. The detailed shape changes are annotated to the right of the code.

The last layer uses a sigmoid activation to keep the output nonlinearly within (0, 1). We are outputting an image, after all, and tanh can go negative, which makes no sense for pixel values.
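Before moving on to compilation, the shape annotations above are easy to verify. A minimal sketch, assuming you are inside train_model() where `input_img` and `decoded` are in scope (this line is not part of the original script):

```python=
# Build a throwaway Model just to inspect per-layer output shapes;
# the "Output Shape" column should match the N*h*w*c comments above,
# e.g. (None, 4, 4, 8) at the layer named 'encoder'
tf.keras.Model(input_img, decoded).summary()
```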
---

```python=
# compile
autoencoder = tf.keras.Model(input_img, decoded)
autoencoder.compile(optimizer="adam",
                    loss="binary_crossentropy",
                    metrics=["accuracy"])
autoencoder.fit(x_train_noisy, x_train,
                epochs=50,
                batch_size=128,
                shuffle=True
                )
autoencoder.save('autoencoder.h5')
```

This defines `autoencoder` as a new model, with `input_img` as its input and `decoded` as its output. In `fit`, the input is `x_train_noisy` and the label is `x_train`, exactly as described earlier. Finally, the trained weights are saved out to a file. (The function also returns the model, so the main block can run prediction right after training when TRAIN_ON is True.)

(2020/04/15)
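One closing note: the bottleneck MaxPooling2D layer was given `name='encoder'`, which makes it easy to pull the trained encoder half back out of the saved model. A minimal sketch (the random input batch here is just a stand-in for real images):

```python=
import numpy as np
import tensorflow as tf

# Load the trained denoising autoencoder saved by train_model()
autoencoder = tf.keras.models.load_model("autoencoder.h5")

# Rebuild the encoder half: everything from the model input up to
# the MaxPooling2D layer named 'encoder'
encoder = tf.keras.Model(autoencoder.input,
                         autoencoder.get_layer("encoder").output)

# Any batch of (28, 28, 1) images can now be compressed to 4*4*8 codes
codes = encoder.predict(np.random.rand(10, 28, 28, 1))
print(codes.shape)  # (10, 4, 4, 8)
```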