tags: tensorflow autoencoder

1. Training an AutoEncoder on MNIST

Original code

Based on 莉森揪's article on iT邦 and the official Keras tutorial.

```python
import tensorflow as tf
import numpy as np
from matplotlib import pyplot as plt
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from tensorflow.keras.datasets import mnist


def load_data():
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train = x_train.astype("float32") / 255
    x_test = x_test.astype("float32") / 255
    x_train = np.reshape(x_train, (len(x_train), 28, 28, 1))
    x_test = np.reshape(x_test, (len(x_test), 28, 28, 1))
    # The autoencoder trains x_train_noisy (input) against x_train (label),
    # unlike the usual fully connected setup of x_train (input) against
    # y_train (label), so the y data needs no processing here.
    return x_train, x_test


def noise(x_train, x_test):
    noise_factor = 0.5
    x_train_noisy = x_train + noise_factor * np.random.normal(0.0, 1.0, size=x_train.shape)
    x_test_noisy = x_test + noise_factor * np.random.normal(0.0, 1.0, size=x_test.shape)
    x_train_noisy = np.clip(x_train_noisy, 0.0, 1.0)
    x_test_noisy = np.clip(x_test_noisy, 0.0, 1.0)
    return x_train_noisy, x_test_noisy


def train_model():
    input_img = Input(shape=(28, 28, 1))                                  # N*28*28*1

    # Encoder
    x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)  # N*28*28*16
    x = MaxPooling2D((2, 2), padding='same')(x)                           # N*14*14*16
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)           # N*14*14*8
    x = MaxPooling2D((2, 2), padding='same')(x)                           # N*7*7*8
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)           # N*7*7*8
    encoded = MaxPooling2D((2, 2), padding='same', name='encoder')(x)     # N*4*4*8

    # Decoder
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)     # N*4*4*8
    x = UpSampling2D((2, 2))(x)                                           # N*8*8*8
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)           # N*8*8*8
    x = UpSampling2D((2, 2))(x)                                           # N*16*16*8
    x = Conv2D(16, (3, 3), activation='relu')(x)                          # N*14*14*16
    x = UpSampling2D((2, 2))(x)                                           # N*28*28*16
    decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)  # N*28*28*1

    autoencoder = tf.keras.Model(input_img, decoded)
    autoencoder.compile(optimizer="adam",
                        loss="binary_crossentropy",
                        metrics=["accuracy"])
    autoencoder.fit(x_train_noisy, x_train,
                    epochs=50,
                    batch_size=128,
                    shuffle=True)
    autoencoder.save('autoencoder.h5')
    return autoencoder  # return the model so the training branch below can use it


def plot():
    # x_test_noisy and denoised_images both have shape (10000, 28, 28, 1),
    # so pick a batch index along the first axis and reshape to (28, 28)
    # to drop the channel dimension.
    input_image = np.reshape(x_test_noisy[5], (28, 28))
    output_image = np.reshape(denoised_images[5], (28, 28))
    stack_image = np.hstack((input_image, output_image))
    plt.imshow(stack_image, cmap="gray")
    plt.show()


# ---------------------------------------------------------------------- #
TRAIN_ON = False
x_train, x_test = load_data()
x_train_noisy, x_test_noisy = noise(x_train, x_test)
if TRAIN_ON:
    autoencoder = train_model()
else:
    autoencoder = tf.keras.models.load_model("autoencoder.h5")
denoised_images = autoencoder.predict(x_test_noisy)
plot()
```

Section-by-section walkthrough

  • load_data()

```python
def load_data():
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train = x_train.astype("float32") / 255
    x_test = x_test.astype("float32") / 255
    x_train = np.reshape(x_train, (len(x_train), 28, 28, 1))
    x_test = np.reshape(x_test, (len(x_test), 28, 28, 1))
    # The autoencoder trains x_train_noisy (input) against x_train (label),
    # unlike the usual fully connected setup of x_train (input) against
    # y_train (label), so the y data needs no processing here.
    return x_train, x_test
```

Here we first load the MNIST dataset.
It comes in four parts (a quick shape check follows the list):

  • x_train
    shape = (60000, 28, 28)
  • y_train
    shape = (60000,)
  • x_test
    shape = (10000, 28, 28)
  • y_test
    shape = (10000,)
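
A quick sanity check of these shapes, using the same mnist loader as the code above:

```python
from tensorflow.keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train.shape, y_train.shape)  # (60000, 28, 28) (60000,)
print(x_test.shape, y_test.shape)    # (10000, 28, 28) (10000,)
```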

Since these handwritten digits are 28*28 images
with grayscale values in [0, 255] and dtype="uint8",
we first handle the type conversion and normalization:

```python
x_train = x_train.astype("float32") / 255
x_test = x_test.astype("float32") / 255
```

Next, because Keras's Conv2D expects tensors laid out as batch * height * width * channels
(the default channels_last format),
we need to turn MNIST's (60000, 28, 28) into (60000, 28, 28, 1).
numpy.reshape handles the dimension conversion here:

```python
x_train = np.reshape(x_train, (len(x_train), 28, 28, 1))
x_test = np.reshape(x_test, (len(x_test), 28, 28, 1))
```
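
As a side note, np.expand_dims (or indexing with None) adds the channel axis just as well; a minimal sketch:

```python
x_train = np.expand_dims(x_train, axis=-1)  # (60000, 28, 28) -> (60000, 28, 28, 1)
x_test = x_test[..., None]                  # same effect via indexing
```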

At this point you may notice that
although the dataset has four parts, only the x data gets preprocessed.
That is because an autoencoder, unlike a network ending in fully connected layers,
uses the noisy images as input
and the clean originals as labels.
With MNIST we therefore use
x_train_noisy -> input
x_train -> label
so the y data is simply not needed (as the fit call below shows),
and the function just returns x_train and x_test.
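
This pairing is exactly what the fit call in train_model() later expresses:

```python
# input: the noisy images; label: the clean originals
autoencoder.fit(x_train_noisy, x_train,
                epochs=50,
                batch_size=128,
                shuffle=True)
```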

  • noise()

There is not much to explain in this block:
it takes x_train and x_test, adds some random Gaussian noise to each,
then clips the values back into [0, 1].
That's about it.
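
For reference, the corresponding block from the full listing:

```python
def noise(x_train, x_test):
    noise_factor = 0.5
    # additive Gaussian noise, then clip back into the valid pixel range
    x_train_noisy = x_train + noise_factor * np.random.normal(0.0, 1.0, size=x_train.shape)
    x_test_noisy = x_test + noise_factor * np.random.normal(0.0, 1.0, size=x_test.shape)
    x_train_noisy = np.clip(x_train_noisy, 0.0, 1.0)
    x_test_noisy = np.clip(x_test_noisy, 0.0, 1.0)
    return x_train_noisy, x_test_noisy
```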

  • train_model()

```python
def train_model():
    input_img = Input(shape=(28, 28, 1))                                  # N*28*28*1

    # Encoder
    x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)  # N*28*28*16
    x = MaxPooling2D((2, 2), padding='same')(x)                           # N*14*14*16
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)           # N*14*14*8
    x = MaxPooling2D((2, 2), padding='same')(x)                           # N*7*7*8
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)           # N*7*7*8
    encoded = MaxPooling2D((2, 2), padding='same', name='encoder')(x)     # N*4*4*8

    # Decoder
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)     # N*4*4*8
    x = UpSampling2D((2, 2))(x)                                           # N*8*8*8
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)           # N*8*8*8
    x = UpSampling2D((2, 2))(x)                                           # N*16*16*8
    x = Conv2D(16, (3, 3), activation='relu')(x)                          # N*14*14*16
    x = UpSampling2D((2, 2))(x)                                           # N*28*28*16
    decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)  # N*28*28*1

    autoencoder = tf.keras.Model(input_img, decoded)
    autoencoder.compile(optimizer="adam",
                        loss="binary_crossentropy",
                        metrics=["accuracy"])
    autoencoder.fit(x_train_noisy, x_train,
                    epochs=50,
                    batch_size=128,
                    shuffle=True)
    autoencoder.save('autoencoder.h5')
    return autoencoder  # return the model so the training branch can use it
```

An autoencoder's architecture splits into two halves:
an Encoder that performs downsampling (convolution)
and a Decoder that performs upsampling (deconvolution),
so the model necessarily builds out both parts.


First we use

```python
# Input
input_img = Input(shape=(28, 28, 1))
```

to define the input data, excluding the batch dimension.
An MNIST sample is a 28*28 grayscale image, so the shape is simply (28, 28, 1);
the dimension conversion was already handled when loading the dataset.
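
The batch dimension shows up as None in the resulting tensor, meaning any batch size is accepted; a quick check:

```python
input_img = Input(shape=(28, 28, 1))
print(input_img.shape)  # (None, 28, 28, 1) -- None is the batch dimension
```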


```python
# Encoder
x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)  # N*28*28*16
x = MaxPooling2D((2, 2), padding='same')(x)                           # N*14*14*16
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)           # N*14*14*8
x = MaxPooling2D((2, 2), padding='same')(x)                           # N*7*7*8
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)           # N*7*7*8
encoded = MaxPooling2D((2, 2), padding='same', name='encoder')(x)     # N*4*4*8
```

Here Conv2D and MaxPooling2D gradually shrink the feature size.
Note that every Conv2D uses padding="same"
with the default stride of (1, 1),
so only the MaxPooling layers actually reduce the feature size,
while the number of feature maps changes as 1 -> 16 -> 16 -> 8 -> 8 -> 8 -> 8.
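
One way to verify these shapes (a sketch, assuming it runs inside train_model() where input_img and encoded are in scope) is to wrap the encoder half in its own Model and print its summary:

```python
encoder = tf.keras.Model(input_img, encoded)
encoder.summary()  # the last output shape should read (None, 4, 4, 8)
```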


```python
# Decoder
x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)     # N*4*4*8
x = UpSampling2D((2, 2))(x)                                           # N*8*8*8
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)           # N*8*8*8
x = UpSampling2D((2, 2))(x)                                           # N*16*16*8
x = Conv2D(16, (3, 3), activation='relu')(x)                          # N*14*14*16
x = UpSampling2D((2, 2))(x)                                           # N*28*28*16
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)  # N*28*28*1
```

The Decoder swaps MaxPooling for UpSampling,
gradually restoring the feature size to the original image size (28, 28).
Worth noting is the third Conv2D,
x = Conv2D(16, (3, 3), activation='relu')(x)
which does not specify padding="same",
i.e. it uses the default "valid".
That means this Conv2D shrinks the feature size according to the kernel size:
16*16 becomes 14*14, since output = 16 - 3 + 1 = 14 with a (3, 3) kernel.

This is of course deliberate, so the output can be restored to the original 28*28.
If same padding were used throughout,
then because one of the pooling steps shrank the size from 7 to 4,
naively upsampling all the way back would give 4 * 2^3 = 32,
causing a dimension mismatch.
The full shape progression is annotated in the comments to the right of the code,
and the padding behaviour can also be checked directly, as sketched below.
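
A minimal sketch of valid vs. same padding on a dummy tensor, assuming the imports from the full listing:

```python
x = tf.random.normal((1, 16, 16, 8))
print(Conv2D(16, (3, 3), padding='valid')(x).shape)  # (1, 14, 14, 16): 16 - 3 + 1 = 14
print(Conv2D(16, (3, 3), padding='same')(x).shape)   # (1, 16, 16, 16): size preserved
```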

The last layer uses a sigmoid activation
to keep the output nonlinearly inside (0, 1);
after all, we are outputting an image,
and tanh would blow things up by producing negative values.
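
The output ranges are easy to verify; a tiny sketch:

```python
z = tf.constant([-3.0, 0.0, 3.0])
print(tf.math.sigmoid(z).numpy())  # ~[0.047 0.5 0.953] -- stays inside (0, 1)
print(tf.math.tanh(z).numpy())     # ~[-0.995 0. 0.995] -- negative values possible
```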


```python
# compile
autoencoder = tf.keras.Model(input_img, decoded)
autoencoder.compile(optimizer="adam",
                    loss="binary_crossentropy",
                    metrics=["accuracy"])
autoencoder.fit(x_train_noisy, x_train,
                epochs=50,
                batch_size=128,
                shuffle=True)
autoencoder.save('autoencoder.h5')
return autoencoder  # return the model so the training branch can use it
```

We define the autoencoder as a new Model,
with input_img as the input and decoded as the output.
When fitting, the input is x_train_noisy and the label is x_train,
exactly as discussed earlier.
Finally we just save the trained weights out~
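
As a side benefit of naming the bottleneck layer 'encoder', the latent codes can be pulled back out of the saved model later; a sketch, assuming the autoencoder.h5 saved above and x_test_noisy in scope:

```python
autoencoder = tf.keras.models.load_model("autoencoder.h5")
encoder = tf.keras.Model(autoencoder.input,
                         autoencoder.get_layer('encoder').output)
codes = encoder.predict(x_test_noisy)
print(codes.shape)  # (10000, 4, 4, 8)
```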

(2020/04/15)