tags: tensorflow autoencoder

1. Training an AutoEncoder on MNIST

Original code

Based on 莉森揪's article on iT邦 and the official Keras tutorial.

```python
import tensorflow as tf
import numpy as np
from matplotlib import pyplot as plt
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D
from tensorflow.keras.datasets import mnist


def load_data():
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train = x_train.astype("float32") / 255
    x_test = x_test.astype("float32") / 255
    x_train = np.reshape(x_train, (len(x_train), 28, 28, 1))
    x_test = np.reshape(x_test, (len(x_test), 28, 28, 1))
    # The autoencoder trains x_train_noisy (input) against x_train (label),
    # unlike the usual fully connected setup of x_train (input) against
    # y_train (label), so the y data needs no processing here.
    return x_train, x_test


def noise(x_train, x_test):
    noise_factor = 0.5
    x_train_noisy = x_train + noise_factor * np.random.normal(0.0, 1.0, size=x_train.shape)
    x_test_noisy = x_test + noise_factor * np.random.normal(0.0, 1.0, size=x_test.shape)
    x_train_noisy = np.clip(x_train_noisy, 0.0, 1.0)
    x_test_noisy = np.clip(x_test_noisy, 0.0, 1.0)
    return x_train_noisy, x_test_noisy


def train_model():
    input_img = Input(shape=(28, 28, 1))                                  # N*28*28*1

    # Encoder
    x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)  # N*28*28*16
    x = MaxPooling2D((2, 2), padding='same')(x)                           # N*14*14*16
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)           # N*14*14*8
    x = MaxPooling2D((2, 2), padding='same')(x)                           # N*7*7*8
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)           # N*7*7*8
    encoded = MaxPooling2D((2, 2), padding='same', name='encoder')(x)     # N*4*4*8

    # Decoder
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)     # N*4*4*8
    x = UpSampling2D((2, 2))(x)                                           # N*8*8*8
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)           # N*8*8*8
    x = UpSampling2D((2, 2))(x)                                           # N*16*16*8
    x = Conv2D(16, (3, 3), activation='relu')(x)                          # N*14*14*16
    x = UpSampling2D((2, 2))(x)                                           # N*28*28*16
    decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)  # N*28*28*1

    autoencoder = tf.keras.Model(input_img, decoded)
    autoencoder.compile(optimizer="adam",
                        loss="binary_crossentropy",
                        metrics=["accuracy"])
    autoencoder.fit(x_train_noisy, x_train,
                    epochs=50,
                    batch_size=128,
                    shuffle=True)
    autoencoder.save('autoencoder.h5')
    return autoencoder  # return the model so the training branch below can use it


def plot():
    # x_test_noisy and denoised_images both have shape (10000, 28, 28, 1),
    # so pick a batch index along the first axis and reshape to (28, 28)
    # to drop the channel dimension.
    input_image = np.reshape(x_test_noisy[5], (28, 28))
    output_image = np.reshape(denoised_images[5], (28, 28))
    stack_image = np.hstack((input_image, output_image))
    plt.imshow(stack_image, cmap="gray")
    plt.show()


# ---------------------------------------------------------------------- #
TRAIN_ON = False
x_train, x_test = load_data()
x_train_noisy, x_test_noisy = noise(x_train, x_test)
if TRAIN_ON:
    autoencoder = train_model()
else:
    autoencoder = tf.keras.models.load_model("autoencoder.h5")
denoised_images = autoencoder.predict(x_test_noisy)
plot()
```

Section-by-section walkthrough

  • load_data()

```python
def load_data():
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train = x_train.astype("float32") / 255
    x_test = x_test.astype("float32") / 255
    x_train = np.reshape(x_train, (len(x_train), 28, 28, 1))
    x_test = np.reshape(x_test, (len(x_test), 28, 28, 1))
    # The autoencoder trains x_train_noisy (input) against x_train (label),
    # unlike the usual fully connected setup of x_train (input) against
    # y_train (label), so the y data needs no processing here.
    return x_train, x_test
```

Here we first load the MNIST dataset.
It comes in four parts (a quick shape check follows the list):

  • x_train
    shape = (60000, 28, 28)
  • y_train
    shape = (60000,)
  • x_test
    shape = (10000, 28, 28)
  • y_test
    shape = (10000,)
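
A quick sanity check of these shapes, using the same mnist loader as the code above:

```python
from tensorflow.keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train.shape, y_train.shape)  # (60000, 28, 28) (60000,)
print(x_test.shape, y_test.shape)    # (10000, 28, 28) (10000,)
```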

Since these handwritten digits are 28*28 images
with grayscale values in [0, 255] and dtype="uint8",
we first handle the type conversion and normalization:

```python
x_train = x_train.astype("float32") / 255
x_test = x_test.astype("float32") / 255
```

Next, because Keras's Conv2D expects tensors laid out as batch * height * width * channels
(the default channels_last format),
we need to turn MNIST's (60000, 28, 28) into (60000, 28, 28, 1).
numpy.reshape handles the dimension conversion here:

```python
x_train = np.reshape(x_train, (len(x_train), 28, 28, 1))
x_test = np.reshape(x_test, (len(x_test), 28, 28, 1))
```
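
As a side note, np.expand_dims (or indexing with None) adds the channel axis just as well; a minimal sketch:

```python
x_train = np.expand_dims(x_train, axis=-1)  # (60000, 28, 28) -> (60000, 28, 28, 1)
x_test = x_test[..., None]                  # same effect via indexing
```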

At this point you may notice that
although the dataset has four parts, only the x data gets preprocessed.
That is because an autoencoder, unlike a network ending in fully connected layers,
uses the noisy images as input
and the clean originals as labels.
With MNIST we therefore use
x_train_noisy -> input
x_train -> label
so the y data is simply not needed (as the fit call below shows),
and the function just returns x_train and x_test.
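
This pairing is exactly what the fit call in train_model() later expresses:

```python
# input: the noisy images; label: the clean originals
autoencoder.fit(x_train_noisy, x_train,
                epochs=50,
                batch_size=128,
                shuffle=True)
```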

  • noise()

There is not much to explain in this block:
it takes x_train and x_test, adds some random Gaussian noise to each,
then clips the values back into [0, 1].
That's about it.
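
For reference, the corresponding block from the full listing:

```python
def noise(x_train, x_test):
    noise_factor = 0.5
    # additive Gaussian noise, then clip back into the valid pixel range
    x_train_noisy = x_train + noise_factor * np.random.normal(0.0, 1.0, size=x_train.shape)
    x_test_noisy = x_test + noise_factor * np.random.normal(0.0, 1.0, size=x_test.shape)
    x_train_noisy = np.clip(x_train_noisy, 0.0, 1.0)
    x_test_noisy = np.clip(x_test_noisy, 0.0, 1.0)
    return x_train_noisy, x_test_noisy
```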

  • train_model()

```python
def train_model():
    input_img = Input(shape=(28, 28, 1))                                  # N*28*28*1

    # Encoder
    x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)  # N*28*28*16
    x = MaxPooling2D((2, 2), padding='same')(x)                           # N*14*14*16
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)           # N*14*14*8
    x = MaxPooling2D((2, 2), padding='same')(x)                           # N*7*7*8
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)           # N*7*7*8
    encoded = MaxPooling2D((2, 2), padding='same', name='encoder')(x)     # N*4*4*8

    # Decoder
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)     # N*4*4*8
    x = UpSampling2D((2, 2))(x)                                           # N*8*8*8
    x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)           # N*8*8*8
    x = UpSampling2D((2, 2))(x)                                           # N*16*16*8
    x = Conv2D(16, (3, 3), activation='relu')(x)                          # N*14*14*16
    x = UpSampling2D((2, 2))(x)                                           # N*28*28*16
    decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)  # N*28*28*1

    autoencoder = tf.keras.Model(input_img, decoded)
    autoencoder.compile(optimizer="adam",
                        loss="binary_crossentropy",
                        metrics=["accuracy"])
    autoencoder.fit(x_train_noisy, x_train,
                    epochs=50,
                    batch_size=128,
                    shuffle=True)
    autoencoder.save('autoencoder.h5')
    return autoencoder  # return the model so the training branch can use it
```

An autoencoder's architecture splits into two halves:
an Encoder that performs downsampling (convolution)
and a Decoder that performs upsampling (deconvolution),
so the model necessarily builds out both parts.


First we use

```python
# Input
input_img = Input(shape=(28, 28, 1))
```

to define the input data, excluding the batch dimension.
An MNIST sample is a 28*28 grayscale image, so the shape is simply (28, 28, 1);
the dimension conversion was already handled when loading the dataset.
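
The batch dimension shows up as None in the resulting tensor, meaning any batch size is accepted; a quick check:

```python
input_img = Input(shape=(28, 28, 1))
print(input_img.shape)  # (None, 28, 28, 1) -- None is the batch dimension
```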


```python
# Encoder
x = Conv2D(16, (3, 3), activation='relu', padding='same')(input_img)  # N*28*28*16
x = MaxPooling2D((2, 2), padding='same')(x)                           # N*14*14*16
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)           # N*14*14*8
x = MaxPooling2D((2, 2), padding='same')(x)                           # N*7*7*8
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)           # N*7*7*8
encoded = MaxPooling2D((2, 2), padding='same', name='encoder')(x)     # N*4*4*8
```

Here Conv2D and MaxPooling2D gradually shrink the feature size.
Note that every Conv2D uses padding="same"
with the default stride of (1, 1),
so only the MaxPooling layers actually reduce the feature size,
while the number of feature maps changes as 1 -> 16 -> 16 -> 8 -> 8 -> 8 -> 8.
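
One way to verify these shapes (a sketch, assuming it runs inside train_model() where input_img and encoded are in scope) is to wrap the encoder half in its own Model and print its summary:

```python
encoder = tf.keras.Model(input_img, encoded)
encoder.summary()  # the last output shape should read (None, 4, 4, 8)
```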


```python
# Decoder
x = Conv2D(8, (3, 3), activation='relu', padding='same')(encoded)     # N*4*4*8
x = UpSampling2D((2, 2))(x)                                           # N*8*8*8
x = Conv2D(8, (3, 3), activation='relu', padding='same')(x)           # N*8*8*8
x = UpSampling2D((2, 2))(x)                                           # N*16*16*8
x = Conv2D(16, (3, 3), activation='relu')(x)                          # N*14*14*16
x = UpSampling2D((2, 2))(x)                                           # N*28*28*16
decoded = Conv2D(1, (3, 3), activation='sigmoid', padding='same')(x)  # N*28*28*1
```

The Decoder swaps MaxPooling for UpSampling,
gradually restoring the feature size to the original image size (28, 28).
Worth noting is the third Conv2D,
x = Conv2D(16, (3, 3), activation='relu')(x)
which does not specify padding="same",
i.e. it uses the default "valid".
That means this Conv2D shrinks the feature size according to the kernel size:
16*16 becomes 14*14, since output = 16 - 3 + 1 = 14 with a (3, 3) kernel.

This is of course deliberate, so the output can be restored to the original 28*28.
If same padding were used throughout,
then because one of the pooling steps shrank the size from 7 to 4,
naively upsampling all the way back would give 4 * 2^3 = 32,
causing a dimension mismatch.
The full shape progression is annotated in the comments to the right of the code,
and the padding behaviour can also be checked directly, as sketched below.
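
A minimal sketch of valid vs. same padding on a dummy tensor, assuming the imports from the full listing:

```python
x = tf.random.normal((1, 16, 16, 8))
print(Conv2D(16, (3, 3), padding='valid')(x).shape)  # (1, 14, 14, 16): 16 - 3 + 1 = 14
print(Conv2D(16, (3, 3), padding='same')(x).shape)   # (1, 16, 16, 16): size preserved
```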

The last layer uses a sigmoid activation
to keep the output nonlinearly inside (0, 1);
after all, we are outputting an image,
and tanh would blow things up by producing negative values.
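
The output ranges are easy to verify; a tiny sketch:

```python
z = tf.constant([-3.0, 0.0, 3.0])
print(tf.math.sigmoid(z).numpy())  # ~[0.047 0.5 0.953] -- stays inside (0, 1)
print(tf.math.tanh(z).numpy())     # ~[-0.995 0. 0.995] -- negative values possible
```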


```python
# compile
autoencoder = tf.keras.Model(input_img, decoded)
autoencoder.compile(optimizer="adam",
                    loss="binary_crossentropy",
                    metrics=["accuracy"])
autoencoder.fit(x_train_noisy, x_train,
                epochs=50,
                batch_size=128,
                shuffle=True)
autoencoder.save('autoencoder.h5')
return autoencoder  # return the model so the training branch can use it
```

We define the autoencoder as a new Model,
with input_img as the input and decoded as the output.
When fitting, the input is x_train_noisy and the label is x_train,
exactly as discussed earlier.
Finally we just save the trained weights out~
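
As a side benefit of naming the bottleneck layer 'encoder', the latent codes can be pulled back out of the saved model later; a sketch, assuming the autoencoder.h5 saved above and x_test_noisy in scope:

```python
autoencoder = tf.keras.models.load_model("autoencoder.h5")
encoder = tf.keras.Model(autoencoder.input,
                         autoencoder.get_layer('encoder').output)
codes = encoder.predict(x_test_noisy)
print(codes.shape)  # (10000, 4, 4, 8)
```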

(2020/04/15)