# Projects: Foundational Machine Learning Experiments and Implementation

## Outline

This write-up discusses the following questions and argues each point through implementation:

1. Design an experiment comparing two network architectures of different depth (Deep vs. Shallow) but with a **similar number of parameters**, and observe their Train and Test performance.
2. Run the Testing Data through the model, record the predictions as a **Confusion Matrix**, and discuss the results (e.g., analyze which digits are most easily confused, and why?).
3. Pick one digit from the MNIST dataset and train an Auto Encoder. Take a 10*10 grid of equally spaced embeddings on the two-dimensional plane of the Embedding Layer, pass these embeddings through the Decoder, **plot the decoded digit images according to their distribution in the Embedding Layer**, and explain what feature each of the two dimensions might be modeling.

## Q1

> Design an experiment comparing two network architectures of different depth (Deep vs. Shallow) but with a **similar number of parameters**, and observe their Train and Test performance.

Here we compare Deep and Shallow architectures through implementation.

Evaluation metrics and how to read them:

1. *AUC*
    AUC (Area Under Curve) is the area under the ROC curve, a common statistic for a classifier's predictive power. The closer the ROC curve bows toward the upper-left corner, the better; accordingly, the larger the area under the ROC curve, the more useful the model.
    * AUC = 1: a perfect classifier, though this is an idealized case.
    * AUC > 0.5: the classifier beats random guessing; the model has predictive value.
    * AUC = 0.5: the classifier performs the same as random guessing; the model has no predictive value.
    * AUC < 0.5: the classifier is worse than random guessing, but inverting its predictions would beat random guessing.
2. *Accuracy & Precision & Recall*
    ![](https://i.imgur.com/qYg536s.png)
    Accuracy: of all cases, the proportion judged correctly. $$Accuracy=\frac{tp+tn}{tp+tn+fp+fn}$$
    Precision: of the cases predicted positive, the proportion that are actually positive. $$Precision=\frac{tp}{tp+fp}$$
    Recall: of the actually positive cases, the proportion correctly identified. $$Recall=\frac{tp}{tp+fn}$$
3. *Loss*
    Cross-Entropy can be thought of as the disorder of the information, i.e., the loss value we usually speak of; the lower it goes, the better. A small numeric sketch of these formulas follows this list. $$H(p,q)=-\sum_x p(x)\log q(x)$$
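The sketch below is a minimal illustration of the formulas above, using made-up counts; none of these numbers come from the experiments in this post:

```python=
import numpy as np

# made-up example counts, only to illustrate the formulas above
tp, tn, fp, fn = 90, 85, 15, 10

accuracy  = (tp + tn) / (tp + tn + fp + fn)  # 0.875: correct over all cases
precision = tp / (tp + fp)                   # ~0.857: truly positive among predicted positive
recall    = tp / (tp + fn)                   # 0.9: found among actually positive
print(accuracy, precision, recall)

# cross-entropy H(p, q) for one sample: p is the one-hot target,
# q the predicted distribution; only the true-class term survives
p = np.array([0.0, 1.0, 0.0])
q = np.array([0.1, 0.8, 0.1])
print(-np.sum(p * np.log(q)))                # -log(0.8) ~ 0.223
```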
### Shallow: 1 Hidden Layer, with 515170 Parameters
![](https://i.imgur.com/lfHmBb3.png)
* **Test_data**
![](https://i.imgur.com/L8I5BFu.png)
* **Train_data**
![](https://i.imgur.com/VJCx9CJ.png)

### Deep: 4 Hidden Layers, with 515610 Parameters
![](https://i.imgur.com/UcTCZlt.png)
* **Test_data**
![](https://i.imgur.com/fM7gDXV.png)
* **Train_data**
![](https://i.imgur.com/lj5Luug.png)

### Conclusion
In theory, more layers should yield better predictions, but the two DNN designs above are nearly indistinguishable, possibly because the amount of data is insufficient. If anything, Shallow comes out slightly ahead of Deep (a marginally larger AUC), so the two can be called roughly equal.

### Code
```python=
import tensorflow as tf
from tensorflow import keras
from keras.datasets import mnist
import numpy as np
import matplotlib
import matplotlib.pyplot as plt

# load data
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()
print(type(X_train))
print(X_train.shape, Y_train.shape, X_test.shape, Y_test.shape)  # 70,000 samples split into 60,000 train / 10,000 test
print(np.unique(Y_train, return_counts=True))  # count of each digit
print(np.unique(Y_test, return_counts=True))

figure = plt.figure(figsize=(8,8))
cols, rows = 3,3
for i in range(1,cols*rows+1):
    img, label = X_train[i-1], Y_train[i-1]  # X_train holds the images, Y_train the labels
    figure.add_subplot(rows,cols,i)
    plt.title("label= {}".format(label))
    plt.axis("off")
    plt.imshow(img.squeeze(),cmap="gray")
plt.show()

# data preprocess: scale to [0, 1] and flatten
X_train = X_train.astype('float32') / 255
X_test = X_test.astype('float32') / 255
X_train = X_train.reshape(-1,28*28)  # 60000 x 784
X_test = X_test.reshape(-1,28*28)
Y_train = keras.utils.to_categorical(Y_train, num_classes = 10, dtype='float32')
Y_test = keras.utils.to_categorical(Y_test, num_classes = 10, dtype='float32')

# hyperparameters
lr = 0.001  # learning rate: the larger it is, the more each new batch outweighs what was learned before
num_classes = 10
batch_size = 128  # update weights once every 128 samples; one epoch = one pass over all 60,000 samples
num_epochs = 5  # the 60,000 training samples are seen 5 times in total
metrics = [keras.metrics.AUC(from_logits=False, name='AUC'),  # which metrics to track
           keras.metrics.Precision(name='Precision'),
           keras.metrics.Recall(name='Recall'),
           keras.metrics.CategoricalAccuracy(name='Cat_acc')]
optimizer = keras.optimizers.Adam(learning_rate=lr)
loss = keras.losses.CategoricalCrossentropy(from_logits=False)

def DNN1(input):
    input_layer = keras.Input(shape=(input.shape[1],),name='Input')  # shape is 784: the first layer receives 784 values
    h = keras.layers.Dense(units=648,activation='relu',name='Hidden_1')(input_layer)  # 784 inputs fan into 648 units: one very large nonlinear function
    output_layer = keras.layers.Dense(units=num_classes,activation='softmax',name='Output')(h)
    model = keras.models.Model(inputs=input_layer,outputs=output_layer, name='DNN1')
    model.compile(loss=loss,optimizer=optimizer,metrics=metrics)
    return model

model1 = DNN1(X_train)
model1.summary()  # 784*648+648 + 648*10+10 = 515170
history1 = model1.fit(X_train,Y_train,validation_split=0.2,epochs=num_epochs,batch_size=batch_size,shuffle=True)  # validation_split=0.2: 80% of the data builds the model; shuffle=True reshuffles it each epoch
result1 = model1.evaluate(X_test,Y_test,batch_size=batch_size,return_dict=True)
result12 = model1.evaluate(X_train,Y_train,batch_size=batch_size,return_dict=True)
print(result1)

def DNN2(input):
    input_layer = keras.Input(shape=(input.shape[1],),name='Input')  # shape is 784: the first layer receives 784 values
    h = keras.layers.Dense(units=400,activation='relu',name='Hidden_1')(input_layer)  # the trailing () takes the layer feeding this one: 784 -> 400 -> 300 -> 200 -> 100 -> 10
    h = keras.layers.Dense(units=300,activation='relu',name='Hidden_2')(h)
    h = keras.layers.Dense(units=200,activation='relu',name='Hidden_3')(h)
    h = keras.layers.Dense(units=100,activation='relu',name='Hidden_4')(h)
    output_layer = keras.layers.Dense(units=num_classes,activation='softmax',name='Output')(h)
    model = keras.models.Model(inputs=input_layer,outputs=output_layer, name='DNN2')
    model.compile(loss=loss,optimizer=optimizer,metrics=metrics)
    return model

model2 = DNN2(X_train)
model2.summary()  # 784*400+400 + 400*300+300 + 300*200+200 + 200*100+100 + 100*10+10 = 515610
history2 = model2.fit(X_train,Y_train,validation_split=0.2,epochs=num_epochs,batch_size=batch_size,shuffle=True)
result2 = model2.evaluate(X_test,Y_test,batch_size=batch_size,return_dict=True)
result22 = model2.evaluate(X_train,Y_train,batch_size=batch_size,return_dict=True)
print(result2)
```
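As a sanity check on the "similar parameter count" setup, the totals quoted in the two section headers can be reproduced by hand: every `Dense` layer contributes `inputs * units` weights plus `units` biases. A minimal sketch (the helper `dense_params` is mine, not part of the experiment code):

```python=
# reproduce the parameter counts quoted in the section headers:
# each Dense layer has n_in*n_out weights plus n_out biases
def dense_params(layer_sizes):
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

print(dense_params([784, 648, 10]))                 # Shallow: 515170
print(dense_params([784, 400, 300, 200, 100, 10]))  # Deep:    515610
```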
## Q2

> Run the Testing Data through the model, record the predictions as a **Confusion Matrix**, and discuss the results (e.g., analyze which digits are most easily confused, and why?).

From the matrix we can see that 4 and 9 are the digits most easily confused with each other:

![](https://i.imgur.com/N4H6FUZ.png)

The middle section of a 9 resembles the triangle in the middle of a 4: close the open top of a 4, factor in individual handwriting styles, and the two digits look remarkably alike.

![](https://i.imgur.com/wHF6GpH.png)

## Q3

> Pick one digit from the MNIST dataset and train an Auto Encoder. Take a 10*10 grid of equally spaced embeddings on the two-dimensional plane of the Embedding Layer, pass these embeddings through the Decoder, **plot the decoded digit images according to their distribution in the Embedding Layer**, and explain what feature each of the two dimensions might be modeling.

![](https://i.imgur.com/yUgkfTG.png)

* X axis: from right to left the digit gradually gets wider, so we can infer that this dimension focuses on the horizontal width.
* Y axis: from top to bottom the central region grows larger and more pronounced, so we can infer that this dimension may be modeling the size of the 8's loops.
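Before the full listing, here is a minimal sketch of the grid-sampling step in isolation. It assumes the trained `model_encoder` and `model_decoder` and the preprocessed `X_test` built in the listing below; unlike the fixed-offset loop used there (step size 3 starting from the code of one chosen 8), it spaces the 10*10 grid evenly over the observed range of the 2-D codes with `np.linspace`:

```python=
import numpy as np
import matplotlib.pyplot as plt

# assumes model_encoder / model_decoder / preprocessed X_test from the listing below
codes = model_encoder.predict(X_test)              # (N, 2) embeddings
xs = np.linspace(codes[:, 0].min(), codes[:, 0].max(), 10)
ys = np.linspace(codes[:, 1].min(), codes[:, 1].max(), 10)
grid = np.array([[x, y] for y in ys for x in xs])  # (100, 2), row by row

decoded = model_decoder.predict(grid)              # (100, 784) images
figure = plt.figure(figsize=(16, 16))
for i, img in enumerate(decoded, start=1):
    figure.add_subplot(10, 10, i)
    plt.axis('off')
    plt.imshow(img.reshape(28, 28), cmap='gray')
plt.show()
```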
### Q2&Q3 Code
```python=
import tensorflow as tf
from tensorflow import keras
from keras.datasets import mnist
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix
from keras.models import Model

# load data: train builds the model, test evaluates it; 70,000 split into 60,000 / 10,000
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()
print(type(X_train))
print(X_train.shape, Y_train.shape, X_test.shape, Y_test.shape)
print(np.unique(Y_train, return_counts=True))  # count of each digit
print(np.unique(Y_test, return_counts=True))

figure = plt.figure(figsize=(8,8))
cols, rows = 3,3
for i in range(1,cols*rows+1):
    img, label = X_train[i-1], Y_train[i-1]  # X_train holds the images, Y_train the labels
    figure.add_subplot(rows,cols,i)
    plt.title("label= {}".format(label))
    plt.axis("off")
    plt.imshow(img.squeeze(),cmap="gray")
plt.show()

# data preprocess: scale to [0, 1] and flatten
X_train = X_train.astype('float32') / 255
X_test = X_test.astype('float32') / 255
X_train = X_train.reshape(-1,28*28)  # 60000 x 784
X_test = X_test.reshape(-1,28*28)
Y_train = keras.utils.to_categorical(Y_train, num_classes = 10, dtype='float32')
Y_test = keras.utils.to_categorical(Y_test, num_classes = 10, dtype='float32')

# hyperparameters
lr = 0.001  # learning rate: the larger it is, the more each new batch outweighs what was learned before
num_classes = 10
batch_size = 128  # update weights once every 128 samples; one epoch = one pass over all 60,000 samples
num_epochs = 5  # the 60,000 training samples are seen 5 times in total
metrics = [keras.metrics.AUC(from_logits=False, name='AUC'),  # which metrics to track
           keras.metrics.Precision(name='Precision'),
           keras.metrics.Recall(name='Recall'),
           keras.metrics.CategoricalAccuracy(name='Cat_acc')]
optimizer = keras.optimizers.Adam(learning_rate=lr)
loss = keras.losses.CategoricalCrossentropy(from_logits=False)

def DNN(input):
    input_layer = keras.Input(shape=(input.shape[1],),name='Input')  # shape is 784: the first layer receives 784 values
    h = keras.layers.Dense(units=256,activation='relu',name='Hidden_1')(input_layer)
    h = keras.layers.Dense(units=64,activation='relu',name='Hidden_2')(h)  # 784 -> 256 -> 64 -> 10
    output_layer = keras.layers.Dense(units=num_classes,activation='softmax',name='Output')(h)
    model = keras.models.Model(inputs=input_layer,outputs=output_layer, name='DNN')
    model.compile(loss=loss,optimizer=optimizer,metrics=metrics)
    return model

model = DNN(X_train)
model.summary()
history = model.fit(X_train,Y_train,validation_split=0.2,epochs=num_epochs,batch_size=batch_size,shuffle=True)  # validation_split=0.2: 80% of the data builds the model; shuffle=True reshuffles it each epoch
result = model.evaluate(X_test,Y_test,batch_size=batch_size,return_dict=True)
print(result)

# Q2: confusion matrix and the most frequently confused pair
Y_predict = model.predict(X_test, batch_size=batch_size, verbose=0, steps=None)
Y_pred = np.argmax(Y_predict,-1)
Y_true = np.argmax(Y_test,-1)
cm = confusion_matrix(Y_true, Y_pred)

# tally each misclassified (predicted, true) pair in a 10x10 table
array = np.zeros((num_classes, num_classes))
diff = Y_pred - Y_true
for i in range(len(diff)):
    if diff[i] != 0:
        array[Y_pred[i]][Y_true[i]] += 1
max_diff = np.where(array == np.max(array))  # (predicted, true) of the most frequent confusion
print(max_diff)

# collect the test-set indices behind that confusion
record = []
for i in range(len(diff)):
    if Y_pred[i] == max_diff[0] and Y_true[i] == max_diff[1] and diff[i] != 0:
        record.append(i)

# reload the raw images and display nine of the confused samples
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()
figure = plt.figure(figsize=(8,8))
cols, rows = 3,3
for i in range(1, cols*rows+1):
    img, label = X_test[record[i-1]], Y_test[record[i-1]]
    figure.add_subplot(rows,cols, i)
    plt.title("label= {}".format(label))
    plt.axis("off")
    plt.imshow(img.squeeze(), cmap="gray")
plt.show()

# AE
# hyperparameters
lr = 0.001
batch_size = 128
num_epochs = 5
metrics = []  # no extra metrics for the AE
code_dim = 2  # two-dimensional embedding layer
optimizer = keras.optimizers.Adam(learning_rate=lr)
loss = keras.losses.BinaryCrossentropy(from_logits=False)

(X_train, Y_train), (X_test, Y_test) = mnist.load_data()

# pick the first 8 in the test set; its code anchors the 10x10 grid
for i in range(len(Y_test)):
    if Y_test[i]==8:
        fir=i
        break

# data preprocess
X_train = X_train.astype('float32') / 255
X_test = X_test.astype('float32') / 255
X_train = X_train.reshape(-1,28*28)
X_test = X_test.reshape(-1,28*28)
Y_train = keras.utils.to_categorical(Y_train, num_classes = 10, dtype='float32')
Y_test = keras.utils.to_categorical(Y_test, num_classes = 10, dtype='float32')

def auto_encoder(input):
    input_layer = keras.Input(shape=(input.shape[1],),name='Input')
    # Encoder
    e_h = keras.layers.Dense(units=256,activation='relu',name='Encoder_1')(input_layer)
    e_h = keras.layers.Dense(units=64,activation='relu',name='Encoder_2')(e_h)
    # code (the embedding layer)
    code = keras.layers.Dense(units=code_dim, activation='relu',name='Code')(e_h)
    # Decoder
    d_h = keras.layers.Dense(units=64,activation='relu',name='Decoder_1')(code)
    d_h = keras.layers.Dense(units=256, activation='relu',name='Decoder_2')(d_h)
    output_layer = keras.layers.Dense(units=input.shape[1], activation='sigmoid',name='Output')(d_h)
    encoder = keras.models.Model(inputs=input_layer, outputs=code, name='Encoder')
    decoder = keras.models.Model(inputs=code, outputs=output_layer, name='Decoder')
    ae = keras.models.Model(inputs=input_layer,outputs=output_layer, name='AE')
    ae.compile(loss=loss, optimizer=optimizer)
    return ae, encoder, decoder

model_ae, model_encoder, model_decoder = auto_encoder(X_train)
model_ae.summary()
model_encoder.summary()
model_decoder.summary()
model_ae.fit(X_train, X_train, epochs=num_epochs, batch_size=batch_size, shuffle=True)

# check that the AE and the encoder share the same 'Code' weights
print(model_ae.get_layer('Code').get_weights()[0].shape)
print(model_ae.get_layer('Code').get_weights()[0][:1])
print(model_encoder.get_layer('Code').get_weights()[0].shape)
print(model_encoder.get_layer('Code').get_weights()[0][:1])

# Encode
img_pred = model_encoder.predict(X_test)
print(img_pred.shape)
print(img_pred[:3])

# Decode
img_reconstruct = model_decoder.predict(img_pred)
print(img_reconstruct.shape)

# originals on the top row, reconstructions on the bottom row
figure = plt.figure(figsize=(20,4))
display_num = 10
cols, rows = display_num, 2
for i in range(1, cols*rows+1):
    figure.add_subplot(rows, cols, i)
    plt.axis('off')
    if i < display_num+1:
        plt.title(Y_test[i-1].argmax())
        plt.imshow(X_test[i-1].reshape(28, 28), cmap='gray')
    else:
        plt.imshow(img_reconstruct[i-1-display_num].reshape(28,28), cmap='gray')
plt.show()

# Q3: a 10x10 grid of codes, stepping 3 units per cell from the code of the chosen 8
numbers = np.zeros((100,2))
xx = img_pred[fir][0]
yy = img_pred[fir][1]
zz = 3
for i in range(0,100,10):
    for j in range(10):
        numbers[i+j][0]=xx
        numbers[i+j][1]=yy+zz*j
        if j == 9:
            xx=xx+zz

numbers_pre = model_decoder.predict(numbers)
figure = plt.figure(figsize=(16,16))
cols, rows = 10,10
for i in range(1,cols*rows+1):
    figure.add_subplot(rows,cols,i)
    plt.title("num = 8")
    plt.axis("off")
    plt.imshow(numbers_pre[i-1].reshape(28,28), cmap='gray')
plt.show()
```

## Reference
* [心理學和機器學習中的 Accuracy、Precision、Recall Rate 和 Confusion Matrix](https://chingtien.medium.com/%E5%BF%83%E7%90%86%E5%AD%B8%E5%92%8C%E6%A9%9F%E5%99%A8%E5%AD%B8%E7%BF%92%E4%B8%AD%E7%9A%84-accuracy-precision-recall-rate-%E5%92%8C-confusion-matrix-529d18abc3a)
* [Keras.metrics中的accuracy总结](https://zhuanlan.zhihu.com/p/95293440)