# Projects: Basic Machine Learning Experiments and Implementation Notes
## Outline
This write-up works through the following questions and backs each one up with an implementation:
1. Design an experiment comparing two network architectures with a **similar number of parameters**, one deep and one shallow, and observe their performance on the training and test sets.
2. Run the testing data through the model, record the predictions as a **confusion matrix**, and discuss the results (e.g., analyze which digits are most easily confused, and why?)
3. Pick one digit from the MNIST dataset and train an autoencoder on it. Sample 10*10 equally spaced embeddings on the 2-D plane of the embedding layer, pass them through the decoder, **plot the decoded digit images according to their positions in the embedding layer**, and explain what feature each of the two dimensions might be modeling.
## Q1
> Design an experiment comparing two network architectures with a **similar number of parameters**, one deep and one shallow, and observe their performance on the training and test sets.

Here we compare a deep and a shallow network through an implementation.
Evaluation metrics and how to read them (a minimal sketch computing all of them follows this list):
1. *AUC*
AUC (Area Under Curve) is the area under the ROC curve, a common statistic for summarizing a classifier's predictive power. The closer the ROC curve is to the upper-left corner, the better, so a larger area under the curve indicates a more effective model.
    * AUC = 1: the classifier is perfect, though this is an idealized case.
    * AUC > 0.5: the classifier beats random guessing, so the model has predictive value.
    * AUC = 0.5: the classifier performs the same as random guessing, so the model has no predictive value.
    * AUC < 0.5: the classifier is worse than random guessing, but inverting its predictions would beat random guessing.
2. *Accuracy & Precision & Recall*

Accuracy: of all cases, the proportion judged correctly. $$Accuracy=\frac{tp+tn}{tp+tn+fp+fn}$$
Precision: of the cases predicted positive, the proportion that are truly positive. $$Precision=\frac{tp}{tp+fp}$$
Recall: of the truly positive cases, the proportion that are correctly identified. $$Recall=\frac{tp}{tp+fn}$$
3. *Loss*
Cross-entropy can be thought of as the disorder of information, i.e., what we usually call the loss value; the lower it goes, the better. $$H(p,q)=-\sum_x p(x)\log q(x)$$
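As a quick reference (not part of the original experiments), here is a minimal sketch computing all of the metrics above with scikit-learn on a toy binary problem; the arrays `y_true` and `y_prob` are made up for illustration:
```python=
# Minimal sketch: compute the metrics above on a toy binary problem.
import numpy as np
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, roc_auc_score, log_loss)

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])                   # ground truth
y_prob = np.array([0.9, 0.2, 0.7, 0.4, 0.1, 0.6, 0.8, 0.3])   # predicted P(y=1)
y_pred = (y_prob >= 0.5).astype(int)                          # threshold at 0.5

print('Accuracy :', accuracy_score(y_true, y_pred))   # (tp+tn)/(tp+tn+fp+fn)
print('Precision:', precision_score(y_true, y_pred))  # tp/(tp+fp)
print('Recall   :', recall_score(y_true, y_pred))     # tp/(tp+fn)
print('AUC      :', roc_auc_score(y_true, y_prob))    # area under the ROC curve
print('Log loss :', log_loss(y_true, y_prob))         # cross-entropy
```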
### Shallow: 1 Hidden Layer, with 515170 Parameters

* **Test_data**

* **Train_data**

### Deep: 4 Hidden Layers, with 515610 Parameters

* **Test_data**

* **Train_data**

### Conclusion
In theory, more layers should predict better, but the two DNN designs above are nearly indistinguishable, possibly because the amount of data is insufficient. If anything, the shallow model does marginally better than the deep one (a slightly higher AUC), so they are effectively on par. The parameter counts in the two headings above can be recomputed by hand, as in the sketch below.
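As a sanity check on the "similar parameter count" premise, here is a minimal sketch assuming fully connected layers with biases; the helper `dense_params` is hypothetical, not taken from the experiment code:
```python=
# Hypothetical helper: total weights + biases for a stack of Dense layers.
def dense_params(layer_sizes):
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

print(dense_params([784, 648, 10]))                 # Shallow: 515170
print(dense_params([784, 400, 300, 200, 100, 10]))  # Deep:    515610
```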
### Code
```python=
import tensorflow as tf
from tensorflow import keras
from keras.datasets import mnist
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
# load data
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()
print(type(X_train))
print(X_train.shape, Y_train.shape, X_test.shape, Y_test.shape)  # the 70000 images are split 60000/10000
print(np.unique(Y_train, return_counts=True))  # how many of each digit
print(np.unique(Y_test, return_counts=True))

# show a few samples
figure = plt.figure(figsize=(8, 8))
cols, rows = 3, 3
for i in range(1, cols * rows + 1):
    img, label = X_train[i - 1], Y_train[i - 1]  # X_train holds the images, Y_train the labels
    figure.add_subplot(rows, cols, i)
    plt.title("label= {}".format(label))
    plt.axis("off")
    plt.imshow(img.squeeze(), cmap="gray")
plt.show()

# data preprocessing: scale to [0, 1] and flatten to vectors
X_train = X_train.astype('float32') / 255
X_test = X_test.astype('float32') / 255
X_train = X_train.reshape(-1, 28 * 28)  # 60000 x 784
X_test = X_test.reshape(-1, 28 * 28)
Y_train = keras.utils.to_categorical(Y_train, num_classes=10, dtype='float32')
Y_test = keras.utils.to_categorical(Y_test, num_classes=10, dtype='float32')

# hyperparameters
lr = 0.001  # learning rate: the higher it is, the more each new batch moves the weights
num_classes = 10
batch_size = 128  # weights are updated once per batch of 128 samples
num_epochs = 5  # each of the 60000 training samples is seen 5 times
metrics = [keras.metrics.AUC(from_logits=False, name='AUC'),  # metrics to track
           keras.metrics.Precision(name='Precision'),
           keras.metrics.Recall(name='Recall'),
           keras.metrics.CategoricalAccuracy(name='Cat_acc')]
optimizer = keras.optimizers.Adam(learning_rate=lr)
loss = keras.losses.CategoricalCrossentropy(from_logits=False)

# shallow network: 784 -> 648 -> 10
def DNN1(input):
    input_layer = keras.Input(shape=(input.shape[1],), name='Input')  # 784-dimensional input
    h = keras.layers.Dense(units=648, activation='relu', name='Hidden_1')(input_layer)
    output_layer = keras.layers.Dense(units=num_classes, activation='softmax', name='Output')(h)
    model = keras.models.Model(inputs=input_layer, outputs=output_layer, name='DNN1')
    model.compile(loss=loss, optimizer=optimizer, metrics=metrics)
    return model

model1 = DNN1(X_train)
model1.summary()  # (784+1)*648 + (648+1)*10 = 515170 parameters
history1 = model1.fit(X_train, Y_train, validation_split=0.2, epochs=num_epochs,
                      batch_size=batch_size, shuffle=True)  # validation_split=0.2: 80% of the data is used for training
result1 = model1.evaluate(X_test, Y_test, batch_size=batch_size, return_dict=True)
result12 = model1.evaluate(X_train, Y_train, batch_size=batch_size, return_dict=True)
print(result1)

# deep network: 784 -> 400 -> 300 -> 200 -> 100 -> 10
def DNN2(input):
    input_layer = keras.Input(shape=(input.shape[1],), name='Input')  # 784-dimensional input
    h = keras.layers.Dense(units=400, activation='relu', name='Hidden_1')(input_layer)
    h = keras.layers.Dense(units=300, activation='relu', name='Hidden_2')(h)  # each layer takes the previous one as input
    h = keras.layers.Dense(units=200, activation='relu', name='Hidden_3')(h)
    h = keras.layers.Dense(units=100, activation='relu', name='Hidden_4')(h)
    output_layer = keras.layers.Dense(units=num_classes, activation='softmax', name='Output')(h)
    model = keras.models.Model(inputs=input_layer, outputs=output_layer, name='DNN2')
    model.compile(loss=loss, optimizer=optimizer, metrics=metrics)
    return model

model2 = DNN2(X_train)
model2.summary()  # (784+1)*400 + (400+1)*300 + (300+1)*200 + (200+1)*100 + (100+1)*10 = 515610 parameters
history2 = model2.fit(X_train, Y_train, validation_split=0.2, epochs=num_epochs,
                      batch_size=batch_size, shuffle=True)
result2 = model2.evaluate(X_test, Y_test, batch_size=batch_size, return_dict=True)
result22 = model2.evaluate(X_train, Y_train, batch_size=batch_size, return_dict=True)  # was model1, which re-evaluated the shallow model
print(result2)
```
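The `fit` calls above return `History` objects that the script never plots. Below is a minimal sketch for comparing the two models epoch by epoch, assuming `history1` and `history2` from the code above are still in scope; Keras stores each metric under the name given in `compile`, so the validation AUC appears under the key `val_AUC`:
```python=
# Minimal sketch: compare validation AUC per epoch for the two models,
# assuming history1/history2 from the code above.
import matplotlib.pyplot as plt

plt.plot(history1.history['val_AUC'], label='Shallow (DNN1)')
plt.plot(history2.history['val_AUC'], label='Deep (DNN2)')
plt.xlabel('epoch')
plt.ylabel('validation AUC')
plt.legend()
plt.show()
```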
## Q2
> Run the testing data through the model, record the predictions as a **confusion matrix**, and discuss the results (e.g., analyze which digits are most easily confused, and why?)

From this matrix we can see that 4 and 9 are the digits most easily confused with each other.

Looking at the images, the blob in the middle of a 9 resembles the triangle in the middle of a 4: once the open top of a 4 is connected by the handwriting, the 4 and the 9 become quite similar.
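For readers who want to reproduce the figure, here is a minimal sketch that renders the matrix as an annotated heatmap, assuming the `cm` computed by `confusion_matrix` in the code at the end of this section:
```python=
# Minimal sketch: plot the confusion matrix as a heatmap, assuming
# cm = confusion_matrix(Y_true, Y_pred) from the Q2&Q3 code below.
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 6))
ax.imshow(cm, cmap='Blues')
ax.set_xlabel('predicted digit')
ax.set_ylabel('true digit')
ax.set_xticks(range(10))
ax.set_yticks(range(10))
for i in range(10):
    for j in range(10):
        ax.text(j, i, cm[i, j], ha='center', va='center', fontsize=8)
plt.show()
```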

## Q3
> Pick one digit from the MNIST dataset and train an autoencoder on it. Sample 10*10 equally spaced embeddings on the 2-D plane of the embedding layer, pass them through the decoder, **plot the decoded digit images according to their positions in the embedding layer**, and explain what feature each of the two dimensions might be modeling.

* X axis: moving from right to left, the digit slowly gets fatter, so we can infer that the feature this dimension emphasizes is the left-right width.
* Y axis: moving from top to bottom, the region in the middle grows larger and more pronounced, so we can infer that the feature this dimension emphasizes may be the size of the 8's loops. (An alternative way to build the embedding grid itself is sketched right after this list.)
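The code below builds the grid by stepping a fixed distance of 3 from the embedding of one test "8". As a side note, the same equally spaced grid can be written with `np.linspace`; in this minimal sketch the starting corner `x0, y0` is an assumed placeholder rather than a value from the experiment:
```python=
# Minimal sketch: an alternative grid construction with np.linspace.
# The span of 27 with 10 points mirrors the fixed step of 3 used in the
# code below; x0, y0 would come from a real embedding.
import numpy as np

x0, y0 = 0.0, 0.0                  # assumed starting corner of the grid
xs = np.linspace(x0, x0 + 27, 10)  # 10 equally spaced x coordinates
ys = np.linspace(y0, y0 + 27, 10)  # 10 equally spaced y coordinates
grid = np.array([[x, y] for x in xs for y in ys])  # shape (100, 2)
# grid can then be decoded, e.g. model_decoder.predict(grid)
```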
### Q2&Q3 Code
```python=
import tensorflow as tf
from tensorflow import keras
from keras.datasets import mnist
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix
from keras.models import Model
# load data
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()
print(type(X_train))
print(X_train.shape, Y_train.shape, X_test.shape, Y_test.shape)  # the 70000 images are split 60000/10000
print(np.unique(Y_train, return_counts=True))  # how many of each digit
print(np.unique(Y_test, return_counts=True))

# show a few samples
figure = plt.figure(figsize=(8, 8))
cols, rows = 3, 3
for i in range(1, cols * rows + 1):
    img, label = X_train[i - 1], Y_train[i - 1]  # X_train holds the images, Y_train the labels
    figure.add_subplot(rows, cols, i)
    plt.title("label= {}".format(label))
    plt.axis("off")
    plt.imshow(img.squeeze(), cmap="gray")
plt.show()

# data preprocessing: scale to [0, 1] and flatten to vectors
X_train = X_train.astype('float32') / 255
X_test = X_test.astype('float32') / 255
X_train = X_train.reshape(-1, 28 * 28)  # 60000 x 784
X_test = X_test.reshape(-1, 28 * 28)
Y_train = keras.utils.to_categorical(Y_train, num_classes=10, dtype='float32')
Y_test = keras.utils.to_categorical(Y_test, num_classes=10, dtype='float32')

# hyperparameters
lr = 0.001  # learning rate: the higher it is, the more each new batch moves the weights
num_classes = 10
batch_size = 128  # weights are updated once per batch of 128 samples
num_epochs = 5  # each of the 60000 training samples is seen 5 times
metrics = [keras.metrics.AUC(from_logits=False, name='AUC'),  # metrics to track
           keras.metrics.Precision(name='Precision'),
           keras.metrics.Recall(name='Recall'),
           keras.metrics.CategoricalAccuracy(name='Cat_acc')]
optimizer = keras.optimizers.Adam(learning_rate=lr)
loss = keras.losses.CategoricalCrossentropy(from_logits=False)

# classifier for Q2: 784 -> 256 -> 64 -> 10
def DNN(input):
    input_layer = keras.Input(shape=(input.shape[1],), name='Input')  # 784-dimensional input
    h = keras.layers.Dense(units=256, activation='relu', name='Hidden_1')(input_layer)
    h = keras.layers.Dense(units=64, activation='relu', name='Hidden_2')(h)  # each layer takes the previous one as input
    output_layer = keras.layers.Dense(units=num_classes, activation='softmax', name='Output')(h)
    model = keras.models.Model(inputs=input_layer, outputs=output_layer, name='DNN')
    model.compile(loss=loss, optimizer=optimizer, metrics=metrics)
    return model

model = DNN(X_train)
model.summary()  # (784+1)*256 + (256+1)*64 + (64+1)*10 = 218058 parameters
history = model.fit(X_train, Y_train, validation_split=0.2, epochs=num_epochs,
                    batch_size=batch_size, shuffle=True)  # validation_split=0.2: 80% of the data is used for training
result = model.evaluate(X_test, Y_test, batch_size=batch_size, return_dict=True)
print(result)

# Q2: confusion matrix of the test-set predictions
Y_predict = model.predict(X_test, batch_size=batch_size, verbose=0, steps=None)
Y_pred = np.argmax(Y_predict, -1)
Y_true = np.argmax(Y_test, -1)
cm = confusion_matrix(Y_true, Y_pred)

# count the misclassifications per (predicted, true) pair
array = np.zeros((num_classes, num_classes))
diff = Y_pred - Y_true
for i in range(len(diff)):
    if diff[i] != 0:
        array[Y_pred[i]][Y_true[i]] += 1
max_diff = np.where(array == np.max(array))  # most frequently confused pair
print(max_diff)

# collect the test indices of that most-confused pair
record = []
for i in range(len(diff)):
    if Y_pred[i] == max_diff[0][0] and Y_true[i] == max_diff[1][0] and diff[i] != 0:
        record.append(i)

# reload the raw images to display some of the confused samples
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()
figure = plt.figure(figsize=(8, 8))
cols, rows = 3, 3
for i in range(1, cols * rows + 1):
    img, label = X_test[record[i - 1]], Y_test[record[i - 1]]
    figure.add_subplot(rows, cols, i)
    plt.title("label= {}".format(label))
    plt.axis("off")
    plt.imshow(img.squeeze(), cmap="gray")
plt.show()

# Q3: autoencoder
# hyperparameters
lr = 0.001
batch_size = 128
num_epochs = 5
metrics = []
code_dim = 2  # 2-D embedding layer
optimizer = keras.optimizers.Adam(learning_rate=lr)
loss = keras.losses.BinaryCrossentropy(from_logits=False)
(X_train, Y_train), (X_test, Y_test) = mnist.load_data()

# find the first test image of the chosen digit, 8
for i in range(len(Y_test)):
    if Y_test[i] == 8:
        fir = i
        break

# data preprocessing
X_train = X_train.astype('float32') / 255
X_test = X_test.astype('float32') / 255
X_train = X_train.reshape(-1, 28 * 28)
X_test = X_test.reshape(-1, 28 * 28)
Y_train = keras.utils.to_categorical(Y_train, num_classes=10, dtype='float32')
Y_test = keras.utils.to_categorical(Y_test, num_classes=10, dtype='float32')

def auto_encoder(input):
    input_layer = keras.Input(shape=(input.shape[1],), name='Input')
    # Encoder
    e_h = keras.layers.Dense(units=256, activation='relu', name='Encoder_1')(input_layer)
    e_h = keras.layers.Dense(units=64, activation='relu', name='Encoder_2')(e_h)
    # Code (embedding layer)
    code = keras.layers.Dense(units=code_dim, activation='relu', name='Code')(e_h)
    # Decoder
    d_h = keras.layers.Dense(units=64, activation='relu', name='Decoder_1')(code)
    d_h = keras.layers.Dense(units=256, activation='relu', name='Decoder_2')(d_h)
    output_layer = keras.layers.Dense(units=input.shape[1], activation='sigmoid', name='Output')(d_h)
    encoder = keras.models.Model(inputs=input_layer, outputs=code, name='Encoder')
    decoder = keras.models.Model(inputs=code, outputs=output_layer, name='Decoder')
    ae = keras.models.Model(inputs=input_layer, outputs=output_layer, name='AE')
    ae.compile(loss=loss, optimizer=optimizer)
    return ae, encoder, decoder

model_ae, model_encoder, model_decoder = auto_encoder(X_train)
model_ae.summary()
model_encoder.summary()
model_decoder.summary()
model_ae.fit(X_train, X_train, epochs=num_epochs, batch_size=batch_size, shuffle=True)

# check that the AE and the encoder share the same 'Code' weights
print(model_ae.get_layer('Code').get_weights()[0].shape)
print(model_ae.get_layer('Code').get_weights()[0][:1])
print(model_encoder.get_layer('Code').get_weights()[0].shape)
print(model_encoder.get_layer('Code').get_weights()[0][:1])

# Encode
img_pred = model_encoder.predict(X_test)
print(img_pred.shape)
print(img_pred[:3])
# Decode
img_reconstruct = model_decoder.predict(img_pred)
print(img_reconstruct.shape)

# originals (top row) vs. reconstructions (bottom row)
figure = plt.figure(figsize=(20, 4))
display_num = 10
cols, rows = display_num, 2
for i in range(1, cols * rows + 1):
    figure.add_subplot(rows, cols, i)
    plt.axis('off')
    if i < display_num + 1:
        plt.title(Y_test[i - 1].argmax())
        plt.imshow(X_test[i - 1].reshape(28, 28), cmap='gray')
    else:
        plt.imshow(img_reconstruct[i - 1 - display_num].reshape(28, 28), cmap='gray')
plt.show()

# Q3: a 10*10 grid of equally spaced embeddings, starting from the
# embedding of the first test '8' and stepping by 3 in each dimension
numbers = np.zeros((100, 2))
xx = img_pred[fir][0]
yy = img_pred[fir][1]
zz = 3  # grid spacing
for i in range(0, 100, 10):
    for j in range(10):
        numbers[i + j][0] = xx
        numbers[i + j][1] = yy + zz * j
    xx = xx + zz  # move one column over after each block of 10

# decode every grid point and plot it at its grid position
numbers_pre = model_decoder.predict(numbers)
figure = plt.figure(figsize=(16, 16))
cols, rows = 10, 10
for i in range(1, cols * rows + 1):
    figure.add_subplot(rows, cols, i)
    plt.title("num = 8")
    plt.axis("off")
    plt.imshow(numbers_pre[i - 1].reshape(28, 28), cmap='gray')
plt.show()
```
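One extra diagnostic that pairs naturally with Q3 (not part of the original code): scatter the 2-D embeddings of the whole test set, colored by digit, to see how the sampled grid region relates to where the 8s actually lie. A minimal sketch assuming `img_pred` and the one-hot `Y_test` from the code above:
```python=
# Minimal sketch: visualize where each digit lands on the 2-D embedding
# plane, assuming img_pred and the one-hot Y_test from the code above.
import numpy as np
import matplotlib.pyplot as plt

labels = np.argmax(Y_test, -1)
plt.figure(figsize=(8, 8))
scatter = plt.scatter(img_pred[:, 0], img_pred[:, 1], c=labels, cmap='tab10', s=3)
plt.colorbar(scatter, label='digit')
plt.xlabel('embedding dim 1')
plt.ylabel('embedding dim 2')
plt.show()
```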
## References
* [Accuracy, Precision, Recall Rate, and the Confusion Matrix in Psychology and Machine Learning](https://chingtien.medium.com/%E5%BF%83%E7%90%86%E5%AD%B8%E5%92%8C%E6%A9%9F%E5%99%A8%E5%AD%B8%E7%BF%92%E4%B8%AD%E7%9A%84-accuracy-precision-recall-rate-%E5%92%8C-confusion-matrix-529d18abc3a)
* [A Summary of Accuracy in Keras.metrics](https://zhuanlan.zhihu.com/p/95293440)