# Assignment 1 of Introductory Deep Learning
###### Learning the 8-bit parity checking problem with an MLP
***
# Table of contents
[TOC]

# Generate training data
> The generated parity bit must be one if the number of 1's in the binary input is odd. Otherwise, the generated parity bit must be zero.

```python=
import numpy as np
import matplotlib.pyplot as plt

# Create the 256x8 training data: 256 randomly generated 8-bit binary inputs
data = np.random.randint(0, 2, (256, 8))
# Create the 256x1 answer array, initialized with zeros
data_ans = np.zeros((256, 1), dtype=int)
# Compute the parity bit for each training sample
for i in range(256):
    count = 0
    for j in range(8):
        if data[i][j] == 1:
            count += 1
    if count % 2 == 1:
        data_ans[i] = 1
    else:
        data_ans[i] = 0
```

# Activation functions
> Activation functions are needed in the neural network; without them the output is only a linear function of the input, and the parity function is not linearly separable.

* Linear
```python=
class Linear:
    def __init__(self, m, n):
        self.W, self.b = np.random.randn(m, n)/m, np.random.randn(1, n)/m  # mxn, 1xn
        self.dW, self.db = None, None

    def forward(self, x):
        self.x = x                               # kxm
        return np.dot(x, self.W) + self.b        # kxm * mxn + 1xn = kxn

    def backward(self, dout):
        dx = np.dot(dout, self.W.T)              # kxn * (mxn)' = kxm
        self.dW = np.dot(self.x.T, dout)         # (kxm)' * kxn = mxn
        self.db = np.sum(dout, axis=0)           # 1xn
        return dx
```
* Relu
```python=
class Relu:
    def __init__(self):
        pass

    def forward(self, x):
        self.x = x
        return np.maximum(0, x)                  # return x where x >= 0, else 0

    def backward(self, dout):
        dx = dout * (self.x >= 0)                # pass the gradient only where x >= 0
        return dx
```
* Tanh
```python=
class Tanh:
    def __init__(self):
        pass

    def forward(self, x):
        self.out = np.tanh(x)
        return self.out                          # equivalently (np.exp(x)-np.exp(-x))/(np.exp(x)+np.exp(-x))

    def backward(self, dout):
        return dout * (1 - (self.out**2))
```
* Sigmoid
```python=
class Sigmoid:
    def __init__(self):
        pass

    def forward(self, x):
        out = 1 / (1 + np.exp(-x))               # kxn
        self.out = out
        return out

    def backward(self, dout):
        return dout * self.out * (1 - self.out)  # kxn
```
* Loss
```python=
class Loss:
    def __init__(self):
        pass

    def forward(self, y, ybar):
        self.ybar = ybar                         # kxn
        self.y = y
        return np.sum((y-ybar)**2)               # scalar sum-of-squares error

    def backward(self, dout):
        dy = -(2 * (self.y - self.ybar))         # kxn
        return dy
```
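> Before wiring the layers into a full network, the snippet below is a quick sanity-check sketch of my own (not part of the assignment code): it pushes a small batch through Linear → Sigmoid → Loss and compares one analytical gradient entry with a finite-difference estimate.

```python=
# Sanity-check sketch (illustration only): compare the analytical gradient
# dW[0, 0] with a finite-difference estimate of dL/dW[0, 0].
lin, act, loss = Linear(8, 1), Sigmoid(), Loss()
x = data[:4].astype(float)        # 4x8 mini-batch
y = data_ans[:4].astype(float)    # 4x1 targets

L = loss.forward(y, act.forward(lin.forward(x)))    # forward pass
lin.backward(act.backward(loss.backward(1)))        # backward pass fills lin.dW

eps = 1e-6
lin.W[0, 0] += eps                                  # perturb one weight
L_eps = loss.forward(y, act.forward(lin.forward(x)))
lin.W[0, 0] -= eps
print(lin.dW[0, 0], (L_eps - L) / eps)              # the two numbers should roughly match
```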
# Generate MLP
> This is my code showing how I generate the MLP from a list of activation functions and a list of layer sizes.

```python=
class MLP:
    def __init__(self, num, layer, neurons):
        self.act = []
        self.linear = []
        self.last_dW = []
        self.last_db = []
        self.loss = Loss()
        self.length = len(layer)    # length = number of layers
        self.count = 0              # count indexes the layers during forward and backward
        # last_dW and last_db start at 0
        for i in range(self.length):
            self.last_dW.append(0)
            self.last_db.append(0)
        # build one linear layer plus one activation function per layer
        for i in range(self.length):
            if i == 0:
                self.linear.append(Linear(num, neurons[0]))
            else:
                self.linear.append(Linear(neurons[i-1], neurons[i]))
            if layer[i] == 'relu':
                self.act.append(Relu())
            elif layer[i] == 'tanh':
                self.act.append(Tanh())
            elif layer[i] == 'sigmoid':
                self.act.append(Sigmoid())

    def forward(self, x):
        self.count = 0
        while self.count < self.length-1:
            x = self.linear[self.count].forward(x)
            x = self.act[self.count].forward(x)
            self.count += 1
        x = self.linear[self.length-1].forward(x)
        self.ybar = self.act[self.length-1].forward(x)
        return self.ybar

    def backward(self, y):
        self.L = self.loss.forward(y, self.ybar)
        g = self.loss.backward(1)
        self.count = self.length-1
        while self.count >= 0:
            g = self.act[self.count].backward(g)
            g = self.linear[self.count].backward(g)
            self.count -= 1
        return g

    def update(self, eta, alpha):
        for i in range(self.length):
            # gradient step scaled by eta plus a momentum-style term
            # alpha * previous scaled gradient (kept in last_dW / last_db)
            self.linear[i].W = self.linear[i].W - eta*self.linear[i].dW + alpha*self.last_dW[i]
            self.linear[i].b = self.linear[i].b - eta*self.linear[i].db + alpha*self.last_db[i]
            self.last_dW[i] = eta * self.linear[i].dW
            self.last_db[i] = eta * self.linear[i].db
```

# Training and loss value
## Two Layer
* **Layer**
  The 1st layer is a linear layer of 16 neurons with the Tanh() activation function.
  The 2nd layer is a linear layer of 1 neuron with the Sigmoid() activation function.
* **Parameter**
  last_dW, last_db = 0.3, 0.3
  eta, alpha = 0.01, 0.01
```python=
model = MLP(8, ['tanh','sigmoid'], [16, 1])
max_epochs, chk_epochs = 15000, 1000
last_dW, last_db = 0.3, 0.3
eta, alpha = 0.01, 0.01
```
* **Loss value**
> A learning curve is an X-Y plot showing the loss value (Y-axis) obtained in each epoch (X-axis).
> The maximum number of epochs is 15000 and the loss is checked every 1000 epochs.
> ![](https://i.imgur.com/vGQGNkW.png)

## Three Layer
* **Layer**
  The 1st layer is a linear layer of 16 neurons with the Tanh() activation function.
  The 2nd layer is a linear layer of 4 neurons with the Relu() activation function.
  The 3rd layer is a linear layer of 1 neuron with the Sigmoid() activation function.
* **Parameter**
  last_dW, last_db = 0.3, 0.3
  eta, alpha = 0.01, 0.01
```python=
model = MLP(8, ['tanh','relu','sigmoid'], [16, 4, 1])
max_epochs, chk_epochs = 15000, 1000
last_dW, last_db = 0.3, 0.3
eta, alpha = 0.01, 0.01
```
* **Loss value**
> A learning curve is an X-Y plot showing the loss value (Y-axis) obtained in each epoch (X-axis).
> The maximum number of epochs is 15000 and the loss is checked every 1000 epochs.
> ![](https://i.imgur.com/E7SrCD8.png)

## Four Layer
* **Layer**
  The 1st layer is a linear layer of 16 neurons with the Tanh() activation function.
  The 2nd layer is a linear layer of 16 neurons with the Tanh() activation function.
  The 3rd layer is a linear layer of 8 neurons with the Relu() activation function.
  The 4th layer is a linear layer of 1 neuron with the Sigmoid() activation function.
* **Parameter**
  last_dW, last_db = 0.5, 0.5
  eta, alpha = 0.01, 0.01
```python=
model = MLP(8, ['tanh','tanh','relu','sigmoid'], [16, 16, 8, 1])
max_epochs, chk_epochs = 15000, 1000
last_dW, last_db = 0.5, 0.5
eta, alpha = 0.01, 0.01
```
* **Loss value**
> A learning curve is an X-Y plot showing the loss value (Y-axis) obtained in each epoch (X-axis).
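The report shows the model configurations and hyper-parameters but not the training loop itself. The sketch below is my own reconstruction of how `forward`, `backward`, and `update` fit together; the `losses` list and the plotting code are additions for illustration only.

```python=
# Training-loop sketch (assumed, not shown in the original report):
# one full-batch forward/backward/update per epoch, recording the loss
# every chk_epochs epochs for the learning curve.
losses = []
for epoch in range(1, max_epochs + 1):
    model.forward(data)          # forward pass over all 256 samples
    model.backward(data_ans)     # computes model.L and all gradients
    model.update(eta, alpha)     # gradient step with the momentum-style term
    if epoch % chk_epochs == 0:
        losses.append(model.L)
        print(epoch, model.L)

# learning curve: loss recorded every chk_epochs epochs
plt.plot(range(chk_epochs, max_epochs + 1, chk_epochs), losses)
plt.xlabel('epoch')
plt.ylabel('loss')
plt.show()
```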
> The maximum number of epochs is 15000 and the loss is checked every 1000 epochs.
> ![](https://i.imgur.com/vHPPrfq.png)

# Compared performances
Comparing the loss values of my training results, the networks with more layers perform best, so as long as the settings are tuned well, the results can be completely different.
![](https://i.imgur.com/9rfaFfX.png)

# Outcome discussions
When a model does not perform well, I first check its performance on the training data. There are two possibilities: either the parameters and the number of neurons are not adjusted well, or the activation functions are not chosen well, which leads to poor training results. The model can be trained better by changing the activation functions or by adjusting the parameters and the number of neurons. The learning rate (eta) decides the update amplitude of the parameters in the model, and the number of layers also leads to differences in the training results.

# Conclusions
In this assignment, I use only NumPy to design multilayer perceptrons with two, three, and four layers to learn the 8-bit parity problem, and I collect the loss values obtained over all epochs; the final loss value of my outcome is almost 0.

# Bonus work
> In my program, anyone can configure the network with two lists: one for the activation functions and one for the number of neurons in each layer.
> ![](https://i.imgur.com/JCcwqwM.png)
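For example (my own illustration, using only configurations already reported above), the three architectures in this report are all built from the same class just by changing those two lists:

```python=
# Illustration only: the same MLP class builds every model in this report
# by changing the activation-function list and the neuron-count list.
model_two   = MLP(8, ['tanh', 'sigmoid'], [16, 1])
model_three = MLP(8, ['tanh', 'relu', 'sigmoid'], [16, 4, 1])
model_four  = MLP(8, ['tanh', 'tanh', 'relu', 'sigmoid'], [16, 16, 8, 1])
```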