---
title: 'Learning 8-bit parity checking problem with MLP'
disqus: hackmd
---
###### tags: `410621240 CSIE Year 3 徐翊峰`
# Learning 8-bit parity checking problem with MLP
---
#### Table of Contents
---
[TOC]
## Problem Description
---
- In this assignment, you are required to design a multilayer perceptron (MLP) to learn the 8-bit parity check (8BPC) problem. The 8BPC problem is to generate a single parity bit from an 8-bit (1-byte) binary input. The generated parity bit must be one if the number of 1’s in the binary input is odd. Otherwise, the generated parity bit must be zero.
- Generate 8-bit test data consisting of **0**s and **1**s; the label is **1** when the input contains an odd number of **1**s and **0** when it contains an even number of **1**s. A short worked example is sketched below.
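For example, the byte `10110100` contains four 1s, so its parity bit is 0, while `10110101` contains five 1s and gets parity bit 1. A minimal sketch of the rule (illustrative only, not part of the assignment code):
```python=
# Parity-bit rule: output 1 when the byte contains an odd number of 1s, else 0.
for byte in ("10110100", "10110101"):
    parity = byte.count("1") % 2
    print(byte, "->", parity)   # 10110100 -> 0, 10110101 -> 1
```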
## Solution Steps
---
### 1. Generate the training data
- Prepare the training data, which must contain 256 binary inputs and their corresponding outputs (the generated parity bits).
- For the **data**, each integer from 0 to 255 is converted from decimal to binary, zero-padded to 8 bits, and each bit is cast to `int` and stored in a list. The **label** is produced with the list's `count` method, which counts how many 1s the input contains.
```python=
import numpy as np

def data_generator(): # generate the training data and the corresponding labels
    data = []
    label = []
    for i in range(0, 256):
        num = list("{:08b}".format(i))   # 8-bit binary string, zero-padded
        num = list(map(int, num))        # convert each bit character to int
        data.append(num)
        label.append([1 if num.count(1) % 2 == 1 else 0])   # 1 for odd parity, 0 for even
    return np.array(data), np.array(label)
```
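A quick sanity check of the generator (illustrative only): it should return a 256×8 data matrix and a 256×1 label vector.
```python=
x, y = data_generator()
print(x.shape, y.shape)   # (256, 8) (256, 1)
print(x[3], y[3])         # [0 0 0 0 0 0 1 1] [0]  -> two 1s, even parity
```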
### 2. Define the classes used by the MLP neural network
- We use a linear layer to compute the weighted sum of each network layer.
- $$o = xW + b$$
- Linear layer implementation:
```python=
class Linear: # linear (fully connected) layer
    def __init__(self, m, n):
        # draw the weight (m, n) matrix and bias (1, n) matrix from a standard
        # normal distribution and divide every value by m
        self.W, self.b = np.random.randn(m, n)/m, np.random.randn(1, n)/m # m x n, 1 x n
        self.dW, self.db = None, None
    def forward(self, x):
        self.x = x
        out = np.dot(x, self.W) + self.b
        return out
    def backward(self, dout):
        dx = np.dot(dout, self.W.T)
        self.dW = np.dot(self.x.T, dout)
        self.db = np.sum(dout, axis=0)
        return dx
```
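- The `backward` method follows from differentiating $o = xW + b$, where $\delta$ denotes the upstream gradient `dout`:
$$\frac{\partial L}{\partial W} = x^{T}\delta,\qquad \frac{\partial L}{\partial b} = \sum_{\text{batch}}\delta,\qquad \frac{\partial L}{\partial x} = \delta\,W^{T}$$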
- The activation function applied after each layer:
- ReLU
$$ f(x)=\begin{cases} 0 & \text{for } x < 0 \\ x & \text{for } x \geq 0 \end{cases} $$
- Sigmoid
$$\sigma(x) = \frac{1}{1+e^{-x}}$$
- tanh
$$ \tanh(x) = \frac{e^{x}-e^{-x}}{e^{x}+e^{-x}}$$
- ReLU implementation:
```python=
class ReLU: # ReLU activation function
    def __init__(self):
        pass
    def forward(self, x):
        self.mask = (x <= 0)
        out = x.copy()        # copy so the caller's array is not modified in place
        out[self.mask] = 0
        return out
    def backward(self, dout):
        dx = dout.copy()      # likewise, do not modify the upstream gradient in place
        dx[self.mask] = 0
        return dx
```
- Sigmoid implementation:
```python=
class Sigmoid: #sigmoid activation function
def __init__(self):
pass
def forward(self, x):
out = 1/(1+np.exp(-x))
self.o = out
return out
def backward(self, dout):
dx = dout*self.o*(1-self.o)
return dx
```
- Tanh implementation:
```python=
class Tanh: #tanh activation function
def __init__(self):
pass
def forward(self, x):
out = (np.exp(x)-np.exp(-x))/(np.exp(x)+np.exp(-x))
self.o = out
return out
def backward(self, dout):
dx = dout*(1-(self.o**2))
return dx
```
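- The `backward` methods above apply the standard derivatives of these activations (with $f'(0)$ taken as 0, as in the mask), multiplied by the upstream gradient:
$$ f'(x)=\begin{cases} 0 & x \leq 0 \\ 1 & x > 0 \end{cases},\qquad \sigma'(x)=\sigma(x)\,(1-\sigma(x)),\qquad \tanh'(x)=1-\tanh^{2}(x)$$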
- Finally, a loss function is added to adjust the network parameters.
- Loss implementation:
```python=
class Loss: # loss function: sum-of-squares error
    def __init__(self):
        pass
    def forward(self, y, ybar):
        self.y, self.ybar = y, ybar   # cache the target and the prediction for backward
        return np.sum((y-ybar)**2)
    def backward(self, dout):
        dy = -(2*(self.y-self.ybar))
        return dy
```
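- The gradient returned by `Loss.backward` comes from differentiating the sum-of-squares error with respect to the prediction $\bar{y}$:
$$ L=\sum (y-\bar{y})^{2} \quad\Rightarrow\quad \frac{\partial L}{\partial \bar{y}} = -2\,(y-\bar{y})$$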
### 3. Implement the MLP neural network
- The multilayer network can be built from the layer and activation classes defined above.
- MLP implementation:
```python=
class MLP: #construct a MLP with N layers
def __init__(self, input, act_fun, neuron):
self.linear = []
self.act = []
self.last_dW = []
self.last_db = []
self.neuron = [input] + neuron
self.act_fun = act_fun
self.layer_num = len(act_fun)
for i in range(self.layer_num):
self.linear.append(Linear(self.neuron[i], self.neuron[i+1]))
if self.act_fun[i] == 'relu': self.act.append(ReLU())
if self.act_fun[i] == 'sigmoid': self.act.append(Sigmoid())
if self.act_fun[i] == 'tanh': self.act.append(Tanh())
if self.act_fun[i] == 'linear': self.act.append(None)
self.last_dW.append(0)
self.last_db.append(0)
self.loss = Loss()
def forward(self, x):
for i in range(self.layer_num):
x = self.linear[i].forward(x)
if self.act[i]: x = self.act[i].forward(x)
self.ybar = x
return self.ybar
def backward(self, y):
self.L = self.loss.forward(y, self.ybar)
g = self.loss.backward(1)
for i in range(self.layer_num - 1, -1, -1):
if self.act[i]: g = self.act[i].backward(g)
g = self.linear[i].backward(g)
def update(self, eta, alpha):
for i in range(self.layer_num):
self.linear[i].W = self.linear[i].W - eta*self.linear[i].dW + alpha*self.last_dW[i]
self.linear[i].b = self.linear[i].b - eta*self.linear[i].db + alpha*self.last_db[i]
self.last_dW[i] = eta*self.linear[i].dW
self.last_db[i] = eta*self.linear[i].db
```
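- The `update` method applies the rule $W \leftarrow W - \eta\,\frac{\partial L}{\partial W} + \alpha\,\Delta W_{\text{prev}}$, where $\Delta W_{\text{prev}}$ is the previous step's $\eta\,\frac{\partial L}{\partial W}$ and `alpha` acts as a momentum-style coefficient.
- A quick wiring check (illustrative only, with hypothetical layer sizes) confirms that the forward pass yields one output per input row:
```python=
np.random.seed(0)                        # reproducible illustration
net = MLP(8, ['relu', 'sigmoid'], [4, 1])
sample = np.array([[0, 1, 1, 0, 1, 0, 0, 1],
                   [1, 1, 1, 1, 0, 0, 0, 0]])
print(net.forward(sample).shape)         # expected: (2, 1)
```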
### 4. Train the neural network
- Feed the generated data and labels into the model defined above for training.
```python=
x, y = data_generator()
model = MLP(8, ['relu', 'sigmoid'], [256, 1]) # set the layer sizes and activation functions
max_epochs, chk_epochs = 30000, 3000          # number of epochs and checkpoint interval
eta, alpha = 0.01, 0.7
loss_value = []
chk_epochs_value = []
for e in range(max_epochs): # train the model
    model.forward(x)
    model.backward(y)
    model.update(eta, alpha)
    if (e+1) % chk_epochs == 0:
        accuracy = count_acc(model.ybar, y)
        print('Epoch %3d: loss=%.6f accuracy=%.4f' % (e+1, model.L, accuracy))
        loss_value.append(model.L)
        chk_epochs_value.append(e+1)
```
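- The `count_acc` helper used above is not shown in the report; a minimal sketch, assuming it thresholds the network output at 0.5 and reports the fraction of correctly predicted parity bits (in the actual script it would have to be defined before the training loop):
```python=
def count_acc(ybar, y):
    # hypothetical helper: threshold the sigmoid output at 0.5 and
    # compute the fraction of predictions that match the labels
    pred = (ybar >= 0.5).astype(int)
    return np.mean(pred == y)
```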
### 5. Plot training error versus epochs
- Use **matplotlib** in **Python** to plot the **Training Error** against the number of **Epochs**.
```python=
import matplotlib.pyplot as plt
plt.plot(chk_epochs_value,loss_value,label= "Loss")
plt.grid()
plt.legend()
plt.xlabel("Epochs")
plt.ylabel("Training Error")
plt.title("learning curve")
plt.show()
```
- The resulting learning curve can then be inspected.

## Comparison of Experimental Results
---
### 1. Results of the two-layer architecture
```flow
st=>operation: ReLU
e=>operation: Sigmoid
st(right)->e
```
```python=
#2 layers
model = MLP(8, ['relu', 'sigmoid'], [256, 1])
max_epochs, chk_epochs = 30000,3000
eta, alpha = 0.01, 0.7
```

### 2. Results of the three-layer architecture
```flow
st=>operation: ReLU
op1=>operation: Tanh
e=>operation: Sigmoid
st(right)->op1(right)->e
```
```python=
#3 layers
model = MLP(8, ['relu', 'tanh', 'sigmoid'], [30, 10, 1])
max_epochs, chk_epochs = 30000,3000
eta, alpha = 0.01, 0.7
```

### 3. Results of the four-layer architecture
```flow
st=>operation: ReLU
op1=>operation: Tanh
op2=>operation: ReLU
e=>operation: Sigmoid
st(right)->op1(right)->op2(right)->e
```
```python=
#4 layers
model = MLP(8, ['relu', 'tanh', 'relu', 'sigmoid'], [25, 20, 4, 1])
max_epochs, chk_epochs = 30000,3000
eta, alpha = 0.01, 0.7
```

## Reflections and Conclusions
---
- After building the model, I kept adjusting the number of neurons and the activation function of each layer.
- In doing so, I found that **ReLU** is very effective at reducing the **Loss** for this problem.
- In addition, the initial weights must be neither too large nor too small; either extreme hurts the model's learning.
- Regarding the initial values of *eta* and *alpha*, I found that *eta* must not be too large or the model learns poorly, whereas *alpha* can be somewhat larger.
- Also, a network with fewer layers needs more neurons per layer, while a deeper network can get by with fewer neurons. I suspect this is because the problem itself is not very complex, so it does not need a large number of parameters; too many parameters can even hurt the results and slow down the computation.
> [name=徐翊峰]