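# Neural Network Starting from NumPy #1: Fully Connected Layer / Neural Network from Scratch

---

Tu 2023/4/21

***"What I cannot create, I do not understand"* -Richard Feynman**

![](https://i.imgur.com/E700R3B.jpg)

## 1. Preface

Midterms are over, so I'm starting a new series. My roommate happens to be taking a course on this topic, so I figured I'd play along. Along the way I found that few articles cover this material end to end, and those that do are often not clear enough. I'll therefore collect the learning resources I found, so that anyone who finishes this post can fully understand an FCNN and implement one on their own (some prior ML knowledge is assumed, of course).

The goal of this series is to hand-build a whole neural network using nothing but numpy, so it will involve some math and some design patterns (hopefully).

I've never been a fan of hand-rolling things (who is?). What takes me two days of grinding here, someone else can probably do better in two minutes with pytorch or tensorflow, so why bother? But after reading the Richard Feynman quote above, I wanted to find out what I would actually gain. On the road to machine learning, the path that looks roughest may well turn out to be the fastest.

Given my limited time and the other projects I want to work on, this series may not have many episodes (it may even end with this one, since I started it on a whim). Either way, let's just get going.

## 2. Introduction to the Fully Connected Layer

Put simply, it is a layer made of several neurons, each of which applies a linear operation to its inputs, as shown below.

![](https://i.imgur.com/5LoLD1m.jpg)

Sigma in the figure denotes the activation function. For anything more basic than that, please look it up online; I won't cover it here.

## 3. Implementation with NumPy

Now for the main event. I'll go through the steps below one at a time so they are easy to follow.

1. Building the layer and the network (Initialization)
2. Implementing forward propagation
3. The math behind backward propagation
4. Implementing backward propagation
5. Training the model (Training Loop)

My code structure is based on [this article](https://towardsdatascience.com/coding-a-neural-network-from-scratch-in-numpy-31f04e4d605); essentially only the initial construction is the same. The article explains things clearly, but it does not explain its math notation, which is unfriendly to a beginner like me. It is still a very good learning resource. For topics where I found good material, I'll simply summarize it, since repeating what has already been said well is pointless.

### Building the Layer and the Network (Initialization)

This part is fairly painless, so straight to the code:

```python
# Fully connected layer: only stores its configuration
class DenseLayer:
    def __init__(self, input_shape, output_shape, activation=None):
        self.type = 'Dense_Layer'
        self.input_shape = input_shape    # number of input features
        self.output_shape = output_shape  # number of neurons
        self.activation = activation      # e.g. 'relu', or None
```

I'm calling it DenseLayer here; it goes by many names anyway (corrections welcome).

Next comes the network itself:

```python
class Network:
    def __init__(self):
        self.network = []
        self.params = []
        self.history = []
        self.gradients = []

    def add(self, layer):
        self.network.append(layer)

    def structure(self):
        for idx, layer in enumerate(self.network):
            print(f'{layer.type} ({idx}) ---> input_size: {layer.input_shape}, output_size: {layer.output_shape}')
```

A quick explanation:

* `self.network` stores the DenseLayer objects
* `self.params` stores each layer's weights, bias, and activation function
* `self.history` records each layer's input and outputs (Z and A)
* `self.gradients` records each layer's gradients

Run the following code:

```python
model = Network()
model.add(DenseLayer(2,3,'relu'))
model.add(DenseLayer(3,1,'relu'))
model.structure()
```

Output:

```
Dense_Layer (0) ---> input_size: 2, output_size: 3
Dense_Layer (1) ---> input_size: 3, output_size: 1
```

This is equivalent to building a 2-3-1 neural network.

![](https://i.imgur.com/1KaGvTh.jpg)

### Implementing Forward Propagation

This one is also fairly easy (compared with the final boss later on). The math is simple too and can be looked up online (it is ordinary matrix arithmetic). Again, straight to the code:

```python
import numpy as np

class Network:
    def __init__(self):
        self.network = []    # layers
        self.params = []     # {W, b, activation} per layer
        self.history = []    # Z, A per layer
        self.gradients = []  # dW, db per layer

    def add(self, layer):
        self.network.append(layer)

    def structure(self):
        for idx, layer in enumerate(self.network):
            print(f'{layer.type} ({idx}) ---> input_size: {layer.input_shape}, output_size: {layer.output_shape}')

    def init_weight(self):
        for layer in self.network:
            # weights uniform in [-1, 1), biases uniform in [0, 1)
            weight = np.random.rand(layer.input_shape, layer.output_shape)*2-1
            self.params.append({
                'W': weight,
                'b': np.random.rand(1, layer.output_shape),
                'activation': layer.activation
            })

    def forward(self, x):
        self.history = []  # keep only the current pass so back_prop reads fresh values
        for idx, param in enumerate(self.params):
            record = {'idx': idx, 'W': param['W'], 'b': param['b'], 'activation': param['activation']}
            record['inputs'] = x  # record the layer input
            x = x@param['W']+param['b']
            record['Z'] = x  # record Z
            if param['activation'] == 'relu':
                x = Functions.relu(x)
            record['A'] = x  # record A
            self.history.append(record)
        return x
```

Two methods were added to the Network class: `init_weight` and `forward`.

* `init_weight`: randomly initializes each layer's weight and bias, stored as `np.array` matrices
* `forward`: runs forward propagation on the given input and returns the result

`forward` also builds a `record` variable, a dictionary holding the layer's intermediate results for the later back propagation. The only entries that really matter are:

* Z: the result of the matrix operation `x@W + b`
* A: the result of passing Z through the activation function

For the activation functions I defined a separate class to keep things organized:

```python
class Functions:
    @staticmethod
    def MSE(pred, y):
        # mean squared error over the batch; returns an array of shape (1,)
        J = np.sum((y-pred)**2, axis=0)/len(pred)
        return J

    @staticmethod
    def relu(x):
        return np.maximum(0, x)

    @staticmethod
    def relu_derivative(x):
        # 1 where x > 0, else 0; returns a new array and leaves x untouched
        return (x > 0).astype(x.dtype)
```

These are all the functions this post uses; the rest will come up later. Honestly, I find my `forward` both ugly and inefficient, so if you know a better way to write it, please let me know. Run the following code:

```python
model = Network()
model.add(DenseLayer(2,3,'relu'))
model.add(DenseLayer(3,1,'relu'))
model.structure()
model.init_weight()
inputs = np.random.randn(5,2)
print('output:')
print(model.forward(inputs))
```

Output from one run (the values depend on the random initialization):

```
Dense_Layer (0) ---> input_size: 2, output_size: 3
Dense_Layer (1) ---> input_size: 3, output_size: 1
output:
[[0.86601944]
 [0.17616539]
 [0.        ]
 [0.15469798]
 [0.57608506]]
```

### The Math Behind Backward Propagation

This is the most troublesome part, so first here are a few resources I found really good.

1. [Article: recommended; a very detailed walk through backpropagation, from simple to complex](https://towardsdatascience.com/the-maths-behind-back-propagation-cf6714736abf)
2. [Video: so-so, but quite clear when explaining dimensions](https://www.youtube.com/watch?v=w8yWXqWQYmU)
3. [Video: enough said, a must-watch](https://www.youtube.com/watch?v=GlcnxUlrtek)

I assume anyone who wants to do this already understands backward propagation to some degree, though perhaps not completely, so a reminder: **understanding the concept and being able to implement it are two different things**. This part really taught me what "What I cannot create, I do not understand" means. You need a very thorough understanding before you can turn it into code. Below is my best attempt at a handwritten explanation.

![](https://i.imgur.com/sFVeDB0.jpg)

![](https://i.imgur.com/ev4SKzV.jpg)

What I found most useful here is the summary of sizes at the bottom. The sanity check is that each layer's gradient must end up with the same size as that layer's weights. Some notes on notation:

* W[L]: the weight of layer L
* σ(): the activation function
* 𝛿: everything in a layer's gradient except the A term; in the code below it is the loss gradient with respect to that layer's Z, and it can also be used to compute the bias gradient

If anything else is unclear, feel free to leave a comment or check the resources above. The bias gradient is much simpler to compute: add the rows of delta together, that is, sum over the batch axis (see the implementation below for details).

### Implementing Backward Propagation

```python
# Add these two methods to the Network class
def back_prop(self, pred, y):
    self.gradients = []  # discard gradients from the previous call
    m = len(pred)
    delta = (-2/m)*(y - pred)  # derivative of the MSE with respect to the prediction
    for idx in reversed(range(len(self.network))):  # last layer first: 1, 0
        if idx != len(self.network)-1:  # not the last layer
            delta = np.dot(delta, self.params[idx+1]['W'].T)
        sigma_prime_Z = Functions.relu_derivative(self.history[idx]['Z'])
        delta = np.multiply(delta, sigma_prime_Z)
        w_gradient = np.dot(self.history[idx]['inputs'].T, delta)
        b_gradient = np.sum(delta, axis=0, keepdims=True)
        self.gradients.insert(0, {'Wg': w_gradient, 'bg': b_gradient})

def step(self, lr=0.01):
    # one gradient descent update per layer
    for param, grad in zip(self.params, self.gradients):
        param['W'] = param['W'] - lr*grad['Wg']
        param['b'] = param['b'] - lr*grad['bg']
```

* `back_prop` computes the gradients of every layer and stores them in `self.gradients`
* `step` updates the parameters of every layer once

*Note that every layer's activation function is more or less forced to be relu. After finishing I realized that my `back_prop` method does not handle any other case. That is a design flaw (which I'm too lazy to fix), so please keep it in mind when using this code.*

Test it with the following code:

```python
model = Network()
model.add(DenseLayer(2,3,'relu'))
model.add(DenseLayer(3,1,'relu'))
model.init_weight()
inputs = np.random.randn(5,2)
y = np.random.randn(5,1)
output = model.forward(inputs)
print(f'MSE: {Functions.MSE(output, y)}')
model.back_prop(output, y)
model.step(lr = 0.01)
output = model.forward(inputs)
print(f'MSE: {Functions.MSE(output, y)}')
```

Output:

```
MSE: [1.78024903]
MSE: [1.49469603]
```

The MSE went down! That means our gradient descent is working.

### Training the Model (Training Loop)

This last one is quite easy.

```python
def train_loop():
    print('train loop:')
    for i in range(5):
        output = model.forward(inputs)
        model.back_prop(output, y)
        model.step(lr = 0.01)
        # MSE of the prediction made before this step's update
        print(f'MSE: {Functions.MSE(output, y)}')

train_loop()
```

Output:

```
train loop:
MSE: [1.49478041]
MSE: [1.38679942]
MSE: [1.28652727]
MSE: [1.19383314]
MSE: [1.10858741]
```

## 4. Conclusion

My biggest regret with this post is that I never tested the Network on real kaggle data. In short: **it is not production grade**. Running it on real data may therefore surface all kinds of bugs; if you find any, please let me know.

###### tags: `AI` `Deep Learning` `NNfromScratch`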
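
Since the network was never tested on real data, one inexpensive way to gain some confidence in the backward pass is a finite-difference gradient check. The sketch below is self-contained and does not use the `Network` class. It re-implements the same `back_prop` formulas for a 2-3-1 ReLU network with an MSE loss and compares them with numerically estimated gradients. The helper names (`forward`, `mse`, `analytic_grads`, `numeric_grad`), the `(W, b)` tuple layout, and the fixed seed are assumptions made for this illustration and are not part of the original code.

```python
import numpy as np

def relu(x):
    return np.maximum(0, x)

def relu_derivative(x):
    # 1 where x > 0, else 0
    return (x > 0).astype(x.dtype)

def forward(params, x):
    """Return the prediction and a per-layer cache of (inputs, Z)."""
    cache = []
    for W, b in params:
        z = x @ W + b
        cache.append((x, z))
        x = relu(z)
    return x, cache

def mse(pred, y):
    return np.mean((y - pred) ** 2)

def analytic_grads(params, x, y):
    """Same formulas as back_prop: delta flows from the last layer to the first."""
    pred, cache = forward(params, x)
    m = len(pred)
    delta = (-2 / m) * (y - pred)
    grads = [None] * len(params)
    for idx in reversed(range(len(params))):
        if idx != len(params) - 1:
            delta = delta @ params[idx + 1][0].T
        inputs, z = cache[idx]
        delta = delta * relu_derivative(z)
        grads[idx] = (inputs.T @ delta, delta.sum(axis=0, keepdims=True))
    return grads

def numeric_grad(params, x, y, arr, eps=1e-6):
    """Central finite differences of the MSE for every entry of arr.

    arr must be one of the arrays stored in params; it is perturbed in place
    and restored afterwards.
    """
    grad = np.zeros_like(arr)
    for i in np.ndindex(arr.shape):
        old = arr[i]
        arr[i] = old + eps
        loss_plus = mse(forward(params, x)[0], y)
        arr[i] = old - eps
        loss_minus = mse(forward(params, x)[0], y)
        arr[i] = old
        grad[i] = (loss_plus - loss_minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
sizes = [2, 3, 1]  # the 2-3-1 network used throughout the post
params = [(rng.uniform(-1, 1, (n_in, n_out)), rng.uniform(0, 1, (1, n_out)))
          for n_in, n_out in zip(sizes[:-1], sizes[1:])]
x = rng.standard_normal((5, 2))
y = rng.standard_normal((5, 1))

grads = analytic_grads(params, x, y)
max_err = 0.0
for (W, b), (dW, db) in zip(params, grads):
    max_err = max(max_err,
                  np.abs(dW - numeric_grad(params, x, y, W)).max(),
                  np.abs(db - numeric_grad(params, x, y, b)).max())
print(f'largest analytic vs numeric difference: {max_err:.2e}')
```

If the two gradients agree to within a small tolerance, the chain-rule bookkeeping (including the size check that each gradient matches its weight matrix) is very likely correct. A large gap usually points to a transposed matrix or a missing activation derivative. ReLU is not differentiable at exactly 0, so a pre-activation that lands extremely close to 0 can produce a spurious mismatch.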