6_5_周哲煒

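6.5
===

[TOC]

# Typical Neural Network Architectures

~~(a bit of history)~~

- 1989: Yann LeCun trains a multi-layer neural network with backpropagation to recognize handwritten zip codes --> LeNet
- 1998: Yann LeCun proposes LeNet-5 --> the convolutional neural network is formally born

:::info
Hardware compute was limited at the time, and training a convolutional network took too much time and too many machines --> it was not widely adopted
:::

- 2012: Alex Krizhevsky implements AlexNet on GPUs and wins ImageNet --> deep convolutional networks gain attention and start developing rapidly
- Researchers later proposed many more architectures, such as VGG, Inception, RNN, and Transformer

## 6.5.1 LeNet-5

LeNet-5 applies convolution followed by pooling, and finally obtains the result through a softmax function.

Structure of LeNet-5: :arrow_down:

```python=
import torch
import torch.nn as nn
import torch.nn.functional as F


class LeNet_5(nn.Module):  # inherits from nn.Module
    def __init__(self):  # constructor
        super().__init__()  # call nn.Module's constructor
        # in channels=1 (grayscale), out channels=6, kernel size=5*5, stride=1
        self.conv1 = nn.Conv2d(1, 6, 5, 1)
        self.pool1 = nn.AvgPool2d(2, stride=2)  # average pooling, kernel size=2
        self.conv2 = nn.Conv2d(6, 16, 5, 1)  # convolution layer
        self.pool2 = nn.AvgPool2d(2, stride=2)  # average pooling
        # fully connected layer: 16 channels, 5*5 feature map after pooling
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):  # how data passes through the layers; input is 32*32*1
        # Note: activation functions between layers are omitted to keep the focus on shapes.
        x = self.conv1(x)  # 28*28*6
        x = self.pool1(x)  # 14*14*6
        x = self.conv2(x)  # 10*10*16
        x = self.pool2(x)  # 5*5*16
        x = x.view(x.size(0), -1)  # 400
        x = self.fc1(x)  # 120
        x = self.fc2(x)  # 84
        x = self.fc3(x)  # 10
        output = F.softmax(x, dim=1)  # softmax over the class dimension
        # predicted class indices as a NumPy array, plus the raw scores
        return torch.max(output, 1)[1].cpu().numpy(), x
```

## 6.5.2 AlexNet

- Proposed by Alex Krizhevsky; won ImageNet in 2012 and lowered the Top-5 error rate by more than 10 percentage points
- Implemented a parallelized neural network algorithm on CUDA GPUs, showing that deep neural networks can be trained in a **reasonable** amount of time

Structure of AlexNet: :arrow_down:

```python=
class AlexNet(nn.Module):  # inherits from nn.Module; uses the imports from the LeNet-5 block
    def __init__(self):  # constructor
        super().__init__()  # call nn.Module's constructor
        # in channels=3 (RGB), out channels=96, kernel size=11*11, stride=4,
        # padding=2 (two rings of zeros around the input)
        self.conv1 = nn.Conv2d(3, 96, 11, 4, 2)
        self.pool1 = nn.MaxPool2d(3, stride=2)  # max pooling, kernel size=3
        self.conv2 = nn.Conv2d(96, 256, 5, 1, 2)  # convolution layer
        self.pool2 = nn.MaxPool2d(3, stride=2)  # max pooling
        self.conv3 = nn.Conv2d(256, 384, 3, 1, 1)
        self.conv4 = nn.Conv2d(384, 384, 3, 1, 1)
        self.conv5 = nn.Conv2d(384, 256, 3, 1, 1)
        self.pool3 = nn.MaxPool2d(3, stride=2)
        # fully connected layer: 256 channels, 6*6 feature map after pooling
        self.fc1 = nn.Linear(256 * 6 * 6, 4096)
        self.fc2 = nn.Linear(4096, 4096)
        self.fc3 = nn.Linear(4096, 1000)

    def forward(self, x):  # how data passes through the layers; input is 3*224*224
        x = self.conv1(x)  # 96*55*55
        x = F.relu(x)
        x = self.pool1(x)  # 96*27*27
        x = self.conv2(x)  # 256*27*27
        x = F.relu(x)
        x = self.pool2(x)  # 256*13*13
        x = self.conv3(x)  # 384*13*13
        x = F.relu(x)
        x = self.conv4(x)  # 384*13*13
        x = F.relu(x)
        x = self.conv5(x)  # 256*13*13
        x = F.relu(x)
        x = self.pool3(x)  # 256*6*6
        x = x.view(x.size(0), -1)  # 9216
        x = self.fc1(x)  # 4096
        x = self.fc2(x)  # 4096
        x = self.fc3(x)  # 1000
        output = F.softmax(x, dim=1)  # softmax over the class dimension
        return torch.max(output, 1)[1].cpu().numpy(), x
```

- LeNet has about 60 thousand parameters
- AlexNet has about 60 million parameters
- Adds ReLU to avoid vanishing gradients
- Adds Dropout
- Adds local response normalization layers (`nn.LocalResponseNorm` in PyTorch, not `BatchNorm2d`), which later turned out to be of little use

## 6.5.3 VGG

- Shows that increasing network depth can improve performance to a certain extent
- All convolution layers use the same kernel size (3*3)
- VGG-16: 16 layers in total, counting convolution layers (13) and fully connected layers (3)

~~Too many layers, so the code is not written out here~~

## 6.5.4 Residual Network

- Uses skip connections (shortcuts) to break the strictly layer-by-layer structure, avoiding exploding and vanishing gradients
- During backpropagation the gradient can flow directly through the skip connections back toward the earlier layers, which is what avoids vanishing or exploding gradients
- With skip connections, networks of more than 1000 layers can be trained easily
- The network is built from a stack of residual blocks rather than plain convolution layers
- A residual block (one of several variants):

![](https://hackmd.io/_uploads/r1dnhr9fa.png)

```python=
class Resblock(nn.Module):  # inherits from nn.Module
    # `channels` and the 3*3 kernel with padding=1 are illustrative assumptions;
    # they keep the tensor shape unchanged so that c2 + x is valid
    def __init__(self, channels):  # constructor
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
        self.act1 = nn.ReLU()
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
        self.act2 = nn.ReLU()

    def forward(self, x):  # how data passes through the layers
        c1 = self.conv1(x)
        a1 = self.act1(c1)
        c2 = self.conv2(a1)
        a2 = self.act2(c2 + x)  # the key point: add the input back (skip connection)
        return a2
```

## 6.5.5 Inception Network

- Proposed by Google
- Larger kernels are suited to capturing features that span a wide area, while small kernels capture local features --> how should the size be chosen?
- Let the network choose suitable convolution kernels and pooling kernels by itself

![](https://hackmd.io/_uploads/r15ATrcf6.png)

---

![](https://hackmd.io/_uploads/rklXRScz6.png)

- The number of parameters is huge
- Solution: before each convolution layer with kernel size > 1, add a convolution layer with kernel size = 1 (and fewer output channels) to reduce the number of input feature maps

## 6.5.6 NiN (Network in Network)

- A convolution layer performs an ordinary linear convolution (a weighted sum) and then passes the result through a nonlinear activation function
- Replace the convolution layer with a small network to add nonlinear capacity

![](https://hackmd.io/_uploads/H1pxx8cM6.png)

![](https://hackmd.io/_uploads/rJd7l8cf6.png)

- Replace the fully connected layers with global average pooling --> each feature map yields a single output value --> avoids overfitting and an excessive parameter count
- The number of feature maps fed into the global pooling layer == the number of classes

Questions:

1. Why can a residual network (Residual Network) effectively address vanishing and exploding gradients?
2. What important influence did AlexNet have on the development of convolutional neural networks?
3. Briefly describe how the Inception network handles features at different scales to improve network performance.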
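
For the 1*1 reduction described in section 6.5.5 (Inception), a rough parameter count makes the saving concrete. The sketch below is plain Python; the helper `conv_params` and the channel sizes (192 in, 16 reduced, 32 out) are hypothetical values chosen only for illustration.

```python
def conv_params(in_ch, out_ch, kernel):
    """Learnable parameters of one 2D convolution: weights plus one bias per output channel."""
    return in_ch * out_ch * kernel * kernel + out_ch


# Hypothetical channel sizes, chosen only for illustration.
in_ch, reduced_ch, out_ch = 192, 16, 32

# A 5*5 convolution applied directly to all input feature maps.
direct_params = conv_params(in_ch, out_ch, 5)

# A 1*1 convolution first (fewer output channels), then the 5*5 convolution.
bottleneck_params = conv_params(in_ch, reduced_ch, 1) + conv_params(reduced_ch, out_ch, 5)

print("direct 5*5:", direct_params)
print("1*1 then 5*5:", bottleneck_params)
print("ratio:", direct_params / bottleneck_params)
```

Under these assumed sizes, the 1*1 layer cuts the 5*5 layer's input from 192 feature maps to 16, so most of the weights disappear while the output shape stays the same.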
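
As a companion to section 6.5.4 (Residual Network), the toy calculation below hints at why a skip connection helps the gradient survive many layers. It is only a scalar sketch under strong assumptions: each "layer" is reduced to `y = w * x` (plain) or `y = x + w * x` (residual), and the depth of 50 and weight of 0.01 are hypothetical.

```python
def plain_gradient(weights):
    """d(output)/d(input) for a chain of scalar layers y = w * x."""
    grad = 1.0
    for w in weights:
        grad *= w  # each layer scales the gradient by its weight
    return grad


def residual_gradient(weights):
    """Same chain, but every layer is y = x + w * x (skip connection)."""
    grad = 1.0
    for w in weights:
        grad *= 1.0 + w  # the "+ x" path always contributes a factor of 1
    return grad


weights = [0.01] * 50  # hypothetical: 50 layers with small weights
plain = plain_gradient(weights)
residual = residual_gradient(weights)
print("plain chain:", plain)
print("residual chain:", residual)
```

In the plain chain the gradient is a product of small numbers and collapses toward zero, while in the residual chain each factor stays close to 1, so the gradient reaching the early layers keeps a usable magnitude.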