---
title: Artificial Intelligence Final
---
# Short-Answer Questions
## What is softmax used for?
* Softmax converts a vector of raw scores (logits) into a probability distribution over classes (every output lies in $(0,1)$ and the outputs sum to 1), so it is commonly used as the output layer for multi-class classification.

## What are three supervised CNN model use cases?
* Image classification, object detection, and instance segmentation.

## What are the three major issues you need to learn when you study a neural network model?
* Network architecture
* Activation function
* Learning rule

## Basic CNN architecture can be divided into two stages. What are these stages, and what are the functions of the corresponding two stages?
* Convolutional layers + pooling layers
    * Feature extraction
* Fully connected layers
    * Mapping of feature maps to target labels (classification)

# Calculation Problems
## TLU
![](https://i.imgur.com/aGOetOz.png)
### Original formulation
* Activation: $a = w_1x_1 + w_2x_2 + w_3x_3 + \dots$
* $y = 1$ if $a \geq \theta$, else $y = 0$
* Weight update: $w' = w + \alpha(t-y)V$
* $\alpha$ is the learning rate and $V$ is the input vector, e.g. $x_1 = 1, x_2 = 0$

#### TLU example
* 2-input OR gate (output is 1 if at least one input is 1)
* Initial weights $w_1 = 1, w_2 = 2$, threshold $\theta = 2$, learning rate $\alpha = 0.5$
* Time = 1: input $(0,0)$, target $t = 0$
    * $0 \times 1 + 0 \times 2 = 0$
    * $a < \theta \rightarrow y = 0$
    * $t - y = 0$, no change
* Time = 2: input $(0,1)$, target $t = 1$
    * $a = 2 \geq \theta \rightarrow y = 1 = t$, no change
* Time = 3: input $(1,0)$, target $t = 1$
    * $1 \times 1 + 0 \times 2 = 1$
    * $a < \theta \rightarrow y = 0$
    * $w_1' = 1 + 0.5(1-0) \times 1 = 1.5$
    * $w_2' = 2 + 0.5(1-0) \times 0 = 2$
* ...
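This train-until-fit loop can be sketched in plain Python (a minimal sketch assuming the initial weights, threshold, and learning rate of 0.5 from the example above; `train_tlu` is an illustrative helper name, not from the course):

```python
# Minimal TLU (threshold logic unit) trained on the 2-input OR gate,
# using the update rule w' = w + lr * (t - y) * x with a fixed threshold.
def train_tlu(samples, w, theta, lr=0.5, max_epochs=100):
    for _ in range(max_epochs):
        changed = False
        for x, t in samples:
            a = sum(wi * xi for wi, xi in zip(w, x))  # activation
            y = 1 if a >= theta else 0                # threshold output
            if y != t:
                w = [wi + lr * (t - y) * xi for wi, xi in zip(w, x)]
                changed = True
        if not changed:  # every sample already classified correctly
            return w
    return w

or_gate = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
w = train_tlu(or_gate, w=[1, 2], theta=2, lr=0.5)  # converges to [2.0, 2]
```

With these starting values the loop stops after a few epochs, once $w_1$ has been nudged up to the threshold so that input $(1,0)$ fires.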
Continue until the weights satisfy all conditions.
#### Answer
* No, because the activation condition is reversed
* Change the learning rule to $w' = w - \alpha(t-y)V$

## Perceptron
![](https://i.imgur.com/ws2hmYM.png)
### a
* A 2-layer perceptron with 2 inputs and 2 outputs
### b
* No, the classes are not linearly separable by one line
### c
* $\Sigma s_1 = 0.2*1 + (-0.5)*1 = -0.3$
* $\Sigma s_2 = (-0.7)*1 + 0.5*1 = -0.2$
* $w_{1,1}' = w_{1,1} + 0.5(0-0)*1 = 0.2$
* $w_{1,2}' = w_{1,2} + 0.5(0-0)*1 = -0.5$
* $w_{2,1}' = w_{2,1} + 0.5(1-0)*1 = -0.2$
* $w_{2,2}' = w_{2,2} + 0.5(1-0)*1 = 1$

#### Reference
* ![](https://i.imgur.com/ZZdsgMh.png)
* Neural Networks and Learning Machines, Simon Haykin, 3rd ed., Pearson, 2009, p. 85 (PDF)
* $y(n)$ is the output after the activation function

## Backpropagation
![](https://i.imgur.com/d2E9dgl.png)
### a
* Input layer: 2 nodes, hidden layer: 2, output layer: 2
### b
![](https://i.imgur.com/oL9Hor1.png)
* Note
    * $w$ is the weight
    * $a_j$ is the activated output of node $j$: $a_j = f(\Sigma_i w_{ji}a_i)$
    * $\eta$ is the learning rate
    * $\delta_j = (d_j-a_j)*f'(s_j) = (d_j-a_j)a_j(1-a_j)$
    * $\Delta w_{ji} = \eta \delta_j a_i$

| $w_{31}$ | 0.1 | $w_{32}$ | 0.2 |
| -------- | --- | -------- |:---:|
| $w_{41}$ | 0.3 | $w_{42}$ | 0.4 |
| $w_{53}$ | 0.5 | $w_{54}$ | 0.6 |
| $w_{63}$ | 0.7 | $w_{64}$ | 0.8 |

* Input = $(1,1)$

#### Forward
* $a_3 = f(1*0.1+1*0.2) = 0.57$
* $a_4 = f(1*0.3+1*0.4) = 0.67$
* $a_5 = f(0.57*0.5+0.67*0.6) = 0.67$
* $a_6 = f(0.57*0.7+0.67*0.8) = 0.72$

#### $\delta$s
* Target $(d_5, d_6) = (0,1)$
* $\delta_6 = (d_6-a_6)a_6(1-a_6) = (1-0.72)0.72(1-0.72) = 0.0564$
* $\delta_5 = (0-0.67)0.67(1-0.67) = -0.1481$
* $\delta_4 = (\delta_5w_{54}+\delta_6w_{64})*f'(s_4) = (-0.1481*0.6+0.0564*0.8)0.67(1-0.67) = -0.0097$
* $\delta_3 = (\delta_5w_{53}+\delta_6w_{63})*f'(s_3) = (-0.1481*0.5+0.0564*0.7)0.57(1-0.57) = -0.0085$

#### $\Delta w$
* $\Delta w_{64} = \eta \delta_6 a_4 = 0.5*0.0564*0.67 = 0.01889$
* $\Delta w_{63} = 0.5*0.0564*0.57 = 0.0161$
* $\Delta w_{54} = 0.5*(-0.1481)*0.67 = -0.0496$
* $\Delta w_{53} = 0.5*(-0.1481)*0.57 = -0.0422$
* $\Delta w_{42} = 0.5*(-0.0097)*1 = -0.0049$
* $\Delta w_{41} = 0.5*(-0.0097)*1 = -0.0049$
* $\Delta w_{32} = 0.5*(-0.0085)*1 = -0.0043$
* $\Delta w_{31} = 0.5*(-0.0085)*1 = -0.0043$

#### Updated weights $w' = w + \Delta w$

| $w'_{31}$ | 0.0957 | $w'_{32}$ | 0.1957 |
| --------- | ------ | --------- |:------:|
| $w'_{41}$ | 0.2951 | $w'_{42}$ | 0.3951 |
| $w'_{53}$ | 0.4578 | $w'_{54}$ | 0.5504 |
| $w'_{63}$ | 0.7161 | $w'_{64}$ | 0.8189 |

### Verification (TensorFlow)
https://colab.research.google.com/drive/1uWPPby020fEdusBBwIwjyLRyArkqoEMv?usp=sharing

## CNN
![](https://i.imgur.com/33nJT5m.png)
#### Architecture
![](https://i.imgur.com/RrRLjLd.png)
### a: Forward
* TensorFlow check: https://github.com/JINSCOTT/MLHomework/blob/0cd3071fed50c104864d2fe2b8d8b4fbad5122ed/cnn_maxpooling_backpropagate.ipynb
#### CNN
![](https://i.imgur.com/v2mP8ix.png)
![](https://i.imgur.com/JTxBvoZ.png)
#### Pooling
![](https://i.imgur.com/Wtk6M54.png)
#### Linear
* $a_5 = f(0.1*80 + (-0.05)*90 + 0.05*20 + (-0.02)*60) = 0.9644$
* $a_6 = f(0.05*80 + (-0.02)*90 + 0.03*20 + (-0.07)*60) = 0.1978$
* $a_7 = f(-0.4*0.9644 + (-1)*0.1978) = 0.3581$
* $a_8 = f(0.5*0.9644 + (-0.5)*0.1978) = 0.5947$
#### Softmax
![](https://i.imgur.com/LiAYrck.png)
* $s_1 = 0.4411$
* $s_2 = 0.5589$

### Backward
* Because of the softmax layer, the output-layer delta is computed as:
    * $\delta_j = (t_j-y_j)(y_j-y_j^2)a_j(1-a_j)$
    * $t_j$ is the target and $y_j$ is the softmax output
    * Everything else is unchanged
#### Sigmoid $\delta$s
* Target $(1,0)$
* $\delta_8 = (0-0.5589)(0.5589-0.5589^2)0.5947(1-0.5947) = -0.0332$
* $\delta_7 = (1-0.4411)(0.4411-0.4411^2)0.3581(1-0.3581) = 0.0317$
* $\delta_6 = (\delta_8w_{86}+\delta_7w_{76})*f'(a_6) = (-0.0332*(-0.5)+0.0317*(-1))0.1978(1-0.1978) = -0.0024$
* $\delta_5 = (\delta_8w_{85}+\delta_7w_{75})*f'(a_5) = (-0.0332*0.5+0.0317*(-0.4))0.9644(1-0.9644) = -0.001$
* $\delta_4 = (\delta_6w_{64}+\delta_5w_{54})*f'(a_4) = (-0.0024*(-0.07)-0.001*(-0.02))*1 = 0.000188$
* $\delta_3 = (\delta_6w_{63}+\delta_5w_{53})*f'(a_3) = (-0.0024*0.03-0.001*0.05)*1 = -0.000122$
* $\delta_2 = (\delta_6w_{62}+\delta_5w_{52})*f'(a_2) = (-0.0024*(-0.02)-0.001*(-0.05))*1 = 0.0001$
* $\delta_1 = 
(\delta_6w_{61}+\delta_5w_{51})*f'(a_1) = (-0.0024*0.05-0.001*0.1)*1 = -0.00022$

#### Sigmoid $w + \Delta w$
* Learning rate = 0.5
* $w_{8,6} = -0.5 + 0.5*(-0.0332)*0.1978 = -0.5033$
* $w_{8,5} = 0.5 + 0.5*(-0.0332)*0.9644 = 0.4839$
* $w_{7,6} = -1 + 0.5*0.0317*0.1978 = -0.9968$
* $w_{7,5} = -0.4 + 0.5*0.0317*0.9644 = -0.3847$
* $w_{6,4} = -0.07 + 0.5*(-0.0024)*60 = -0.142$
* $w_{6,3} = 0.03 + 0.5*(-0.0024)*20 = 0.006$
* $w_{6,2} = -0.02 + 0.5*(-0.0024)*90 = -0.128$
* $w_{6,1} = 0.05 + 0.5*(-0.0024)*80 = -0.046$
* $w_{5,4} = -0.02 + 0.5*(-0.001)*60 = -0.05$
* $w_{5,3} = 0.05 + 0.5*(-0.001)*20 = 0.04$
* $w_{5,2} = -0.05 + 0.5*(-0.001)*90 = -0.095$
* $w_{5,1} = 0.1 + 0.5*(-0.001)*80 = 0.06$

#### Upsampling
* Reverse max-pool
![](https://i.imgur.com/WvlexvF.png)
* Reverse ReLU
![](https://i.imgur.com/sF3hNme.png)
![](https://i.imgur.com/FAmkRSA.png)
![](https://i.imgur.com/bTw8AA5.png)

## IoU Calculation

# Chapter 6: Object Detection
![](https://i.imgur.com/LJCbGb3.png)
## Difference between one-stage and two-stage detectors
![](https://i.imgur.com/ERYnhFT.png)
## R-CNN
![](https://i.imgur.com/tNq1kYe.png)
* Slow, because a feature map is computed repeatedly for every region proposal
* Hard to optimize
* The components have to be trained separately

## YOLO family
* Single-shot detectors
### YOLOv1
* Backbone: based on GoogLeNet
* Unified detection:
    * ![](https://i.imgur.com/TC7TCtA.png)
* Non-Maximum Suppression (NMS) selects the bounding box that best encloses an object, i.e. the one with the best Intersection over Union (IoU)
    * ![](https://i.imgur.com/oKmv12s.png)
* Pros: fast, relatively simple to train
* Cons:
    * Each grid cell can predict only one class (and at most two boxes), so detection of crowded and small objects is poor
    * Bounding boxes handle only a fairly fixed range of object aspect ratios
### YOLOv2
* Backbone changed to Darknet-19
* Replaces the fully connected layers with average pooling
* More bounding boxes (5) per grid cell (better for small and occluded objects)
* Multi-scale training on each batch
### YOLOv3
* Darknet-53
* Residual learning (the input is added to the block output)
* ![](https://i.imgur.com/KT3fHPK.png)
### YOLOv4
* CSPDarknet53
* Neck: SPP (Spatial Pyramid Pooling) and PANet (Path Aggregation Network)
* Head: YOLO layer
* Bounding box regression loss
    * Four variants: CIoU, GIoU, DIoU, MSE
    * IoU loss
    * CIoU (Complete-IoU) loss
* Regularization
* Data augmentation
    * CutMix: paste a patch of another image into the image
    * Mosaic data augmentation: combine four images into one
### YOLOv5
* More mosaic data augmentation
* GIoU (Generalized-IoU) loss
* PANet only
* Implemented in PyTorch
### YOLOX
* Anchor-free
    * Faster training/inference speed
    * No need to determine anchor parameters
![](https://i.imgur.com/YNLtPyF.png)
* Decoupled head
    * ![](https://i.imgur.com/9nyC9wZ.png)
### Others
* YOLOF/R/S/P

# Chapter 7: Instance Segmentation
## R-CNN family
### R-CNN
![](https://i.imgur.com/8DkioFL.png)
### Fast R-CNN
![](https://i.imgur.com/D1ZNvGP.png)
* Still uses selective search
### Faster R-CNN
* Sped up with a Region Proposal Network
![](https://i.imgur.com/FDsoLJl.png)
### Mask R-CNN
* Extends Faster R-CNN with a segmentation branch
![](https://i.imgur.com/ZrzwXhS.png)

## SOLO family
![](https://i.imgur.com/uHtPEqt.png)
### SOLO
![](https://i.imgur.com/gYbInPh.png)
### SOLOv2
#### Differences from v1
* Object mask generation is decoupled into mask kernel prediction and mask feature learning, which are responsible for generating the convolution kernels and the feature maps to be convolved with, respectively
* Predicts high-resolution object masks
* SOLOv2 significantly reduces inference overhead with its Matrix NMS (non-maximum suppression) technique
![](https://i.imgur.com/2elsXVi.png)
* Dynamic convolutions
    * ![](https://i.imgur.com/YaK406d.png)
    * More flexible
    * 2D offsets are added to the regular grid sampling locations of the standard convolution, enabling free-form deformation of the sampling grid

# Outdated (unsupervised learning is no longer covered)
## List the three most popular types of generative models.
* Variational autoencoders
* PixelRNN/PixelCNN
* Generative adversarial networks (GAN)
## What is the main difference between Supervised Learning and Unsupervised Learning networks?
* Supervised learning
    * Data: data and labels
    * Goal: map input data to labels
* Unsupervised learning
    * Data: data, no labels
    * Goal: learn some underlying structure of the data

## Competitive network
![](https://i.imgur.com/JvNTmGa.png)
## SOFM
![](https://i.imgur.com/7LH5Nwv.png)
## Cross entropy and Gradient Descent Method
![](https://i.imgur.com/bOCX2yO.png)

###### tags: `Artificial Neural Networks and Deep Learning` `CSnote`

<style>
.navbar-brand:before {
    content: ' NTPU × ';
    padding-left: 1.7em;
    background-image: url(https://encrypted-tbn0.gstatic.com/images?q=tbn:ANd9GcTIby6KbceSTNEzAnqE8sMzZMXgAAPJsSDhdu4d16f03Q);
    background-repeat: no-repeat;
    background-size: contain;
}
.navbar-brand > .fa-file-text {
    padding-left: 0.1em;
    display: none;
}
</style>
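As a numerical cross-check tying the cross-entropy material back to the CNN softmax computation earlier in these notes (a minimal plain-Python sketch; the `softmax` helper is illustrative, not from the course): softmax of the output activations $(a_7, a_8) = (0.3581, 0.5947)$ reproduces $s_1 = 0.4411$ and $s_2 = 0.5589$, and for the standard softmax + cross-entropy loss the gradient with respect to each logit is simply $y_j - t_j$.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Output activations (a7, a8) from the CNN worked example above.
probs = softmax([0.3581, 0.5947])  # rounds to [0.4411, 0.5589]

# Standard softmax + cross-entropy identity: dL/dlogit_j = y_j - t_j.
target = [1, 0]
grad = [y - t for y, t in zip(probs, target)]
```

Rounding `probs` to four decimals gives the $s_1 = 0.4411$, $s_2 = 0.5589$ used in the worked example.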