{%hackmd @themes/dracula %}
## General Design Principles
==Avoid over-compressing early in the network==
For any cut separating the inputs from the outputs, one can assess the amount of information passing through the cut. One should avoid bottlenecks with extreme compression.
In other words, splitting one layer into two lets you measure the intermediate representation and, more importantly, avoid extreme compression; the representation size should decrease gradually from input to output.
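As a rough sketch of "decrease gradually", one can compute the total representation size (height × width × channels) at each layer and check that no single step compresses it drastically. The layer sizes below are hypothetical examples, and the 4× threshold is an arbitrary illustration, not a rule from the paper:

```python
# Hypothetical (spatial_size, channels) for each layer of a small network.
layers = [(299, 3), (149, 32), (147, 64), (73, 80), (71, 192), (35, 288)]

# Total representation size at each cut between layers.
sizes = [h * h * c for h, c in layers]

# Ratio of consecutive sizes: a very large ratio marks an extreme bottleneck.
ratios = [a / b for a, b in zip(sizes, sizes[1:])]
print(all(r < 4 for r in ratios))  # True: no step compresses by more than 4x
```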
==Are higher dimensions easier to process?==

Adding more activations in a convolutional network lets features disentangle more easily, which speeds up training.
==Do spatial aggregation in lower dimensions==
Spatial aggregation can be done over lower dimensional embeddings without much or any loss in representational power.
Convolving in a lower-dimensional space loses little information, so the dimensionality can be reduced before the spatial convolution.
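The payoff is easy to see in weight counts. A minimal sketch, assuming hypothetical channel sizes (256 in/out, reduced to 64), comparing a 3×3 convolution applied directly versus after a 1×1 dimensionality reduction:

```python
def conv_params(k_h, k_w, c_in, c_out):
    """Number of weights in a k_h x k_w convolution mapping c_in -> c_out channels (no bias)."""
    return k_h * k_w * c_in * c_out

c_in, c_out, c_mid = 256, 256, 64  # hypothetical channel counts

direct = conv_params(3, 3, c_in, c_out)                                # 3x3 on full dimension
reduced = conv_params(1, 1, c_in, c_mid) + conv_params(3, 3, c_mid, c_out)  # 1x1 reduce, then 3x3

print(direct)   # 589824
print(reduced)  # 163840 -- under a third of the weights
```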
==Balance the width and depth of the network==
## Factorizing Convolutions with Large Filter Size
==Factorization into smaller convolutions==
Replace a larger convolutional layer with several smaller ones.

Ex. Replace a conv5 with two stacked conv3 layers, which increases depth while improving speed.
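A quick arithmetic check of why this works: two stride-1 3×3 convolutions cover the same 5×5 receptive field as one 5×5 convolution, with 18 instead of 25 weights per channel pair (a 28% reduction):

```python
def stacked_receptive_field(kernel_sizes):
    """Receptive field of a stack of stride-1 convolutions."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf

# Same receptive field:
print(stacked_receptive_field([5]), stacked_receptive_field([3, 3]))  # 5 5

# Fewer weights per input/output channel pair:
weights_5x5 = 5 * 5          # 25
weights_two_3x3 = 2 * 3 * 3  # 18
print(weights_5x5, weights_two_3x3)  # 25 18
```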
==Spatial Factorization into Asymmetric Convolutions==
Replace square convolutions with asymmetric ones. Experiments show this does not work well in the early layers; it works best on m\*m feature maps with m = 12~20.

Ex. Replace a conv3 with a 1\*n followed by an n\*1 convolution.
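The saving can be sketched in one line of arithmetic: an n×n convolution uses n² weights per channel pair, while the 1×n plus n×1 factorization uses only 2n, and the saving grows with n:

```python
def asymmetric_savings(n):
    """Fraction of weights saved by factorizing an n x n conv into 1 x n + n x 1."""
    return 1 - (2 * n) / (n * n)

print(asymmetric_savings(3))  # ~0.33 saving for a 3x3
print(asymmetric_savings(7))  # ~0.71 saving for a 7x7
```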