# NAS
# [Summer progress](/xmRMAR8eQAKAY9oponC1zA)
:::success
==__The power of capitalism__== (no kidding)
:::
[toc]
# Prior knowledge
* What is meta learning?
    * Training a machine to find a network by itself. Traditionally a network is designed by hand and then handed to the machine to train.
* Squeeze and excitation: give higher weights to the channels that matter most for the image and lower weights to the less influential ones, which improves accuracy
![](https://i.imgur.com/medywSx.jpg)
* Pareto optimal solution (POS): if a phone wants better quality it inevitably has to raise its price, so finding the best quality and the best price at the same time is impossible because the two objectives conflict. A POS is a feasible solution in multi-objective optimization, and there are usually infinitely many of them.
* Weighted product model: used to decide the best cost-performance trade-off ![](https://i.imgur.com/ESmuPtT.jpg)
* PPO: a reinforcement-learning policy-update algorithm
* RMSprop: a gradient-descent variant, often used to update RNNs
* Inverted bottleneck: proposed in the MobileNetV2 paper; it improves efficiency without reducing accuracy
* Depthwise separable convolution: a variant of the standard convolution that changes how the kernel interacts with the feature map in order to reduce computation (<font color=red>important</font>)
* FLOPS: floating-point operations per second
:::warning
* Fun fact: [Dropout has been patented](https://mp.weixin.qq.com/s/wEswcXv5rY1AKFA2bj1MTw?fbclid=IwAR14AK7GgipHCvcXiP07qwr7EmsdPY8-fL3qH8m_axtD36k2xNZNG34gGPg)
* [AI on politics](https://www.thenewslens.com/article/118672?fbclid=IwAR0e32b4z6wLT46GQx38lKUjukmUQBT0gq6y8ApAi6bqltGmL3P97sGizks)
* ctrl+f "we" is really handy
* [Ubuntu is exhausting](/kRorBTjATuSY1M-mjt5ExQ)
![](https://i.imgur.com/Hq7y41J.png)
:::
## Learn to learn
Both papers I present here are about training a machine to find a network by itself, whereas networks used to be designed by hand. These two Google papers are the extreme representatives of this line of work: they use an almost brute-force search over nearly every combination (within the architecture space they define), which burns an enormous amount of money that ordinary people simply cannot afford (sob). I probably won't pick Google papers to present again; they are not that helpful for our research.
# MnasNet
### Authors:
![](https://i.imgur.com/iQWBjeB.png)
* First author: early work on hardware (branch prediction, FPGA), later work on meta-learning
![](https://i.imgur.com/lEe2Mje.png)
* Corresponding author (probably)
![](https://i.imgur.com/5yQwCXg.png)
### Abstract
It is very ==difficult to manually== balance these trade-offs when there are so many architectural possibilities to consider. In this paper, we propose an ==automated mobile neural architecture search (MNAS)== approach, which explicitly incorporates model latency into the main objective so that the search can identify a model that achieves a good trade-off between accuracy and latency. To further strike the right balance between flexibility and search space size, we propose a ==novel factorized hierarchical search space== that encourages layer diversity throughout the network.
### Overview architecture
![](https://i.imgur.com/v5sQscI.png)
* ==This is RL training==, with PPO used for the policy update
* They chose reinforcement learning because ==it is convenient and the reward is easy to customize==

The search framework consists of three components: a recurrent neural network ==(RNN)== based controller, a trainer to obtain the model accuracy, and a mobile phone based inference engine for measuring the latency.

---
### Objective function
![](https://i.imgur.com/1cVCIAo.png)
where R(m) is defined as in the figure below (a weighted product model):
![](https://i.imgur.com/jDZy3fU.png)

---
### Soft constraint and hard constraint
![](https://i.imgur.com/4S1sld4.png)

---
### Novel factorized hierarchical search space
![](https://i.imgur.com/y977VD4.png)
### Result
![](https://i.imgur.com/uV7a6XS.png)

---
![](https://i.imgur.com/Gi4OKrX.png)
Their approach also allows searching for ==a new architecture for any latency target==. For example, some video applications may require latency as low as 25ms. We can either scale down a baseline model, or search for new models specifically targeted to this latency constraint.

---
![](https://i.imgur.com/LIMKgCn.png)

---
![](https://i.imgur.com/qEPClDi.png)

---
![](https://i.imgur.com/JGsnLQY.png)

---
![](https://i.imgur.com/qSpf2KE.png)
When α = 0, β = −1, the latency is treated as a hard constraint, so the controller tends to focus more on faster models ==to avoid the latency penalty==. On the other hand, by setting α = β = −0.07, the controller treats the target latency as a soft constraint and tries to search for models ==across a wider latency range==.
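To make the effect of α and β concrete, here is a minimal Python sketch of the weighted-product reward. Only the formula ACC(m) · (LAT(m)/T)^w, with w = α when LAT(m) ≤ T and w = β otherwise, comes from the paper; the function name and the example numbers are placeholders of mine.

```python
# Minimal sketch of MnasNet's weighted-product reward.
# Only the formula comes from the paper; names and numbers are placeholders.
def mnas_reward(acc, latency_ms, target_ms, alpha=-0.07, beta=-0.07):
    """Reward = ACC(m) * (LAT(m) / T) ** w, where w = alpha if LAT(m) <= T else beta."""
    w = alpha if latency_ms <= target_ms else beta
    return acc * (latency_ms / target_ms) ** w

# Soft constraint (alpha = beta = -0.07): missing the 75 ms target only shrinks the reward slightly.
print(mnas_reward(acc=0.75, latency_ms=90, target_ms=75))                    # ≈ 0.740
# Hard constraint (alpha = 0, beta = -1): the same model is penalized much more sharply.
print(mnas_reward(acc=0.75, latency_ms=90, target_ms=75, alpha=0, beta=-1))  # ≈ 0.625
```

With the soft setting the reward degrades only gently as latency grows past the target, which is why the controller ends up exploring models across a wider latency range.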
### Conclusion
This paper studies the impact of the latency constraint and the search space, and discusses the MnasNet architecture details and the importance of layer diversity.

---
# EfficientNet
![](https://i.imgur.com/3KX6xdh.png)
### Authors:
* Same as above
### Abstract
* In previous work, it is common to scale only one of the three dimensions – depth, width, and image size. Though it is possible to scale two or three dimensions arbitrarily, arbitrary scaling requires tedious ==manual tuning== and still often yields ==sub-optimal== accuracy and efficiency
* Our empirical study shows that it is ==critical to balance all dimensions== of network ==width/depth/resolution==, and surprisingly such balance can be achieved ==by simply scaling each of them with constant ratio==
* ![](https://i.imgur.com/jvOFeby.png)
* ![](https://i.imgur.com/svZkEwC.png)
### Model scaling
![](https://i.imgur.com/YBUBOb5.png)

---
![](https://i.imgur.com/NkvJbYk.png)

---
* This graph shows that, in order to pursue better accuracy and efficiency, it is critical to balance all dimensions of network width, depth, and resolution during ConvNet scaling.
![](https://i.imgur.com/EHu38Is.png)
### Mechanism
* Baseline network
![](https://i.imgur.com/p9qqPBm.png)
* #### Objective function
    * By fixing __Fi__, model scaling ==simplifies the design problem== for new resource constraints, but it ==still remains a large design space== to explore different __Li, Ci, Hi, Wi__ for each layer
![](https://i.imgur.com/ZBRDQ0A.png)
![](https://i.imgur.com/eqFtRGh.png)

In this paper, they propose a new ==compound scaling method==, which ==uses a compound coefficient φ to uniformly scale network width, depth, and resolution== in a principled way (a toy numeric sketch of this rule is appended at the end of these notes):
![](https://i.imgur.com/yBUlxms.png)
![](https://i.imgur.com/7VV6KKY.png)
* ==φ== is a user-specified coefficient that controls ==how many more resources are available for model scaling==, while α, β, γ specify ==how to assign these extra resources== to network width, depth, and resolution respectively

---
Starting from the baseline EfficientNet-B0, they apply their compound scaling method to scale it up in two steps:
![](https://i.imgur.com/v5wuPpo.png)
### Result
![](https://i.imgur.com/Vi7aLt4.png)

---
![](https://i.imgur.com/i0g1g35.png)
# [Meeting minutes](/vempbwqEQUaKKHDxM-DHaw)
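### Appendix: compound scaling sketch
As referenced in the EfficientNet section above, this is a tiny numeric sketch of the compound scaling rule d = α^φ, w = β^φ, r = γ^φ with α·β²·γ² ≈ 2. The constants 1.2 / 1.1 / 1.15 are the values the paper reports for EfficientNet-B0; the helper function, the example baseline numbers, and the rounding are assumptions of mine.

```python
# Toy sketch of EfficientNet's compound scaling rule:
#   depth multiplier      d = ALPHA ** phi
#   width multiplier      w = BETA  ** phi
#   resolution multiplier r = GAMMA ** phi,   with ALPHA * BETA**2 * GAMMA**2 ≈ 2.
# ALPHA/BETA/GAMMA are the B0 values from the paper; everything else is illustrative.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # found by a small grid search with phi = 1

def compound_scale(base_depth, base_channels, base_resolution, phi):
    """Scale a baseline (depth, channels, input resolution) with one coefficient phi."""
    return (round(base_depth * ALPHA ** phi),
            round(base_channels * BETA ** phi),
            round(base_resolution * GAMMA ** phi))

# Hypothetical baseline stage: 3 layers, 40 channels, 224 px input.
for phi in range(4):
    flops_ratio = (ALPHA * BETA ** 2 * GAMMA ** 2) ** phi  # FLOPS grow with d * w^2 * r^2
    print(phi, compound_scale(3, 40, 224, phi), f"≈ {flops_ratio:.2f}x FLOPS")
```

Because ConvNet FLOPS grow roughly with d·w²·r², the constraint α·β²·γ² ≈ 2 means each increase of φ by one roughly doubles the total FLOPS.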