# NAS
# [Summer progress](/xmRMAR8eQAKAY9oponC1zA)
:::success
==__The power of capitalism__== (no kidding)
:::
[toc]
# Prior knowledge
* What is meta learning?
    * Training a machine to find a network architecture on its own; traditionally a network was designed by hand and then trained.
* Squeeze and excitation: give higher weights to the channels that matter more for the image and lower weights to those that matter less, which improves accuracy.

* Pareto optimal solution (POS): if a phone wants higher quality, it has to raise its price, so finding the best quality and the best price at the same time is impossible because the two objectives conflict. POS is about finding feasible solutions in multi-objective optimization, and there are usually infinitely many such solutions.
* Weighted product model: used to decide the best cost-performance trade-off.
* PPO: a reinforcement learning algorithm for policy updates (Proximal Policy Optimization).
* RMSprop: a gradient descent variant, commonly used when training RNNs.
* Inverted bottleneck: introduced in the MobileNetV2 paper; it improves efficiency without reducing accuracy.
* Depthwise separable convolution: a variant of the standard convolution that changes how the kernel interacts with the feature map in order to reduce computation (<font color=red>important</font>); see the sketch after this list.
* FLOPS: floating-point operations per second.
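
A minimal sketch of a depthwise separable convolution, assuming PyTorch (the channel sizes are toy values, not taken from either paper): a depthwise 3x3 convolution filters each channel independently, and a pointwise 1x1 convolution then mixes channels.

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        # Depthwise step: groups=in_ch means each channel gets its own 3x3 kernel.
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3,
                                   padding=1, groups=in_ch, bias=False)
        # Pointwise step: 1x1 conv mixes information across channels.
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)

    def forward(self, x):
        return self.pointwise(self.depthwise(x))

# Multiply-add cost per layer (ignoring bias), which is where the savings come from:
#   standard 3x3 conv   : 3*3*in_ch*out_ch*H*W
#   depthwise separable : (3*3*in_ch + in_ch*out_ch)*H*W
x = torch.randn(1, 32, 56, 56)             # toy input
y = DepthwiseSeparableConv(32, 64)(x)      # -> shape (1, 64, 56, 56)
```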
:::warning
* Fun fact: [Dropout has been patented](https://mp.weixin.qq.com/s/wEswcXv5rY1AKFA2bj1MTw?fbclid=IwAR14AK7GgipHCvcXiP07qwr7EmsdPY8-fL3qH8m_axtD36k2xNZNG34gGPg)
* [AI looks at politics](https://www.thenewslens.com/article/118672?fbclid=IwAR0e32b4z6wLT46GQx38lKUjukmUQBT0gq6y8ApAi6bqltGmL3P97sGizks)
* ctrl+f "we"真D好用
* [ubuntu is exhausting](/kRorBTjATuSY1M-mjt5ExQ)

:::
## Learn to learn
Both papers I'm presenting are about training a machine to find a network architecture on its own, whereas networks used to be designed by hand. These two Google papers are the extreme representatives of that idea: they search almost every combination (within the architecture space they define) in a nearly brute-force way, which burns an enormous amount of money that ordinary people cannot afford.
5555555555555 (sobbing)
I probably won't pick Google papers to present again; they are not that helpful for research.
# MnasNet
### Authors:

* First author
Earlier work on hardware (branch prediction, FPGA), later work on meta learning

* Corresponding author (probably)

### Abstract
It is very ==difficult to manually== balance these trade-offs when there are so many architectural possibilities to consider. In this paper, we propose an ==automated mobile neural architecture search (MNAS)== approach, which explicitly incorporates model latency into the main objective so that the search can identify a model that achieves a good trade-off between accuracy and latency.
To further strike the right balance between flexibility and search space size, we propose a ==novel factorized hierarchical search space== that encourages layer diversity throughout the network.
### Overview archi.

* ==This is RL training==, and PPO is used for the policy updates
* They choose reinforcement learning because ==it is convenient and the reward is easy to customize==
The search framework consists of three components: a recurrent neural network ==(RNN)== based controller, a trainer to obtain the model accuracy, and a mobile phone based inference engine for measuring the latency.
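
A rough, self-contained sketch of how those three components interact in the search loop (the stub functions below are hypothetical stand-ins, not code from the paper):

```python
import random

def sample_architecture():
    # The RNN controller would emit a token sequence describing one candidate
    # model; here we just fake a candidate id.
    return {"id": random.randint(0, 9999)}

def train_and_eval(arch):
    # Trainer: train the candidate briefly on a proxy task and report accuracy.
    return random.uniform(0.5, 0.8)

def measure_latency_on_phone(arch):
    # Inference engine: run the model on a real phone and report latency (ms).
    return random.uniform(40.0, 120.0)

def reward(acc, latency_ms, target_ms=75.0, w=-0.07):
    # Weighted product of accuracy and latency (soft-constraint form;
    # see the objective function section below).
    return acc * (latency_ms / target_ms) ** w

for step in range(5):
    arch = sample_architecture()
    r = reward(train_and_eval(arch), measure_latency_on_phone(arch))
    # In the real framework a PPO update of the controller happens here,
    # using r as the sampled reward.
    print(step, round(r, 4))
```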
---
### Objective func.

where R(m) takes the form shown below: a weighted product model
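
Since the figure is not reproduced here, the weighted-product reward from the paper can be written out as (T is the target latency, ACC and LAT are the measured accuracy and latency of model m):

$$
R(m) = ACC(m) \times \left[\frac{LAT(m)}{T}\right]^{w},
\qquad
w =
\begin{cases}
\alpha, & \text{if } LAT(m) \le T\\
\beta, & \text{otherwise}
\end{cases}
$$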

---
### Soft constraint and Hard constraint

---
### Novel factorized hierarchical search space

### Result

---

Their approach also allows searching for ==a new architecture for any latency target==. For example, some video applications may require latency as low as 25ms. We can either scale down a baseline model, or search for new models specifically targeted to this latency constraint.
---

---

---

---

When α = 0, β = −1, the latency is treated as a hard constraint, so the controller tends to focus more on faster models ==to avoid the latency penalty==. On the other hand, by setting α = β = −0.07, the controller treats the target latency as a soft constraint and tries to search for models ==across a wider latency range==
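
A tiny numeric sketch of how these two settings behave (toy accuracy/latency values, not from the paper): with the hard setting a model is only penalized once it exceeds the target, while the soft setting gently rewards or penalizes every latency value.

```python
# R(m) = ACC * (LAT / T)^w, with w chosen by whether the target latency is met.
def mnas_reward(acc, lat_ms, target_ms, alpha, beta):
    w = alpha if lat_ms <= target_ms else beta
    return acc * (lat_ms / target_ms) ** w

acc, target = 0.75, 75.0
for lat in (60.0, 75.0, 90.0):
    hard = mnas_reward(acc, lat, target, alpha=0.0,   beta=-1.0)    # hard constraint
    soft = mnas_reward(acc, lat, target, alpha=-0.07, beta=-0.07)   # soft constraint
    print(f"lat={lat:5.1f}ms  hard={hard:.3f}  soft={soft:.3f}")
```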
### Conclusion
This paper studies the impact of the latency constraint and the search space, and discusses the MnasNet architecture details and the importance of layer diversity.
---
---
---
# EfficientNet

### Authors:
* Same as above
### Abstract
* In previous work, it is common to scale only one of the three dimensions – depth, width, and image size. Though it is possible to scale two or three dimensions arbitrarily, arbitrary scaling requires tedious ==manual tuning== and still often yields ==sub-optimal== accuracy and efficiency
* Our empirical study shows that it is ==critical to balance all dimensions== of network ==width/depth/resolution==, and surprisingly such balance can be achieved ==by simply scaling each of them with constant ratio==
### Model scaling

---

---
* This graph shows that, in order to pursue better accuracy and efficiency, it is critical to balance all dimensions of network width, depth, and resolution during ConvNet scaling.

### Mechanism
* Baseline network

* #### Objective function
* By fixing __Fi__, model scaling ==simplifies the design problem== for new resource constraints, but there ==still remains a large design space== to explore different __Li, Ci, Hi, Wi__ for each layer (written out below)
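
For reference, the scaling problem from the paper is roughly the following, where the hatted symbols are the predefined baseline operators and tensor shapes, and d, w, r are the depth, width, and resolution multipliers:

$$
\begin{aligned}
\max_{d,\,w,\,r}\quad & \text{Accuracy}\big(\mathcal{N}(d, w, r)\big)\\
\text{s.t.}\quad & \mathcal{N}(d, w, r) = \bigodot_{i=1 \dots s} \hat{\mathcal{F}}_i^{\,d\cdot\hat{L}_i}\big(X_{\langle r\cdot\hat{H}_i,\; r\cdot\hat{W}_i,\; w\cdot\hat{C}_i\rangle}\big)\\
& \text{Memory}(\mathcal{N}) \le \text{target memory}\\
& \text{FLOPS}(\mathcal{N}) \le \text{target FLOPS}
\end{aligned}
$$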


In this paper, they propose a new ==compound scaling method==, which ==uses a compound coefficient φ to uniformly scale network width, depth, and resolution== in a principled way:


* ==φ== is a user-specified coefficient that controls ==how many more resources are available for model scaling==, while α, β, γ specify ==how to assign these extra resources== to network width, depth, and resolution respectively
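
Concretely, the compound scaling rule from the paper is the following; the constraint α·β²·γ² ≈ 2 keeps total FLOPS growing by roughly 2^φ, since FLOPS scale linearly with depth but quadratically with width and resolution:

$$
\text{depth: } d = \alpha^{\phi},\qquad
\text{width: } w = \beta^{\phi},\qquad
\text{resolution: } r = \gamma^{\phi}
$$

$$
\text{s.t. } \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2,\qquad \alpha \ge 1,\ \beta \ge 1,\ \gamma \ge 1
$$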
---
Starting from the baseline EfficientNet-B0, they apply their compound scaling method to scale it up with two steps:
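From the paper: step 1 fixes φ = 1 and runs a small grid search for α, β, γ under α·β²·γ² ≈ 2 (they report α = 1.2, β = 1.1, γ = 1.15); step 2 fixes those constants and scales up with larger φ to obtain EfficientNet-B1 through B7. A small Python sketch of step 2 (the helper name is made up, only the constants come from the paper):

```python
# Step 2 of compound scaling: with alpha/beta/gamma fixed by the grid search,
# a larger phi uniformly scales depth, width, and resolution.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15   # values reported for EfficientNet-B0

def compound_scale(phi):
    depth_mult = ALPHA ** phi          # multiply the number of layers
    width_mult = BETA ** phi           # multiply the number of channels
    res_mult = GAMMA ** phi            # multiply the input resolution
    return depth_mult, width_mult, res_mult

for phi in range(0, 4):                # phi = 0 is the B0 baseline
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}")
```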

### Result

---

# [Meeting minutes](/vempbwqEQUaKKHDxM-DHaw)