## Summary of Experiments per Paper
Logistic Regression on MNIST
### Required
* 1-Layer Perceptron on MNIST
* ResNet(18 or 34), DenseNet121 on Cifar10, Tiny-ImageNet
* LSTM on Penn Treebank
* Candidates
  * WGAN-GP on ?
  * ? on IWSLT'14
### AdaBelief (2020)
* VGG11, ResNet34, DenseNet121 on Cifar10
* 1,2,3-layer LSTM on Penn Treebank
* WGAN and WGAN-GP with vanilla CNN generator on Cifar10
* SN-GAN with ResNet generator on Cifar10
* Transformer on IWSLT'14, PASCAL VOC object detection
* ImageNet (does not win)
### AdaBound (2019 ICLR)
* 1-Layer Perceptron on MNIST
* ResNet34 on Cifar10
* DenseNet121 on Cifar10
* 1,2,3-layer LSTM on Penn Treebank
### NosAdam (2020)
* 1-Layer Perceptron on MNIST
* Wide ResNet28 on Cifar10
### PAdam (2019)
* VGG16, ResNet18, Wide ResNet16 on Cifar10, Cifar100
### AdaShift (2018 ICLR)
* Multilayer Perceptron on MNIST
* ResNet18, DenseNet100 on Cifar10, Tiny-ImageNet
* WGAN-GP on ? (fixed generator)
* LSTM on Neural Machine Translation
### AdamW (2017 ICLR)
* 2x(96,64)d ResNet26 on Cifar10, ImageNet32x32
### SWATS (2017)
* ResNet32, DenseNet, PyramidNet, SENet on Cifar10, Cifar100
* LSTM, QRNN on Penn Treebank, WT-2
* ResNet18 on Tiny-ImageNet
### RAdam (2019 ICLR)
* ? on One Billion Word
* ? on Cifar10, ImageNet
* ? on IWSLT’14 DE-EN/EN-DE, WMT’16 EN-DE
### Neural Optimizer Search (2017)
* Wide-ResNet on Cifar10
* Google Neural Machine Translation (GNMT) on WMT'14 EN-DE
* LSTM on Penn Treebank
### AMSGrad (2018 ICLR)
* FFN on MNIST
* CIFARNET on Cifar10
### Low Reference Value
* Google self-learning optimizer (2020)
* LookAhead (2019)
* Ranger (2019)
---
# 6/23 Routine Report
---
## Miscellaneous
[Signal messaging app](https://buzzorange.com/techorange/2021/01/28/qa-signal-ceo-moxie-marlinspike-on-the-future-of-privacy/)
[FLoC, Google's latest web-tracking technology](https://www.bnext.com.tw/article/61662/google-cookie-floc)
[Web beacon](https://zh.wikipedia.org/wiki/%E7%BD%91%E7%BB%9C%E4%BF%A1%E6%A0%87)
---
Activation functions whose own input and output distributions are the same
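A quick empirical check of this note, as my own illustration and assuming it refers to distribution-preserving activations such as SELU: feed standard normal inputs through an activation and compare the output statistics with the input's.

```python
import numpy as np

# My own quick check: the input has mean 0 and std 1; SELU is designed so the
# output distribution roughly matches, while ReLU and tanh shift/shrink it.
x = np.random.randn(1_000_000)

def selu(z, alpha=1.6732632423543772, scale=1.0507009873554805):
    return scale * np.where(z > 0, z, alpha * (np.exp(z) - 1.0))

for name, f in [("relu", lambda z: np.maximum(z, 0.0)),
                ("tanh", np.tanh),
                ("selu", selu)]:
    y = f(x)
    print(f"{name}: mean={y.mean():+.3f}, std={y.std():.3f}")
```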
---
[New optimizer roundup (important)](https://zhuanlan.zhihu.com/p/208178763)
[Warm-up explanation 1](https://www.zhihu.com/question/338066667)
[Warm-up explanation 2](https://chih-sheng-huang821.medium.com/%E6%B7%B1%E5%BA%A6%E5%AD%B8%E7%BF%92warm-up%E7%AD%96%E7%95%A5%E5%9C%A8%E5%B9%B9%E4%BB%80%E9%BA%BC-95d2b56a557f)
[Paper on the loss landscape](https://arxiv.org/pdf/1612.04010v1.pdf)
[Paper on the loss landscape 2](https://arxiv.org/pdf/1612.04010.pdf)
[Confident learning](https://zhuanlan.zhihu.com/p/101379289): can find mislabeled samples in a dataset
[Explanation of RAdam](https://allen108108.github.io/blog/2019/10/08/RAdam%20optimizer%20%E6%96%BC%20Dogs%20vs.%20Cats%20%E8%B2%93%E7%8B%97%E8%BE%A8%E8%AD%98%E4%B8%8A%E4%B9%8B%E5%AF%A6%E4%BD%9C/)
[Explanation of LookAhead](https://allen108108.github.io/blog/2019/10/08/Lookahead%20optimizer%20%E6%96%BC%20Dogs%20vs.%20Cats%20%E8%B2%93%E7%8B%97%E8%BE%A8%E8%AD%98%E4%B8%8A%E4%B9%8B%E5%AF%A6%E4%BD%9C/)
---
# Swarm Intelligence Final
---
Social spider
Fractals
Adaptive waves
Gradient-based
---
# 6/16 Routine Report
---
* [Neural Optimizer Search](https://arxiv.org/pdf/1709.07417.pdf)
* Organizing the papers mentioned on Saturday
---
# 6/9 Routine Report
---
# Miscellany
[Understanding saddle points](https://www.getit01.com/p20180103768109802/)
[Deep Learning without Poor Local Minima](https://arxiv.org/abs/1605.07110)
---
## Padam
Replaces the exponent 0.5 in the RMS term with p, 0 < p < 0.5, and finds that results improve; it also argues that the sqrt(v_t) of the Ada family is not analogous to a second derivative. However, the method has many hyperparameters, and tuning p interacts with the learning rate.
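A minimal NumPy sketch of the partially adaptive update described above (variable names and defaults are mine; bias correction is omitted). Setting p = 0.5 recovers the usual Adam/AMSGrad denominator.

```python
import numpy as np

def padam_step(theta, grad, state, lr=0.1, beta1=0.9, beta2=0.999,
               p=0.125, eps=1e-8):
    """One partially adaptive step: the usual ^0.5 on v_hat becomes ^p."""
    m, v, v_hat = state
    m = beta1 * m + (1 - beta1) * grad           # first moment (momentum)
    v = beta2 * v + (1 - beta2) * grad ** 2      # second moment
    v_hat = np.maximum(v_hat, v)                 # AMSGrad-style max
    theta = theta - lr * m / (v_hat + eps) ** p  # p = 0.5 recovers Adam/AMSGrad
    return theta, (m, v, v_hat)
```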
---
## SWATS
Tracks, with an EMA, the ratio between the gradient and the length of the step actually taken; when the difference between this EMA and the current step's ratio is smaller than epsilon, it switches that dimension to SGD, using the ratio as the learning rate.
[Explanation, plus doubts about how lowering the learning rate at epoch 150 affects the switching condition](https://zhuanlan.zhihu.com/p/32406552)
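A rough sketch of the switching test, following my reading of the projection-based criterion in the paper (variable names are mine; edge cases such as a sign check on the projection are omitted):

```python
import numpy as np

def swats_switch_check(adam_step, grad, lam, t, beta2=0.999, eps=1e-9):
    """gamma is the SGD learning rate whose step would have the same projection
    onto the Adam step as the Adam step itself; lam is its EMA.  Switch once
    the bias-corrected EMA agrees with the current gamma to within eps."""
    denom = np.dot(adam_step, grad)
    if denom == 0.0:
        return lam, False, None                  # skip degenerate steps
    gamma = -np.dot(adam_step, adam_step) / denom
    lam = beta2 * lam + (1 - beta2) * gamma      # EMA of the ratio
    corrected = lam / (1 - beta2 ** t)           # bias correction
    switch = abs(corrected - gamma) < eps
    return lam, switch, corrected                # 'corrected' becomes the SGD lr
```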
---
## AdaShift
Replaces the g_t^2 in the v_t formula with an earlier g_{t-n}^2, so that v_t has no positive correlation with g_t (but isn't that correlation supposed to be there?). It feels like something designed in response to the AMSGrad paper; it also argues again that sqrt(v_t) is not analogous to a second derivative, and even a random choice apparently does not do too badly.
[Author's explanation video](https://www.bilibili.com/video/av64670460/)
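A minimal sketch of the temporal-decorrelation idea (my own simplification; the actual AdaShift also applies a spatial max/mean over the delayed gradients and treats the first moment differently):

```python
import numpy as np
from collections import deque

def adashift_like_step(theta, grad, state, lr=0.01, beta2=0.999, n=10, eps=1e-8):
    """v_t is updated with the delayed gradient g_{t-n} instead of g_t, so the
    adaptive denominator is (roughly) independent of the current gradient."""
    v, window = state                       # window keeps the last n gradients
    window.append(grad)
    if len(window) > n:
        g_delayed = window.popleft()        # g_{t-n}
        v = beta2 * v + (1 - beta2) * g_delayed ** 2
        theta = theta - lr * grad / (np.sqrt(v) + eps)
    return theta, (v, window)               # no update during the first n steps
```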
---
## NosAdam
A framework that adjusts the beta2^t weighting in Adam so that it avoids divergence; in practice it uses $\sum_{k=1}^{t+1}k^{-\gamma} / \sum_{k=1}^tk^{-\gamma}$
[Author's own explanation](https://zhuanlan.zhihu.com/p/65625686)
Point: when publishing a homemade optimizer, weight decay experiments are also expected.
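A minimal NumPy transcription of the hyperharmonic weighting (my own sketch; bias correction on the first moment is omitted):

```python
import numpy as np

def nosadam_hh_step(theta, grad, state, lr=0.001, beta1=0.9, gamma=0.1, eps=1e-8):
    """Second moment as a hyperharmonically weighted average: with
    B_t = sum_{k<=t} k^(-gamma), the effective beta_{2,t} is B_{t-1}/B_t,
    so old gradients are weighted more heavily than in plain Adam."""
    m, v, B, t = state
    t += 1
    b_t = t ** (-gamma)
    B_new = B + b_t
    m = beta1 * m + (1 - beta1) * grad
    v = (B / B_new) * v + (b_t / B_new) * grad ** 2   # "nostalgic" second moment
    theta = theta - lr * m / (np.sqrt(v) + eps)
    return theta, (m, v, B_new, t)
```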
---
## Summary of papers read so far
* Padam, AdaShift: the adaptive learning rate is not a second derivative and can be modified (Expectigrad, etc.)
* SWATS: gradually transitions to SGDM by some mechanism
* NosAdam: not much value; leans toward avoiding non-convergent directions (mostly mathematical proofs)
* AdamW: weight decay is more useful than L2 regularization → L2 regularization (which prevents overfitting) may conflict with training speed, and it is a different concept from weight decay (see the sketch below)
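To make the last point concrete, a minimal NumPy sketch (my own illustration; bias correction omitted) contrasting L2 regularization with decoupled weight decay in an Adam-style update:

```python
import numpy as np

def adam_like_update(theta, grad, m, v, lr=0.001, wd=0.01, beta1=0.9,
                     beta2=0.999, eps=1e-8, decoupled=True):
    """L2 regularization adds wd*theta to the gradient, so the decay term gets
    rescaled by the adaptive 1/sqrt(v); decoupled weight decay (the AdamW idea)
    instead shrinks the weights outside the adaptive step."""
    if not decoupled:
        grad = grad + wd * theta              # L2: goes through the adaptive scaling
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    theta = theta - lr * m / (np.sqrt(v) + eps)
    if decoupled:
        theta = theta - lr * wd * theta       # AdamW: plain shrinkage of the weights
    return theta, m, v
```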
---
# 6/2 Routine Report
---
## Paper Search
### Adam variants and related work
[AdaBelief (2020)](https://arxiv.org/pdf/2010.07468.pdf)
[AdaBound (2019)](https://arxiv.org/abs/1902.09843)
[NosAdam (2020)](https://arxiv.org/abs/1805.07557)
[PAdam (2019)](https://arxiv.org/abs/1901.09517)
[AdaShift (2018)](https://arxiv.org/abs/1810.00143)
[Dissecting Adam (2017)](https://arxiv.org/abs/1705.07774)
[AdamW (2017)](https://arxiv.org/abs/1711.05101)
[SWATS (2017)](https://arxiv.org/abs/1712.07628)
----
## Paper Search
### Strong contenders
[RAdam (2019)](https://kknews.cc/zh-tw/code/2nxbbjg.html)
[LookAhead (2019)](https://blog.csdn.net/u011681952/article/details/99414931)
[Ranger (2019)](https://zhuanlan.zhihu.com/p/100877314)
----
## Paper Search
### Self-learning
[Google self-learning optimizer (2020)](https://www.jiqizhixin.com/articles/2020-10-21)
[Neural Optimizer Search (2017)](https://mp.weixin.qq.com/s/E0ULyXGz3UEcD0cIg-XkNA)
---
## Optimizer Roundup
[Quick guide 1](https://zhuanlan.zhihu.com/p/208178763)
[Quick guide 2](https://medium.com/%E8%BB%9F%E9%AB%94%E4%B9%8B%E5%BF%83/deep-learning-%E7%82%BA%E4%BB%80%E9%BA%BCadam%E5%B8%B8%E5%B8%B8%E6%89%93%E4%B8%8D%E9%81%8Esgd-%E7%99%A5%E7%B5%90%E9%BB%9E%E8%88%87%E6%94%B9%E5%96%84%E6%96%B9%E6%A1%88-fd514176f805)
[Quick guide 3](https://zhuanlan.zhihu.com/p/65625686)
---
## To read later
Explainable AI
---
## AdaBelief
Replaces the g^2 term with (g - m)^2, so that when the current prediction is accurate (low loss curvature) the learning rate is high, and low otherwise.
[Explanation plus EAdam's criticism](https://zhuanlan.zhihu.com/p/339225508)
[Explanation plus criticism of the fine-tuning](https://www.jiqizhixin.com/articles/2020-10-17)
[Explanation plus criticism of the experimental baselines](https://www.qbitai.com/2020/10/19174.html)
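A minimal NumPy sketch of that core change (my transcription; bias correction omitted):

```python
import numpy as np

def adabelief_step(theta, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-16):
    """Replace g^2 with (g - m)^2: the step is large when the gradient matches
    the EMA prediction m ("high belief") and small when it deviates."""
    m, s = state
    m = beta1 * m + (1 - beta1) * grad
    s = beta2 * s + (1 - beta2) * (grad - m) ** 2 + eps   # deviation from prediction
    theta = theta - lr * m / (np.sqrt(s) + eps)
    return theta, (m, s)
```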
---
## AdaBound
Uses a pair of bounds that gradually squeeze the adaptive learning rate so that it behaves like Adam early in training and like SGD later.
The final learning rate is 0.1, and the bounds converge at a rate governed by beta2.
[Follow-up paper](https://arxiv.org/abs/1908.04457)
[SWATS, an earlier paper that also switches to SGD late in training](https://arxiv.org/abs/1712.07628)
[AdaFactor, which some consider the prototype](https://arxiv.org/abs/1804.04235)
[Explanation of AdaFactor](https://kexue.fm/archives/7302)
[Criticism of the experiments](https://www.zhihu.com/question/313863142)
Point: it seems a newly proposed optimizer has to be run on ImageNet, NLP, and GAN benchmarks, or it won't be taken seriously?
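A minimal NumPy sketch of the bound clipping (my transcription; bias correction omitted; the bound schedule follows the form given in the paper, with the convergence rate set by 1 - beta2):

```python
import numpy as np

def adabound_step(theta, grad, state, lr=0.001, final_lr=0.1,
                  beta1=0.9, beta2=0.999, eps=1e-8):
    """The per-element step size lr/sqrt(v) is clipped into [eta_l(t), eta_u(t)];
    the bounds start wide (Adam-like) and tighten toward final_lr (SGD-like)."""
    m, v, t = state
    t += 1
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    eta_l = final_lr * (1 - 1 / ((1 - beta2) * t + 1))   # lower bound -> final_lr
    eta_u = final_lr * (1 + 1 / ((1 - beta2) * t))       # upper bound -> final_lr
    step_size = np.clip(lr / (np.sqrt(v) + eps), eta_l, eta_u)
    theta = theta - step_size * m
    return theta, (m, v, t)
```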
---
## Summary of papers read so far
* Expectigrad: outer momentum, arithmetic average (AA) instead of RMS
* Gradient Centralization: centers the gradients (and similar operations)
* AdaBelief: takes the loss curvature near the current point into account
* AdaBound: gradually transitions to SGDM by some mechanism
* Other: clipping gradients that are close to zero
---
## Candidate to-dos
Read the papers above
Look into code for collecting gradient statistics during training [(TensorBoard?)](https://www.tensorflow.org/tensorboard/)
Organize a common set of test data and write testing code
---
# 4/21 Routine Report
---
* This week I implemented Expectigrad (the paper I presented last time) and ran two experiments (press the right arrow key)
---
### Original MNIST figure from the paper (press the down arrow key)

----
### run1

----
### run2

----
### run3

----
### run4

----
### run5

----
### run6

----
### run7

----
### run8

----
### run9 (run10 is broken) (press the right arrow key)

---
### Original CIFAR figure from the paper, average of 10 runs (press the down arrow key)

----
### My average over 10 runs

----
### Runs 1-10 below

----

----

----

----

----

----

----

----

----

---
# 3/3 Routine Report
---
## Environment Setup
### TensorFlow and GPU acceleration
- Driver Version: 460.32.03
- CUDA Version: 11.2.1
- cuDNN Version: 8.1.0.77
- TensorFlow 2.4.1
[Fixing cuDNN running out of GPU memory](https://davistseng.blogspot.com/2019/11/tensorflow-2.html)
----
## Environment Setup
### VSCode on the laptop
- Python
- Remote SSH [logging in with an SSH key](https://blog.gtwang.org/linux/linux-ssh-public-key-authentication/)
[How Python virtual environments work and how to use them](https://zhuanlan.zhihu.com/p/71615515)
[How venv works](https://www.kawabangga.com/posts/3543)
[Explanation of `source`](https://www.itread01.com/content/1548311242.html)
---
## Development
[Meaning of **kwargs](https://skylinelimit.blogspot.com/2018/04/python-args-kwargs.html)
[Keras docs (Simplified Chinese)](https://keras.io/zh/)
[A bug in TF 2.3 and below](https://www.mdeditor.tw/pl/pNvD/zh-tw?fbclid=IwAR163v4_k_9WH8myH2-2cna_3PBOwq0p_3fnOk-7HJeILTbFGDnFIArsg2Q)
[Underscores in Python](https://zhuanlan.zhihu.com/p/36173202)
---
## Skipped for now
- pdb
- git on VSCode
- test VSCode Remote SSH server on gentoo
---
## Miscellaneous
[Disabling Intel DPST auto-brightness adjustment on Windows 10](https://blog.brucehsu.org/posts/2017/04/14/disable-intel-display-power-saving-dpst-on-windows-10/)
---
# 3/10 Routine Report
---
## Miscellaneous
[The nature of Python operators and strong vs. weak typing](https://www.itread01.com/content/1599015784.html)
[What the asterisk means in Python and how to use it](https://www.itread01.com/hkcpxqy.html)
---
## Custom Optimizer
[Writing a custom optimizer in TF2](https://cloudxlab.com/blog/writing-custom-optimizer-in-tensorflow-and-keras/)
----
```python=
from tensorflow.python.keras.optimizer_v2 import optimizer_v2
from tensorflow.python.util.tf_export import keras_export
from tensorflow.python.ops import state_ops
import random


@keras_export("keras.optimizers.custom")
class custom_optimizer(optimizer_v2.OptimizerV2):
    """Toy optimizer: SGD plus the previous gradient, with a random jitter on the step size."""
    _HAS_AGGREGATE_GRAD = False

    def __init__(self,
                 learning_rate=0.01,
                 rand_rate=0.1,
                 name="test",
                 **kwargs):
        super(custom_optimizer, self).__init__(name, **kwargs)
        self._set_hyper("learning_rate", kwargs.get("lr", learning_rate))
        self._set_hyper("decay", self._initial_decay)
        self._set_hyper("rand_rate", rand_rate)
        self._is_first = True

    def _create_slots(self, var_list):
        # One slot per variable to remember the previous gradient.
        for var in var_list:
            self.add_slot(var, "positive")

    def _resource_apply_dense(self, grad, var, apply_state=None):
        var_dtype = var.dtype.base_dtype
        lr = self._decayed_lr(var_dtype)
        rand_rate = self._get_hyper("rand_rate", var_dtype)
        positive = self.get_slot(var, "positive")
        # Scale the step by a random factor in [1 - rand_rate, 1 + rand_rate].
        jitter = random.uniform(1.0 - rand_rate, 1.0 + rand_rate)
        if self._is_first:
            self._is_first = False
            new_var = var - grad * lr * jitter
        else:
            new_var = var - (grad + positive) * lr * jitter
        positive.assign(grad)
        return state_ops.assign(var, new_var, use_locking=self._use_locking).op

    def _resource_apply_sparse(self, grad, var, indices, apply_state=None):
        raise NotImplementedError

    def get_config(self):
        config = super(custom_optimizer, self).get_config()
        config.update({
            "learning_rate": self._serialize_hyperparameter("learning_rate"),
            "decay": self._serialize_hyperparameter("decay"),
            "rand_rate": self._serialize_hyperparameter("rand_rate")
        })
        return config
```
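A quick usage check (my own example, assuming the class defined above is in scope): it plugs into Keras like any built-in optimizer.

```python
import tensorflow as tf

# Tiny MNIST model driven by the custom optimizer defined above.
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer=custom_optimizer(learning_rate=0.01, rand_rate=0.1),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
model.fit(x_train / 255.0, y_train, epochs=1, batch_size=128)
```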
---
## GLAdam

$lr_t = \mathrm{learning\_rate} * \sqrt{1 - \beta_2^t} / (1 - \beta_1^t)$
$m_t = \beta_1 * m_{t-1} + (1 - \beta_1) * g$
$v_t = \beta_2 * v_{t-1} + (1 - \beta_2) * g^2$
$\hat{v}_t = \max(\hat{v}_{t-1}, v_t)$
$\theta_t = \theta_{t-1} - lr_t * m_t / (\sqrt{\hat{v}_t} + \epsilon)$
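A direct NumPy transcription of the update above, kept next to the formulas for reference (variable names and the state tuple are my own):

```python
import numpy as np

def gladam_step(theta, grad, state, learning_rate=0.001,
                beta1=0.9, beta2=0.999, eps=1e-7):
    """Transcription of the equations above (AMSGrad-style max on v_t)."""
    m, v, v_hat, t = state
    t += 1
    lr_t = learning_rate * np.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    v_hat = np.maximum(v_hat, v)
    theta = theta - lr_t * m / (np.sqrt(v_hat) + eps)
    return theta, (m, v, v_hat, t)
```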
---
# 3/17 Routine Report
---
## Optimizer call stack
```
minimize
_compute_gradients (don't care)
apply_gradients
_create_all_weights
_create_hypers (don't care)
_create_slots (implement)
call add_slot
_prepare
_prepare_local (implement)
update apply_state[(var_device, var_dtype)]
_distributed_apply
apply_grad_to_update_var
_resource_apply_dense (implement)
```
Reminder: use TF ops (e.g. `state_ops.assign`) rather than plain Python assignment when implementing value updates.
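A small standalone demo of that reminder (my own example):

```python
import tensorflow as tf

# Rebinding a Python name does not touch the underlying tf.Variable;
# use assign ops instead.
var = tf.Variable([1.0, 2.0])
grad = tf.constant([0.5, 0.5])
lr = 0.1

result = var - lr * grad       # just a new tensor; `var` itself is unchanged
var.assign_sub(lr * grad)      # inside _resource_apply_dense: state_ops.assign(var, new_var, ...)
print(var.numpy())             # [0.95 1.95]
```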
---
# 4/13 Routine Report
---
## Expectigrad

----

---
## Miscellaneous
[Hello Matplotlib](https://medium.com/python4u/hello-matplotlib-8ffe04355ebf)
[Tutorial: visualizing TensorFlow computation with TensorBoard](https://blog.gtwang.org/programming/tensorboard-tensorflow-visualization-tutorial/)
https://github.com/linsamtw/cifar10_vgg16_kaggle
---
# 4/20 Routine Report
---
## kawai
* [Owl Search Algorithm](https://sci-hub.se/https://doi.org/10.3233/JIFS-169452): somewhat complex; doesn't feel trustworthy
* [Squirrel Search Algorithm](https://www.sciencedirect.com/science/article/pii/S2210650217305229): complex
* [Jellyfish Search](https://www.sciencedirect.com/science/article/pii/S0096300320304914): jellyfish sometimes move as a whole swarm toward the global best and sometimes move within the swarm, either following another jellyfish toward a better position or drifting randomly on their own. Lots of random values; need to check whether any of them amount to cheating.
* Transient Search Optimization
* Butterfly Optimization Algorithm
* Emperor Penguins Colony
## kakkoi
* [Future Search Algorithm](https://www.researchgate.net/publication/327654743_Future_search_algorithm_for_optimization): garbage
* [Chaos Game Optimization](https://sci-hub.se/10.1007/s10462-020-09867-w): forms a triangle from the global best, the mean of random seeds, and the seed itself, generates new seeds from each seed in 4 ways with different alpha values, then keeps the top N for the next generation
* [Social Engineering Optimizer](https://www.researchgate.net/publication/321155851_Social_Engineering_Optimization_SEO_A_New_Single-Solution_Meta-heuristic_Inspired_by_Social_Engineering): the intuition seems weak
* [Giza Pyramids Construction](https://sci-hub.se/https://link.springer.com/article/10.1007/s12065-020-00451-3): roundabout; skipping it for now
* Archimedes Optimization Algorithm
* Black Hole Mechanics Optimization
* Life Choice-Based Optimizer
* Multi-Verse optimizer
## ub
* [Coronavirus Optimization Algorithm](https://arxiv.org/abs/2003.13633): a framework; the key spreading step is not implemented; comes with an LSTM demo
* Shuffled Shepherd Optimization Algorithm
* Golden Ratio Optimization Method
* Black Widow Optimization Algorithm
* Sailfish Optimizer
* Dynamic Differential Annealed Optimization
[Nature inspired optimization algorithms or simply variations of metaheuristics?](https://www.researchgate.net/publication/343846931_Nature_inspired_optimization_algorithms_or_simply_variations_of_metaheuristics)
On the form of solutions: they may not be lists; consider how to serialize a graph into a list.
{"metaMigratedAt":"2023-06-15T18:45:15.224Z","metaMigratedFrom":"YAML","title":"例行進度報告","breaks":true,"slideOptions":"{\"theme\":\"white\",\"transition\":\"slide\"}","contributors":"[{\"id\":\"16b97510-a1a9-4b89-8269-ea0fcfa23b07\",\"add\":16913,\"del\":3163}]"}