---
tags: epita, lrde
---
# LRDE DSD (PyTorch)
# Draft 1
- [x] Training pipeline (MNIST)
- [x] Better display epoch
- [x] Establish NN (LeNet) (see the sketch after this list)
- [x] Plot training loss
- [x] Add `plot_wb()`
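For the "Establish NN (LeNet)" item, a minimal LeNet-5-style sketch; the exact layer sizes used in the runs are not recorded here, so these are the classic ones (an assumption):

```python
import torch.nn as nn

class LeNet(nn.Module):
    """LeNet-5-style CNN for 28x28 grayscale MNIST inputs."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),                       # 28x28 -> 14x14
            nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(),
            nn.MaxPool2d(2),                       # 10x10 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120), nn.ReLU(),
            nn.Linear(120, 84), nn.ReLU(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```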
# Draft 2
- [x] Add gradient masking (see the sketch after this list)
- [x] Work out a small example to debug
- [x] Don't mask the first convolution!
- [x] `plot_wb` with save option
- [ ] Try to adapt to `train_dsd`
- [x] Try learning rate scheduler
- [ ] ModelCheckpoint
- [ ] CSVLogger
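A minimal sketch of the gradient-masking idea for the sparse phase of DSD, assuming per-layer magnitude pruning; `build_masks`, `mask_weights_and_grads`, and the 30% sparsity are illustrative names/values, and the first convolution is skipped per the note above:

```python
import torch

def build_masks(model, sparsity=0.3):
    """One mask per weight tensor: keep the largest-magnitude weights,
    zero the rest. The first convolution is skipped (see note above)."""
    masks = {}
    weights = [(n, p) for n, p in model.named_parameters() if p.dim() > 1]
    for i, (name, p) in enumerate(weights):
        if i == 0:
            continue  # don't mask the first convolution
        k = max(1, int(p.numel() * sparsity))
        threshold = p.detach().abs().flatten().kthvalue(k).values
        masks[name] = (p.detach().abs() > threshold).float()
    return masks

def mask_weights_and_grads(model, masks):
    """Call after loss.backward(): zeroes pruned weights and their
    gradients so they stay at zero during the sparse phase."""
    for name, p in model.named_parameters():
        if name in masks:
            p.data.mul_(masks[name])
            if p.grad is not None:
                p.grad.mul_(masks[name])
```

With momentum-based optimizers the buffers can still push pruned weights away from zero, so re-applying the weight mask right after `optimizer.step()` is a common safeguard.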
# Draft 3
- [x] Clean implementation of DSD
- [x] Separate into files
# Draft 4
- [x] Try to reproduce Adam result with NN
- [x] Try to reproduce SGD result with NN
- [ ] Try to reproduce Adam-dsd result with NN
- [ ] Try to reproduce SGD-dsd result with NN
---
- NN + no dataug + batch_size=32
- Good accuracy + overfits quickly
- NN + no dataug + batch_size=128
- Okay accuracy + overfits quickly
- NN + dataug + batch_size=32
- Good accuracy + no overfitting (can train for more epochs)
- VGG13 + no dataug + batch_size=32
- Doesn't train (stuck around 30%)
TODO:
- [x] Launch training on MNIST without learning rate scheduler with NN. Does it work?
- Yes
- [x] Reproduce the same learning rate scheduler as TensorFlow (see the scheduler sketch after this list)
- Expected
- Result
- [x] Launch NN training on MNIST with learning rate scheduler.
- [x] Try LR scheduler + NN + SGD + FER+ dataset
- `val_acc = 0.762`
- [x] Implement DSD
- [x] Run NN + adam DSD
- [x] Run NN + sgd DSD.
- Commit 4.yaml first.
- [ ] VGG13 + sgd DSD + MNIST
- Check if weight distribution is good.
- [x] Class weight + train VGG13 (see the class-weight sketch after this list)
- [Class weight pytorch](https://discuss.pytorch.org/t/passing-the-weights-to-crossentropyloss-correctly/14731)
- ==It is not working as expected: the model already overfits at the 1st epoch.==
- [x] Try with mobilenet
- [x] Change the first convolution to take 1 channel instead of 3 (see the grayscale sketch after this list). [link](https://discuss.pytorch.org/t/modify-resnet-or-vgg-for-single-channel-grayscale/22762/10)
- [x] Adapt config file to choose dataset
- [x] Adapt config file to create model from config file.
- [x] Make it train on MNIST + adam
- [x] Make it train on FER+ + adam
- [x] Make it train on FER+ + DSD + sgd
- [x] If MobileNet works, plan all runs and meanwhile code a more classical VGG.
- [x] Try VGG13 + adam
- [Model overfits at an early stage. Maybe it needs an LR warm-up?](https://stackoverflow.com/a/55942518)
- [x] Try LR scheduler with VGG13 + SGD
- If it doesn't work, use VGG16
- [x] MLflow test_accuracy log (see the logging sketch after this list).
- [ ] Try VGG16 + adam
- Doesn't work at all
- [ ] Try LR scheduler + VGG16 + sgd
- Does work, but overfits around epoch 16.
- [ ] Compress VGG16 + MobileNetv2
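For the "reproduce the same learning rate scheduler as TensorFlow" item: assuming the TF side used Keras' `ReduceLROnPlateau` (an assumption, not confirmed above), the closest PyTorch analogue is a sketch like this, with placeholder factor/patience values:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # stand-in for the real network
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# PyTorch analogue of Keras' ReduceLROnPlateau.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=5)

for epoch in range(30):
    val_loss = 1.0  # stand-in for the real validation loss
    scheduler.step(val_loss)  # decays the LR when val_loss plateaus
```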
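For the class-weight item: a sketch of passing class weights to `nn.CrossEntropyLoss` as in the linked thread; the FER+ class counts and the inverse-frequency recipe are illustrative assumptions:

```python
import torch
import torch.nn as nn

# Illustrative FER+ class counts (placeholders, not the real numbers).
class_counts = torch.tensor([9000., 8000., 5000., 4000., 3000., 2000., 1500., 500.])
# Inverse-frequency weighting, one common recipe.
class_weights = class_counts.sum() / (len(class_counts) * class_counts)

criterion = nn.CrossEntropyLoss(weight=class_weights)

logits = torch.randn(4, 8)            # batch of 4, 8 emotion classes
targets = torch.tensor([0, 3, 7, 2])  # ground-truth labels
loss = criterion(logits, targets)
```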
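For the 1-channel first-convolution item: a sketch following the linked discussion, using torchvision's attribute layout (`features[0]` for VGG13, `features[0][0]` for MobileNetV2; paths and the `weights=None` argument may differ across torchvision versions):

```python
import torch.nn as nn
from torchvision import models

# VGG13: the 3-channel conv is the first layer of `features`.
vgg = models.vgg13(weights=None)
old = vgg.features[0]
vgg.features[0] = nn.Conv2d(1, old.out_channels, kernel_size=old.kernel_size,
                            stride=old.stride, padding=old.padding)

# MobileNetV2: the first conv sits at features[0][0].
mnet = models.mobilenet_v2(weights=None)
old = mnet.features[0][0]
mnet.features[0][0] = nn.Conv2d(1, old.out_channels, kernel_size=old.kernel_size,
                                stride=old.stride, padding=old.padding, bias=False)
```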
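For the MLflow item: a minimal logging sketch (the run name and metric value are placeholders):

```python
import mlflow

with mlflow.start_run(run_name="vgg13-sgd-dsd"):  # run name is illustrative
    mlflow.log_param("optimizer", "sgd-dsd")
    test_accuracy = 0.762  # placeholder: the final test-set accuracy
    mlflow.log_metric("test_accuracy", test_accuracy)
```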
---
- experiments:
- [x] 1: MobilenetV2 Adam
- [ ] Overfits -> add a learning rate scheduler.
- [x] 2: MobilenetV2 Adam-dsd
- [x] 3: MobilenetV2 Sgd
- [x] 4: MobilenetV2 Sgd-dsd
- [x] 5: VGG16: SGD
- [x] 6: VGG16: SGD-dsd
---
- `pipreqs /project/path` -> generates `requirements.txt` based on imports.
MobilenetV2 dsd:
- 8.9 MB (8,903,688 bytes)
MobilenetV2:
- 8.9 MB (8,903,268 bytes)
---
# DSD Experiments (blog post)
- Dataset: FER+ (FER2013)
## 1) Naive
- 4 runs:
- sgd
- sgd-dsd
- adam
- adam-dsd
Previous CCL: same performance with or without DSD.
## 2) Going further
- **Hypothesis:** when deploying/packaging, is it better to keep DSD over the baseline? (Since it has more weights at 0 -> lighter / less data to transfer over the network.)
- Compare without quantization baseline/DSD (high priority)
- Compare with quantization baseline/DSD (low priority)
> for each case, report quality (val loss | F1-score) and size (MB) indicators
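A sketch of one way to measure the size side of this hypothesis, using PyTorch dynamic quantization and gzip as a stand-in for "zip the checkpoint and compare"; the whole measurement protocol here is an assumption, not the one actually used:

```python
import gzip
import io
import torch
import torch.nn as nn
from torchvision import models

def gzipped_size_mb(model):
    """Serialize the state_dict and measure its gzip-compressed size."""
    buffer = io.BytesIO()
    torch.save(model.state_dict(), buffer)
    return len(gzip.compress(buffer.getvalue())) / 1e6

model = models.mobilenet_v2(weights=None)

# Dynamic int8 quantization of linear layers: a cheap first comparison.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8)

print(f"baseline : {gzipped_size_mb(model):.2f} MB")
print(f"quantized: {gzipped_size_mb(quantized):.2f} MB")
```

The DSD checkpoint should compress better than the dense baseline because runs of zero weights are highly compressible, which is exactly the hypothesis above.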
---
# TALK
- [x] Recap pipeline
- [x] previously done
- [x] current hypothesis
- [x] Paper recap
- Goal
- Pros/Cons
- [ ] Results
- [x] Enumerate settings of training
- [x] Explain that we didn't succeed in making VGG16 converge with Adam.
- [x] Compare val_[loss/acc] of VGG16/MobileNetV2 [sgd/sgd-dsd] + [adam/adam-dsd]
- [x] CCL: no gain in accuracy
- [x] Compare MobileNetV2 val_acc in a 2x2 matrix: {sgd, adam} x {plain, dsd}
- [x] CCL: Better to use Adam.
- [ ] Compare train/val loss/acc of MobilenetV2 [sgd/sgd-dsd]
- [ ] CCL: Form of regularization
- [ ] After quantization, there is a gain in zipped file size (13%)
- [ ] Mobilenet Normal/DSD -> zip -> compare file size
- [ ] Mobilenet Normal/DSD -> quantization -> zip -> compare file size
- [ ] CCL: With quantization, DSD offers a gain in storage size.
- [ ] Conclusion
- [ ] Further work
- [ ] Go back earlier in the pipeline
- [ ] Very large dataset!
---
# Recover `mlruns/` folder
- Go to `19-03-2021/`
- Depending on which framework you want, run:
- `virtualenv lrde-env-[pytorch|tf2] && source lrde-env-[pytorch|tf2]/bin/activate && pip install -r requirements-[pytorch|tf2].txt`
- If docker container `container-lrde-19-03-2021` already exists:
- `sudo docker ps -a` and copy `CONTAINER_ID`
- `sudo docker start CONTAINER_ID`
- Else:
- Create container with `mlruns/` folder
- `sudo docker pull 3outeille/lrde-2021:19-03-2021`
- `sudo docker run -d --name container-lrde-19-03-2021 3outeille/lrde-2021:19-03-2021 tail -f /dev/null`
- `sudo docker cp container-lrde-19-03-2021:/experiments/ .`
- Run `./recover_mlruns.sh [pytorch|tf2]`
- Stop docker container
- `sudo docker ps -a` and copy `CONTAINER_ID`
- `sudo docker stop CONTAINER_ID`
- You can now use MLflow in your browser:
- `cd src/[pytorch|tf2] && mlflow ui`
- Download pytorch-mlruns to 19_3
- Just clean all paths to make it work locally from 19_03_2021 and build an image: `/home/sphird/Document/19_03_2021/src/[tf2]`