---
tags: epita, lrde
---

# LRDE DSD (PyTorch)

# Draft 1

- [x] Training pipeline (MNIST)
- [x] Better epoch display
- [x] Establish NN (LeNet)
- [x] Plot training loss
- [x] Add `plot_wb()`

# Draft 2

- [x] Add gradient masking
- [x] Work out a small example to debug
- [x] Don't mask the first convolution!
- [x] `plot_wb` with save option
- [ ] Try to adapt to `train_dsd`
- [x] Try learning rate scheduler
- [ ] ModelCheckpoint
- [ ] CSVLogger

# Draft 3

- [x] Clean implementation of DSD
- [x] Separate into files

# Draft 4

- [x] Try to reproduce Adam result with NN
- [x] Try to reproduce SGD result with NN
- Try to reproduce Adam-DSD result with NN
- Try to reproduce SGD-DSD result with NN

---

- NN + no data augmentation + batch_size=32
    - Good accuracy + overfits quickly
- NN + no data augmentation + batch_size=128
    - Okay accuracy + overfits quickly
- NN + data augmentation + batch_size=32
    - Good accuracy + no overfitting (can train for more epochs)
- VGG13 + no data augmentation + batch_size=32
    - Doesn't train (30%)

TODO:

- [x] Launch training on MNIST without learning rate scheduler with NN. Does it work?
    - Yes
- [x] Reproduce the same learning rate scheduler as TensorFlow
    - Expected: ![](https://i.imgur.com/KWHs16k.png)
    - Result: ![](https://i.imgur.com/DgI1E4e.png)
- [x] Launch NN training on MNIST with the learning rate scheduler.
- [x] Try LR scheduler + NN + SGD + FER+ dataset
    - `val_acc = 0.762`
- [x] Implement DSD
- [x] Run NN + Adam DSD
- [x] Run NN + SGD DSD
    - Commit `4.yaml` first.
- [ ] VGG13 + SGD DSD + MNIST
    - Check if the weight distribution is good.
- [x] Class weights + train VGG13
    - [Class weight pytorch](https://discuss.pytorch.org/t/passing-the-weights-to-crossentropyloss-correctly/14731)
    - ==It is not working as expected. The model is already overfitting at the 1st epoch.==
- [x] Try with MobileNet
- [x] Change the first convolution to take 1 input channel instead of 3. [link](https://discuss.pytorch.org/t/modify-resnet-or-vgg-for-single-channel-grayscale/22762/10)
- [x] Adapt config file to choose the dataset
- [x] Adapt config file to create the model from the config file.
- [x] Make it train on MNIST + Adam
- [x] Make it train on FER+ + Adam
- [x] Make it train on FER+ + DSD + SGD
- [x] If MobileNet works, plan all runs and meanwhile code a more classical VGG.
- [x] Try VGG13 + Adam
    - [Model is overfitting at an early stage. Maybe it needs an LR warm-up?](https://stackoverflow.com/a/55942518)
- [x] Try LR scheduler with VGG13 + SGD
    - If it doesn't work, use VGG16
- [x] MLflow `test_accuracy` log.
- [ ] Try VGG16 + Adam
    - Doesn't work at all
- [ ] Try LR scheduler + VGG16 + SGD
    - Does work but overfits around epoch 16.
- [ ] Compress VGG16 + MobileNetV2

---

- Experiments:
    - [x] 1: MobileNetV2 Adam
        - [ ] Overfits -> check: add a learning rate scheduler.
    - [x] 2: MobileNetV2 Adam-DSD
    - [x] 3: MobileNetV2 SGD
    - [x] 4: MobileNetV2 SGD-DSD
    - [x] 5: VGG16 SGD
    - [x] 6: VGG16 SGD-DSD

---

- `pipreqs /project/path` -> generates `requirements.txt` based on imports.

MobileNetV2 DSD:
- 8.9 MB (8,903,688 bytes)

MobileNetV2:
- 8.9 MB (8,903,268 bytes)

---

# DSD Experiments (blog post)

- Dataset: FER+2013

## 1) Naive

- 4 runs (the DSD runs rely on the gradient-masking step sketched below):
    - SGD
    - SGD-DSD
    - Adam
    - Adam-DSD

Previous conclusion: same performance with or without DSD.
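
The sparse phase of DSD prunes the smallest-magnitude weights and masks their gradients during retraining. Below is a minimal PyTorch sketch of that step, not the exact code used in these runs: `build_masks`, `apply_masks`, and the 30% sparsity value are illustrative assumptions; the first convolution is skipped as noted in Draft 2.

```python
import torch
import torch.nn as nn


def build_masks(model, sparsity=0.3):
    """Keep the (1 - sparsity) largest-magnitude weights of each Conv2d/Linear
    layer. The first convolution is left untouched (assumes modules are
    registered in forward order)."""
    masks = {}
    first_conv_skipped = False
    for name, module in model.named_modules():
        if not isinstance(module, (nn.Conv2d, nn.Linear)):
            continue
        if isinstance(module, nn.Conv2d) and not first_conv_skipped:
            first_conv_skipped = True
            continue  # don't mask the first convolution
        flat = module.weight.detach().abs().flatten()
        k = int(sparsity * flat.numel())
        if k == 0:
            continue
        threshold = torch.kthvalue(flat, k).values
        masks[name] = (module.weight.detach().abs() > threshold).float()
    return masks


def apply_masks(model, masks):
    """Zero out pruned weights and their gradients. Call after loss.backward()
    (mask gradients) and again after optimizer.step() (so momentum cannot
    revive pruned weights)."""
    with torch.no_grad():
        for name, module in model.named_modules():
            if name not in masks:
                continue
            module.weight.mul_(masks[name])
            if module.weight.grad is not None:
                module.weight.grad.mul_(masks[name])
```

In the sparse phase the training step would be `loss.backward(); apply_masks(model, masks); optimizer.step(); apply_masks(model, masks)`; the final dense phase simply drops the masks and keeps training, typically at a lower learning rate.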
## 2) Going further

- **Hypothesis:** when deploying/packaging, is it better to keep DSD over the baseline?
  (Since DSD sets more weights to 0, the model should compress better: lighter / less data to transfer over the network.)
    - Compare baseline/DSD without quantization (high priority)
    - Compare baseline/DSD with quantization (low priority)

> For each case, report quality (val loss | F1-score) and size (MB) indicators (see the size-comparison sketch at the end of this note).

---

# TALK

- [x] Recap pipeline
    - [x] Previously done
    - [x] Current hypothesis
- [x] Paper recap
    - Goal
    - Pros/cons
- [ ] Results
    - [x] Enumerate the training settings
    - [x] Explain that we did not manage to make VGG16 converge with Adam.
    - [x] Compare val_[loss/acc] of VGG16/MobileNetV2 [SGD/SGD-DSD] + [Adam/Adam-DSD]
        - [x] Conclusion: no gain in accuracy
    - [x] Compare MobileNetV2 val_acc as a 2x2 matrix: SGD/SGD-DSD/Adam/Adam-DSD
        - [x] Conclusion: better to use Adam.
    - [ ] Compare train/val loss/acc of MobileNetV2 [SGD/SGD-DSD]
        - [ ] Conclusion: a form of regularization
    - [ ] After quantization, there is a gain in zipped file size (13%)
        - [ ] MobileNet normal/DSD -> zip -> compare file sizes
        - [ ] MobileNet normal/DSD -> quantization -> zip -> compare file sizes
        - [ ] Conclusion: with quantization, DSD offers a gain in storage size.
- [ ] Conclusion
- [ ] Further work
    - [ ] Go back earlier in the pipeline
    - [ ] Very large dataset!

---

# Recover `mlruns/` folder

- Go to `19-03-2021/`
- Depending on which framework you want, run:
    - `virtualenv lrde-env-[pytorch|tf2] && source lrde-env-[pytorch|tf2]/bin/activate && pip install -r requirements-[pytorch|tf2].txt`
- If the docker container `container-lrde-19-03-2021` already exists:
    - `sudo docker ps -a` and copy the `CONTAINER_ID`
    - `sudo docker start CONTAINER_ID`
- Else, create the container with the `mlruns/` folder:
    - `sudo docker pull 3outeille/lrde-2021:19-03-2021`
    - `sudo docker run -d --name container-lrde-19-03-2021 3outeille/lrde-2021:19-03-2021 tail -f /dev/null`
    - `sudo docker cp container-lrde-19-03-2021:/experiments/ .`
- Run `./recover_mlruns.sh [pytorch|tf2]`
- Stop the docker container:
    - `sudo docker ps -a` and copy the `CONTAINER_ID`
    - `sudo docker stop CONTAINER_ID`
- You can now use MLflow in your browser:
    - `cd src/[pytorch|tf2] && mlflow ui`
- Download pytorch-mlruns to 19_3
    - Just clean all paths so it works locally from 19_03_2021, and build an image: `/home/sphird/Document/19_03_2021/src/[tf2]`
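
For the "normal/DSD -> (quantization) -> zip -> compare file sizes" items above, a possible sketch of the measurement, assuming torchvision's `mobilenet_v2` and dynamic quantization of the Linear layers. This is not the script behind the 13% figure: `zipped_size_mb`, the file names, and `num_classes=8` (FER+ labels) are illustrative assumptions, and the trained checkpoints would be loaded separately.

```python
import os
import zipfile

import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2


def zipped_size_mb(model, tag):
    """Save the state_dict, zip it (DEFLATE), and return the archive size in MB."""
    pt_path, zip_path = f"{tag}.pt", f"{tag}.zip"
    torch.save(model.state_dict(), pt_path)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.write(pt_path)
    return os.path.getsize(zip_path) / 1e6


# The baseline and DSD checkpoints would be loaded here with load_state_dict (omitted).
baseline = mobilenet_v2(num_classes=8)
dsd = mobilenet_v2(num_classes=8)

for tag, model in [("baseline", baseline), ("dsd", dsd)]:
    # Dynamic quantization: Linear weights stored as int8.
    quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
    print(f"{tag:8s} zip: {zipped_size_mb(model, tag):.2f} MB | "
          f"quantized + zip: {zipped_size_mb(quantized, tag + '_quant'):.2f} MB")
```

Zip (DEFLATE) compresses runs of zeros well, which is why the DSD checkpoint, with more weights at 0, is expected to produce a smaller archive even though the raw `.pt` files are almost the same size.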