# Sparse pattern transfer project
###### tags: `Project report`
---
[TOC]
---
### Problem: Sparsity pattern transfer
Yangchen:
sparsity pattern transfer from offline to online.
Motivation: if we can use the offline dataset to find the sparsity pattern (i.e., zero out weights in a NN), then we can deploy a much more lightweight NN during online fine-tuning; ultimately, we may get the effect that the more a NN learns, the fewer parameters it needs (there is existing work that suggests why this is possible).
the most relevant literature should be:
1. sparsity pattern transfer in RL/SL (domain adaptation/transfer);
2. different methods to learn sparse NNs (we do not need to spend much time on this; we can simply pick a reasonable/well-cited one, maybe even L1 regularization is fine; see the minimal sketch after this list).
3. No need to read offline RL for now.
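For item 2, a minimal sketch of the simplest option mentioned above: L1-regularized training followed by magnitude thresholding to read off a sparsity pattern. The penalty coefficient and threshold are illustrative assumptions, not tuned choices.

```python
import torch
import torch.nn as nn

def l1_regularized_loss(model: nn.Module, task_loss: torch.Tensor, l1_coef: float = 1e-4) -> torch.Tensor:
    """Add an L1 penalty on all weight matrices to the task loss; drives many weights toward zero."""
    l1 = sum(p.abs().sum() for p in model.parameters() if p.dim() > 1)
    return task_loss + l1_coef * l1

def extract_sparsity_pattern(model: nn.Module, threshold: float = 1e-3) -> dict:
    """After training, threshold small weights to get a binary mask per weight matrix (the pattern to transfer)."""
    return {name: (p.abs() > threshold).float()
            for name, p in model.named_parameters() if p.dim() > 1}
```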
#### proposed setting
1. offline (M1) + online (M1') + offline (M2) + online (M2') ..., ideally
2. for now assuming sufficient memory to store all data
#### empirical
1. different types of offline datasets: poor/good coverage of the state space; expert/bad behavior policy
2. assume no change to the existing offline RL algo
### Experiments
#### Offline dataset
1. random policy: sample actions uniformly at random from the action space;
2. medium: learn a policy that reaches about 1/3 of optimal policy’s performance, then use this to collect data; remember also to store the replay buffer used to train such a policy
3. expert: optimal policy
4. medium+expert: half 2 + half 3
5. consider some extremely bad data: e.g., remove all data from 4 where position > 0 (on Mountain Car; no need to do this on Ant); see the filtering sketch below
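A minimal sketch of how the medium+expert mixture (4) and the filtered "extremely bad" variant (5) could be built. It assumes a dict-of-arrays buffer format (`obs`, `act`, `rew`, ...) and that the position coordinate is the first observation dimension on Mountain Car; both are assumptions about our code, not fixed decisions.

```python
import numpy as np

def mix_medium_expert(medium: dict, expert: dict, rng: np.random.Generator) -> dict:
    """Dataset variant 4: take half of the medium buffer and half of the expert buffer."""
    def take_half(data):
        n = len(data["obs"])
        idx = rng.choice(n, size=n // 2, replace=False)
        return {k: v[idx] for k, v in data.items()}
    m, e = take_half(medium), take_half(expert)
    return {k: np.concatenate([m[k], e[k]], axis=0) for k in m}

def drop_positive_positions(dataset: dict, pos_dim: int = 0) -> dict:
    """Dataset variant 5 (Mountain Car only): remove all transitions with position > 0."""
    keep = dataset["obs"][:, pos_dim] <= 0.0
    return {k: v[keep] for k, v in dataset.items()}
```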
#### Exp settings
1. The sparsity pattern of the NN trained with only offline data (call M1);
2. **No/dense offline+sparse online**: the sparsity pattern of the NN obtained by pruning after online learning, both without and with offline pretraining
3. **Sparse offline+finetune online**: use M1 and fix its sparsity pattern (at sparsity levels of 90%, 60%, 30% pruned), then do online fine-tuning; show the learning curve (see the masking sketch after this list)
4. **Sparse offline+reinit+finetune online**: use M1 and fix its sparsity pattern (at sparsity levels of 90%, 60%, 30% pruned), then reinitialize the NN (so the offline data is only used to learn the sparsity pattern, not the trained weight values) and do online fine-tuning
5. **Sparse offline+dense online**: use M1, then continue online fine-tuning without restricting the sparsity at all.
6. **Sparse offline+sparse online**
7. **Sparse offline+freeze eval online**
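A minimal sketch of settings 3 and 4, assuming M1 is a plain PyTorch MLP: magnitude-prune M1 to the target level, freeze the resulting mask with `torch.nn.utils.prune.custom_from_mask`, and (for setting 4) re-initialize the weights before re-applying the mask. The layer selection and init scheme are placeholder choices, not final decisions.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

def fix_sparsity_pattern(m1: nn.Module, amount: float = 0.9, reinit: bool = False) -> nn.Module:
    """Prune `amount` of each Linear layer's weights by magnitude and keep that mask fixed online."""
    for module in m1.modules():
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=amount)  # builds weight_mask
            mask = module.weight_mask.clone()
            prune.remove(module, "weight")              # bake the mask into the weights
            if reinit:                                  # setting 4: keep the pattern, drop the learned values
                nn.init.kaiming_uniform_(module.weight)
            prune.custom_from_mask(module, "weight", mask)  # masked entries stay zero during fine-tuning
    return m1
```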
#### Exp todos
- [ ] gaussian policy, state tensor-2-array
- [ ] device
- [ ] compute batched (unused), torchify, sample_batch, evalute_policy =? eval_actor, set_seed (minimal sketches of a few of these after this list)
- [ ] online fine-tune code
- [ ] logger, wandb and local
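Minimal sketches of a few of the utilities above (`set_seed`, `torchify`, `sample_batch`); these are assumed signatures to agree on, not the final implementations, and the dict-of-arrays buffer matches the dataset sketch earlier.

```python
import random
import numpy as np
import torch

def set_seed(seed: int, env=None) -> None:
    """Seed python, numpy, torch (and optionally the env's action space) for reproducibility."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if env is not None:
        env.action_space.seed(seed)

def torchify(x, device: str = "cpu") -> torch.Tensor:
    """np.ndarray -> float32 tensor on the training device."""
    return torch.as_tensor(x, dtype=torch.float32, device=device)

def sample_batch(buffer: dict, batch_size: int, device: str = "cpu") -> dict:
    """Uniformly sample a transition batch from a dict-of-arrays replay buffer."""
    idx = np.random.randint(0, len(buffer["obs"]), size=batch_size)
    return {k: torchify(v[idx], device) for k, v in buffer.items()}
```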
### Solutions
[How Well Do Sparse Imagenet Models Transfer?](https://arxiv.org/abs/2111.13445). Eugenia Iofinova, Alexandra Peste, Mark Kurtz, Dan Alistarh. CVPR 2022.
- transfer performance of sparse models on standard ImageNet transfer benchmarks: train on the ImageNet-1k dataset, evaluate on different downstream tasks
- model is sparsified on the upstream dataset
- sparsification methods studied (see intro and sec2.1)
- progressive sparsification: prune a dense baseline. eg GMP, WoodFisher (a GMP schedule sketch follows this paper's notes)
- sparse regularized training: prune during training. eg STR, AC/DC, RigL
- Lottery Ticket Hypothesis (LTH): re-train ticket from scratch. eg LTH-T
- [github repo](https://github.com/IST-DASLab/sparse-imagenet-transfer)
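Since GMP recurs across these papers, a minimal sketch of the idea (a Zhu & Gupta-style cubic sparsity schedule plus per-layer magnitude pruning). This is a generic illustration, not the exact variant benchmarked in the paper, and in practice the mask has to be re-applied after each optimizer step.

```python
import torch
import torch.nn as nn

def gmp_sparsity(step: int, start: int, end: int, final: float, initial: float = 0.0) -> float:
    """Cubic schedule: sparsity ramps from `initial` to `final` between steps `start` and `end`."""
    t = min(max((step - start) / max(end - start, 1), 0.0), 1.0)
    return final + (initial - final) * (1.0 - t) ** 3

@torch.no_grad()
def gmp_prune_step(model: nn.Module, step: int, start: int, end: int, final: float) -> dict:
    """Zero out the smallest-magnitude weights of each Linear layer to match the scheduled sparsity."""
    sparsity = gmp_sparsity(step, start, end, final)
    masks = {}
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            w = module.weight
            k = int(sparsity * w.numel())              # how many weights to zero at this step
            if k > 0:
                cutoff = w.abs().flatten().kthvalue(k).values
                mask = (w.abs() > cutoff).float()
            else:
                mask = torch.ones_like(w)
            w.mul_(mask)                               # hard-prune in place
            masks[name] = mask
    return masks
```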
[The State of Sparse Training in Deep Reinforcement Learning](https://proceedings.mlr.press/v162/graesser22a.html). and [arxiv version](https://arxiv.org/abs/2206.10369). Laura Graesser, Utku Evci, Erich Elsen, Pablo Samuel Castro. ICML 2022.
- online RL setting
- discrete: classic; 15 atari games
- continuous: 5 mujoco
- algos: DQN, SAC, PPO
- obtain sparse NN (benchmarked algos are marked *)
- dense-to-sparse training: pruning; different from *progressive* methods in paper above. eg GMP*, STR, AC/DC
- sparse training: initialize sparse NN and train. eg LTH
- static: prune a dense NN immediately on iteration 0. eg random pruning*
- dynamic: start with a sparse NN, change the sparse connectivity during training. eg SET*, RigL*
- actor-critic: the majority of parameters go to the critic; ERK sparsity distribution (allocation sketched below this paper's notes)
- sparse model more robust
- [github repo](https://github.com/google-research/rigl/tree/master/rigl/rl)
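A minimal sketch of the ERK (Erdős–Rényi-Kernel) allocation referenced above, specialized to fully-connected layers: per-layer density proportional to (fan_in + fan_out) / (fan_in × fan_out), rescaled so the overall parameter budget matches a target density. Layer names and shapes in the usage comment are placeholders.

```python
import numpy as np

def erk_densities(layer_shapes: dict, target_density: float) -> dict:
    """layer_shapes maps name -> (out_features, in_features); returns per-layer keep densities."""
    stats = {name: (int(np.prod(s)), sum(s)) for name, s in layer_shapes.items()}  # (n_params, fan_in + fan_out)
    dense = set()                                         # layers forced to density 1.0
    while True:
        raw = {name: f / n for name, (n, f) in stats.items() if name not in dense}
        if not raw:                                       # every layer capped at fully dense
            break
        budget = target_density * sum(n for n, _ in stats.values())
        budget -= sum(n for name, (n, _) in stats.items() if name in dense)
        eps = budget / sum(stats[name][0] * r for name, r in raw.items())
        over = {name for name, r in raw.items() if eps * r > 1.0}
        if not over:                                      # all scaled densities are valid
            break
        dense |= over                                     # cap those layers at fully dense and retry
    return {name: (1.0 if name in dense else eps * raw[name]) for name in stats}

# e.g. a small critic trunk (shapes are placeholders):
# erk_densities({"fc1": (256, 17), "fc2": (256, 256), "q": (1, 256)}, target_density=0.1)
```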
[Single-Shot Pruning for Offline Reinforcement Learning](https://arxiv.org/abs/2112.15579). Samin Yeasar Arnob, Riyasat Ohib, Sergey Plis, Doina Precup. Offline RL workshop at Neurips 2021.
- offline RL setting; algo: Batch-Constrained deep Q-learning (BCQ), Behavior Cloning (BC)
- mujoco tasks on D4RL dataset
- offline training
- directly use sparse learning algo
- one-shot pruning: prune the dense net at iteration 0; algos SNIP, GraSP (SNIP saliency sketched below)
- 95% of parameters pruned without performance degradation in the majority of tasks
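A minimal sketch of the SNIP scoring rule (GraSP scores Hessian-gradient products instead and is not shown): connection saliency |w · ∂L/∂w| on a single batch, keeping the globally top-scoring fraction. The 5% keep ratio mirrors the 95% pruning level reported, but is just a default here.

```python
import torch
import torch.nn as nn

def snip_masks(model: nn.Module, loss: torch.Tensor, keep_ratio: float = 0.05) -> dict:
    """One-shot scoring at iteration 0: saliency |w * dL/dw| per connection, keep the global top fraction."""
    named = [(n, p) for n, p in model.named_parameters() if p.dim() > 1]   # weight matrices only
    grads = torch.autograd.grad(loss, [p for _, p in named])
    scores = [(p * g).abs() for (_, p), g in zip(named, grads)]
    flat = torch.cat([s.flatten() for s in scores])
    k = max(int(keep_ratio * flat.numel()), 1)
    cutoff = torch.topk(flat, k).values.min()                              # global saliency cutoff
    return {n: (s >= cutoff).float() for (n, _), s in zip(named, scores)}
```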
[RLx2: Training a Sparse Deep Reinforcement Learning Model from Scratch](https://arxiv.org/abs/2205.15043). Yiqin Tan, Pihe Hu, Ling Pan, Jiatai Huang, Longbo Huang. ICLR 2023 spotlight.
- online RL setting; algos SAC, TD3
- tasks: mujoco
- RigL + robust value learning
- multi-step TD target (a generic n-step sketch follows this paper's notes)
- dynamic-capacity replay
- 95% pruned; performance degradation is within ±3% of the original dense models
- [github repo](https://github.com/tyq1024/RLx2)
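For the multi-step TD target component, a minimal sketch of a generic n-step return with terminal handling (not necessarily RLx2's exact formulation); the tensor shapes are assumptions.

```python
import torch

def n_step_td_target(rewards: torch.Tensor, dones: torch.Tensor,
                     bootstrap_q: torch.Tensor, gamma: float = 0.99) -> torch.Tensor:
    """rewards/dones: (batch, N) windows; bootstrap_q: (batch,) target-critic value at step t+N.
    Returns sum_i gamma^i * r_{t+i} (stopping at terminals) + gamma^N * Q_target."""
    batch, n = rewards.shape
    target = torch.zeros_like(bootstrap_q)
    alive = torch.ones_like(bootstrap_q)     # becomes zero once the episode ends inside the window
    discount = 1.0
    for i in range(n):
        target = target + discount * alive * rewards[:, i]
        alive = alive * (1.0 - dones[:, i])
        discount *= gamma
    return target + discount * alive * bootstrap_q
```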
[Dynamic Sparse Training for Deep Reinforcement Learning](https://arxiv.org/abs/2106.04217). Ghada Sokar, Elena Mocanu, Decebal Constantin Mocanu, Mykola Pechenizkiy, Peter Stone. IJCAI 2022.
- online RL; algos SAC, TD3
- tasks: mujoco
- adopts the SET method (drop-and-grow update sketched below)
- 50% of parameters pruned
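A minimal per-layer sketch of the SET drop-and-grow update these papers build on: at each topology-update step, drop the smallest-magnitude active connections and regrow the same number at random inactive positions. The drop fraction and the zero-init of regrown weights are simplifying assumptions.

```python
import torch

@torch.no_grad()
def set_update(weight: torch.Tensor, mask: torch.Tensor, drop_frac: float = 0.3) -> torch.Tensor:
    """One SET topology update for a single layer's (contiguous) weight matrix and binary mask."""
    active = mask.bool()
    inactive_idx = (~active).flatten().nonzero(as_tuple=True)[0]   # regrowth candidates (pre-drop)
    n_drop = int(drop_frac * active.sum().item())
    if n_drop == 0 or inactive_idx.numel() == 0:
        return mask
    # drop: the smallest-magnitude surviving connections
    magnitude = weight.abs().masked_fill(~active, float("inf"))
    drop_idx = torch.topk(magnitude.flatten(), n_drop, largest=False).indices
    mask.view(-1)[drop_idx] = 0.0
    weight.view(-1)[drop_idx] = 0.0
    # grow: random previously-inactive connections (start at zero; gradients revive them)
    n_grow = min(n_drop, inactive_idx.numel())
    grow_idx = inactive_idx[torch.randperm(inactive_idx.numel())[:n_grow]]
    mask.view(-1)[grow_idx] = 1.0
    return mask
```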
[Automatic Noise Filtering with Dynamic Sparse Training in Deep Reinforcement Learning](https://arxiv.org/abs/2302.06548). Bram Grooten, Ghada Sokar, Shibhansh Dohare, Elena Mocanu, Matthew E. Taylor, Mykola Pechenizkiy, Decebal Constantin Mocanu. AAMAS 2023.
- tasks: mujoco; extremely noisy env: up to 99% of the input features are pure noise
- adopt DS-TD3 and DS-SAC from paper above
- online RL; also transfer RL (the agent is not told when the task changes)
[The Dormant Neuron Phenomenon in Deep Reinforcement Learning](https://arxiv.org/abs/2302.12902). Ghada Sokar, Rishabh Agarwal, Pablo Samuel Castro, Utku Evci.
- not sparse learning; instead improves the expressiveness of a dense NN
- dormant neuron phenomenon: neurons that become practically inactive (persistently low activations); score sketch below
- non-stationarity in RL
- Input data non-stationarity: not a major factor
- Target non-stationarity: exacerbates dormant neurons
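A minimal sketch of the per-layer dormancy score: mean absolute activation per neuron, normalized by the layer average, with a neuron counted dormant when the score falls at or below a threshold τ. The default τ here is the strictest (0); the paper's exact thresholds are not restated.

```python
import torch

def dormant_fraction(activations: torch.Tensor, tau: float = 0.0) -> float:
    """activations: (batch, num_neurons) post-activation outputs of one layer on a batch of states.
    A neuron is dormant if its mean |activation|, normalized by the layer average, is <= tau."""
    per_neuron = activations.abs().mean(dim=0)            # E_x |h_i(x)|
    score = per_neuron / (per_neuron.mean() + 1e-9)        # normalize by the layer-wide average
    return (score <= tau).float().mean().item()
```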
[Sparse Deep Transfer Learning for Convolutional Neural Network](https://www.semanticscholar.org/paper/Sparse-Deep-Transfer-Learning-for-Convolutional-Liu-Wang/164b0e2a03a5a402f66c497e6c327edf20f8827b). Jiaming Liu, Yali Wang, Y. Qiao. AAAI 2017.
- GMP (gradual magnitude pruning) from the paper above
- source domain: iterative pruning strategy in [Han et al](https://arxiv.org/abs/1506.02626) (combination of progressive and regularized pruning)