# Sparse pattern transfer project
###### tags: `Project report`
---
[TOC]
---
### Problem: Sparsity pattern transfer
Yangchen:
sparsity pattern transfer from offline to online.
Motivation: if we can use the offline dataset to find the sparsity pattern (i.e., zero out weights in a NN), then we can deploy a much more lightweight NN during online fine-tuning; ultimately, we may get the effect that the more a NN learns, the fewer parameters it needs (there is existing work that suggests why this is possible).
the most relevant literature should be:
1. sparsity pattern transfer in RL/SL (domain adaptation/transfer);
2. different methods to learn sparse NNs (we do not need to spend much time on this; we can simply pick a reasonable/well-cited one, maybe even L1 regularization is fine; see the minimal sketch after this list).
3. No need to read offline RL for now.
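For item 2, a minimal sketch of the simplest option mentioned above: L1-regularized training followed by magnitude thresholding to read off a sparsity pattern. The penalty coefficient and threshold are illustrative assumptions, not tuned choices.

```python
import torch
import torch.nn as nn

def l1_regularized_loss(model: nn.Module, task_loss: torch.Tensor, l1_coef: float = 1e-4) -> torch.Tensor:
    """Add an L1 penalty on all weight matrices to the task loss; drives many weights toward zero."""
    l1 = sum(p.abs().sum() for p in model.parameters() if p.dim() > 1)
    return task_loss + l1_coef * l1

def extract_sparsity_pattern(model: nn.Module, threshold: float = 1e-3) -> dict:
    """After training, threshold small weights to get a binary mask per weight matrix (the pattern to transfer)."""
    return {name: (p.abs() > threshold).float()
            for name, p in model.named_parameters() if p.dim() > 1}
```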
#### proposed setting
1. offline (M1) + online (M1') + offline (M2) + online (M2') ..., ideally
2. for now assuming sufficient memory to store all data
#### empirical
1. different types of offline datasets: poor/good coverage of the state space; expert/bad behavior policy
2. assume no change to the existing offline RL algo
### Experiments
#### Offline dataset
1. random policy: sample actions uniformly at random from the action space;
2. medium: learn a policy that reaches about 1/3 of optimal policy’s performance, then use this to collect data; remember also to store the replay buffer used to train such a policy
3. expert: optimal policy
4. medium+expert: half 2 + half 3
5. consider some extremely bad data: e.g., remove all data from 4 where position > 0 (on Mountain Car; no need to do this on Ant); see the filtering sketch below
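A minimal sketch of how the medium+expert mixture (4) and the filtered "extremely bad" variant (5) could be built. It assumes a dict-of-arrays buffer format (`obs`, `act`, `rew`, ...) and that the position coordinate is the first observation dimension on Mountain Car; both are assumptions about our code, not fixed decisions.

```python
import numpy as np

def mix_medium_expert(medium: dict, expert: dict, rng: np.random.Generator) -> dict:
    """Dataset variant 4: take half of the medium buffer and half of the expert buffer."""
    def take_half(data):
        n = len(data["obs"])
        idx = rng.choice(n, size=n // 2, replace=False)
        return {k: v[idx] for k, v in data.items()}
    m, e = take_half(medium), take_half(expert)
    return {k: np.concatenate([m[k], e[k]], axis=0) for k in m}

def drop_positive_positions(dataset: dict, pos_dim: int = 0) -> dict:
    """Dataset variant 5 (Mountain Car only): remove all transitions with position > 0."""
    keep = dataset["obs"][:, pos_dim] <= 0.0
    return {k: v[keep] for k, v in dataset.items()}
```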
#### Exp settings
1. The sparsity pattern of the NN trained with only offline data (call M1);
2. **No/dense offline+sparse online**: the sparsity pattern of the NN obtained by pruning after online learning, both without and with offline pretraining
3. **Sparse offline+finetune online**: use M1 and fix its sparsity pattern (at sparsity levels of 90%, 60%, 30% pruned), then do online fine-tuning; show the learning curve (see the masking sketch after this list)
4. **Sparse offline+reinit+finetune online**: use M1 and fix its sparsity pattern (at sparsity levels of 90%, 60%, 30% pruned), then reinitialize the NN (so the offline data is only used to learn the sparsity pattern, not the trained weight values) and do online fine-tuning
5. **Sparse offline+dense online**: use M1, then continue online fine-tuning without restricting the sparsity at all.
6. **Sparse offline+sparse online**
7. **Sparse offline+freeze eval online**
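A minimal sketch of settings 3 and 4, assuming M1 is a plain PyTorch MLP: magnitude-prune M1 to the target level, freeze the resulting mask with `torch.nn.utils.prune.custom_from_mask`, and (for setting 4) re-initialize the weights before re-applying the mask. The layer selection and init scheme are placeholder choices, not final decisions.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

def fix_sparsity_pattern(m1: nn.Module, amount: float = 0.9, reinit: bool = False) -> nn.Module:
    """Prune `amount` of each Linear layer's weights by magnitude and keep that mask fixed online."""
    for module in m1.modules():
        if isinstance(module, nn.Linear):
            prune.l1_unstructured(module, name="weight", amount=amount)  # builds weight_mask
            mask = module.weight_mask.clone()
            prune.remove(module, "weight")              # bake the mask into the weights
            if reinit:                                  # setting 4: keep the pattern, drop the learned values
                nn.init.kaiming_uniform_(module.weight)
            prune.custom_from_mask(module, "weight", mask)  # masked entries stay zero during fine-tuning
    return m1
```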
#### Exp todos
- [ ] gaussian policy, state tensor-2-array
- [ ] device
- [ ] compute batched (unused), torchify, sample_batch, evalute_policy =? eval_actor, set_seed (minimal sketches of a few of these after this list)
- [ ] online fine-tune code
- [ ] logger, wandb and local
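Minimal sketches of a few of the utilities above (`set_seed`, `torchify`, `sample_batch`); these are assumed signatures to agree on, not the final implementations, and the dict-of-arrays buffer matches the dataset sketch earlier.

```python
import random
import numpy as np
import torch

def set_seed(seed: int, env=None) -> None:
    """Seed python, numpy, torch (and optionally the env's action space) for reproducibility."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    if env is not None:
        env.action_space.seed(seed)

def torchify(x, device: str = "cpu") -> torch.Tensor:
    """np.ndarray -> float32 tensor on the training device."""
    return torch.as_tensor(x, dtype=torch.float32, device=device)

def sample_batch(buffer: dict, batch_size: int, device: str = "cpu") -> dict:
    """Uniformly sample a transition batch from a dict-of-arrays replay buffer."""
    idx = np.random.randint(0, len(buffer["obs"]), size=batch_size)
    return {k: torchify(v[idx], device) for k, v in buffer.items()}
```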
### Solutions
[How Well Do Sparse Imagenet Models Transfer?](https://arxiv.org/abs/2111.13445). Eugenia Iofinova, Alexandra Peste, Mark Kurtz, Dan Alistarh. CVPR 2022.
- transfer performance of sparse models on standard ImageNet transfer benchmarks: train on the ImageNet-1k dataset, evaluate on different downstream tasks
- model is sparsified on the upstream dataset
- sparsification methods studied (see intro and sec2.1)
- progressive sparsification: prune a dense baseline. eg GMP, WoodFisher (a GMP schedule sketch follows this paper's notes)
- sparse regularized training: prune during training. eg STR, AC/DC, RigL
- Lottery Ticket Hypothesis (LTH): re-train ticket from scratch. eg LTH-T
- [github repo](https://github.com/IST-DASLab/sparse-imagenet-transfer)
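Since GMP recurs across these papers, a minimal sketch of the idea (a Zhu & Gupta-style cubic sparsity schedule plus per-layer magnitude pruning). This is a generic illustration, not the exact variant benchmarked in the paper, and in practice the mask has to be re-applied after each optimizer step.

```python
import torch
import torch.nn as nn

def gmp_sparsity(step: int, start: int, end: int, final: float, initial: float = 0.0) -> float:
    """Cubic schedule: sparsity ramps from `initial` to `final` between steps `start` and `end`."""
    t = min(max((step - start) / max(end - start, 1), 0.0), 1.0)
    return final + (initial - final) * (1.0 - t) ** 3

@torch.no_grad()
def gmp_prune_step(model: nn.Module, step: int, start: int, end: int, final: float) -> dict:
    """Zero out the smallest-magnitude weights of each Linear layer to match the scheduled sparsity."""
    sparsity = gmp_sparsity(step, start, end, final)
    masks = {}
    for name, module in model.named_modules():
        if isinstance(module, nn.Linear):
            w = module.weight
            k = int(sparsity * w.numel())              # how many weights to zero at this step
            if k > 0:
                cutoff = w.abs().flatten().kthvalue(k).values
                mask = (w.abs() > cutoff).float()
            else:
                mask = torch.ones_like(w)
            w.mul_(mask)                               # hard-prune in place
            masks[name] = mask
    return masks
```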
[The State of Sparse Training in Deep Reinforcement Learning](https://proceedings.mlr.press/v162/graesser22a.html). and [arxiv version](https://arxiv.org/abs/2206.10369). Laura Graesser, Utku Evci, Erich Elsen, Pablo Samuel Castro. ICML 2022.
- online RL setting
- discrete: classic; 15 atari games
- continuous: 5 mujoco
- algos: DQN, SAC, PPO
- obtain sparse NN (benchmarked algos are marked *)
- dense-to-sparse training: pruning; different from *progressive* methods in paper above. eg GMP*, STR, AC/DC
- sparse training: initialize sparse NN and train. eg LTH
- static: prune a dense NN immediately on iteration 0. eg random pruning*
- dynamic: start with a sparse NN, change the sparse connectivity during training. eg SET*, RigL*
- actor-critic: the majority of parameters go to the critic; ERK sparsity distribution (allocation sketched below this paper's notes)
- sparse model more robust
- [github repo](https://github.com/google-research/rigl/tree/master/rigl/rl)
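A minimal sketch of the ERK (Erdős–Rényi-Kernel) allocation referenced above, specialized to fully-connected layers: per-layer density proportional to (fan_in + fan_out) / (fan_in × fan_out), rescaled so the overall parameter budget matches a target density. Layer names and shapes in the usage comment are placeholders.

```python
import numpy as np

def erk_densities(layer_shapes: dict, target_density: float) -> dict:
    """layer_shapes maps name -> (out_features, in_features); returns per-layer keep densities."""
    stats = {name: (int(np.prod(s)), sum(s)) for name, s in layer_shapes.items()}  # (n_params, fan_in + fan_out)
    dense = set()                                         # layers forced to density 1.0
    while True:
        raw = {name: f / n for name, (n, f) in stats.items() if name not in dense}
        if not raw:                                       # every layer capped at fully dense
            break
        budget = target_density * sum(n for n, _ in stats.values())
        budget -= sum(n for name, (n, _) in stats.items() if name in dense)
        eps = budget / sum(stats[name][0] * r for name, r in raw.items())
        over = {name for name, r in raw.items() if eps * r > 1.0}
        if not over:                                      # all scaled densities are valid
            break
        dense |= over                                     # cap those layers at fully dense and retry
    return {name: (1.0 if name in dense else eps * raw[name]) for name in stats}

# e.g. a small critic trunk (shapes are placeholders):
# erk_densities({"fc1": (256, 17), "fc2": (256, 256), "q": (1, 256)}, target_density=0.1)
```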
[Single-Shot Pruning for Offline Reinforcement Learning](https://arxiv.org/abs/2112.15579). Samin Yeasar Arnob, Riyasat Ohib, Sergey Plis, Doina Precup. Offline RL workshop at Neurips 2021.
- offline RL setting; algo: Batch-Constrained deep Q-learning (BCQ), Behavior Cloning (BC)
- mujoco tasks on D4RL dataset
- offline training
- directly use sparse learning algo
- one-shot pruning: prune the dense net at iteration 0; algos SNIP, GraSP (SNIP saliency sketched below)
- 95% of parameters pruned without performance degradation in the majority of tasks
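A minimal sketch of the SNIP scoring rule (GraSP scores Hessian-gradient products instead and is not shown): connection saliency |w · ∂L/∂w| on a single batch, keeping the globally top-scoring fraction. The 5% keep ratio mirrors the 95% pruning level reported, but is just a default here.

```python
import torch
import torch.nn as nn

def snip_masks(model: nn.Module, loss: torch.Tensor, keep_ratio: float = 0.05) -> dict:
    """One-shot scoring at iteration 0: saliency |w * dL/dw| per connection, keep the global top fraction."""
    named = [(n, p) for n, p in model.named_parameters() if p.dim() > 1]   # weight matrices only
    grads = torch.autograd.grad(loss, [p for _, p in named])
    scores = [(p * g).abs() for (_, p), g in zip(named, grads)]
    flat = torch.cat([s.flatten() for s in scores])
    k = max(int(keep_ratio * flat.numel()), 1)
    cutoff = torch.topk(flat, k).values.min()                              # global saliency cutoff
    return {n: (s >= cutoff).float() for (n, _), s in zip(named, scores)}
```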
[RLx2: Training a Sparse Deep Reinforcement Learning Model from Scratch](https://arxiv.org/abs/2205.15043). Yiqin Tan, Pihe Hu, Ling Pan, Jiatai Huang, Longbo Huang. ICLR 2023 spotlight.
- online RL setting; algos SAC, TD3
- tasks: mujoco
- RigL + robust value learning
- multi-step TD target (a generic n-step sketch follows this paper's notes)
- dynamic-capacity replay
- 95% pruned; performance degradation is within ±3% of the original dense models
- [github repo](https://github.com/tyq1024/RLx2)
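For the multi-step TD target component, a minimal sketch of a generic n-step return with terminal handling (not necessarily RLx2's exact formulation); the tensor shapes are assumptions.

```python
import torch

def n_step_td_target(rewards: torch.Tensor, dones: torch.Tensor,
                     bootstrap_q: torch.Tensor, gamma: float = 0.99) -> torch.Tensor:
    """rewards/dones: (batch, N) windows; bootstrap_q: (batch,) target-critic value at step t+N.
    Returns sum_i gamma^i * r_{t+i} (stopping at terminals) + gamma^N * Q_target."""
    batch, n = rewards.shape
    target = torch.zeros_like(bootstrap_q)
    alive = torch.ones_like(bootstrap_q)     # becomes zero once the episode ends inside the window
    discount = 1.0
    for i in range(n):
        target = target + discount * alive * rewards[:, i]
        alive = alive * (1.0 - dones[:, i])
        discount *= gamma
    return target + discount * alive * bootstrap_q
```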
[Dynamic Sparse Training for Deep Reinforcement Learning](https://arxiv.org/abs/2106.04217). Ghada Sokar, Elena Mocanu, Decebal Constantin Mocanu, Mykola Pechenizkiy, Peter Stone. IJCAI 2022.
- online RL; algos SAC, TD3
- tasks: mujoco
- adopts the SET method (drop-and-grow update sketched below)
- 50% of parameters pruned
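A minimal per-layer sketch of the SET drop-and-grow update these papers build on: at each topology-update step, drop the smallest-magnitude active connections and regrow the same number at random inactive positions. The drop fraction and the zero-init of regrown weights are simplifying assumptions.

```python
import torch

@torch.no_grad()
def set_update(weight: torch.Tensor, mask: torch.Tensor, drop_frac: float = 0.3) -> torch.Tensor:
    """One SET topology update for a single layer's (contiguous) weight matrix and binary mask."""
    active = mask.bool()
    inactive_idx = (~active).flatten().nonzero(as_tuple=True)[0]   # regrowth candidates (pre-drop)
    n_drop = int(drop_frac * active.sum().item())
    if n_drop == 0 or inactive_idx.numel() == 0:
        return mask
    # drop: the smallest-magnitude surviving connections
    magnitude = weight.abs().masked_fill(~active, float("inf"))
    drop_idx = torch.topk(magnitude.flatten(), n_drop, largest=False).indices
    mask.view(-1)[drop_idx] = 0.0
    weight.view(-1)[drop_idx] = 0.0
    # grow: random previously-inactive connections (start at zero; gradients revive them)
    n_grow = min(n_drop, inactive_idx.numel())
    grow_idx = inactive_idx[torch.randperm(inactive_idx.numel())[:n_grow]]
    mask.view(-1)[grow_idx] = 1.0
    return mask
```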
[Automatic Noise Filtering with Dynamic Sparse Training in Deep Reinforcement Learning](https://arxiv.org/abs/2302.06548). Bram Grooten, Ghada Sokar, Shibhansh Dohare, Elena Mocanu, Matthew E. Taylor, Mykola Pechenizkiy, Decebal Constantin Mocanu. AAMAS 2023.
- tasks: mujoco; extremely noisy env: up to 99% of the input features are pure noise
- adopt DS-TD3 and DS-SAC from paper above
- online RL; also transfer RL (the agent is not told when the task changes)
[The Dormant Neuron Phenomenon in Deep Reinforcement Learning](https://arxiv.org/abs/2302.12902). Ghada Sokar, Rishabh Agarwal, Pablo Samuel Castro, Utku Evci.
- not sparse learning; instead improves the expressiveness of a dense NN
- dormant neuron phenomenon: neurons that become practically inactive (persistently low activations); score sketch below
- non-stationarity in RL
- Input data non-stationarity: not a major factor
- Target non-stationarity: exacerbates dormant neurons
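A minimal sketch of the per-layer dormancy score: mean absolute activation per neuron, normalized by the layer average, with a neuron counted dormant when the score falls at or below a threshold τ. The default τ here is the strictest (0); the paper's exact thresholds are not restated.

```python
import torch

def dormant_fraction(activations: torch.Tensor, tau: float = 0.0) -> float:
    """activations: (batch, num_neurons) post-activation outputs of one layer on a batch of states.
    A neuron is dormant if its mean |activation|, normalized by the layer average, is <= tau."""
    per_neuron = activations.abs().mean(dim=0)            # E_x |h_i(x)|
    score = per_neuron / (per_neuron.mean() + 1e-9)        # normalize by the layer-wide average
    return (score <= tau).float().mean().item()
```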
[Sparse Deep Transfer Learning for Convolutional Neural Network](https://www.semanticscholar.org/paper/Sparse-Deep-Transfer-Learning-for-Convolutional-Liu-Wang/164b0e2a03a5a402f66c497e6c327edf20f8827b). Jiaming Liu, Yali Wang, Y. Qiao. AAAI 2017.
- GMP (gradual magnitude pruning) from the paper above
- source domain: iterative pruning strategy in [Han et al](https://arxiv.org/abs/1506.02626) (combination of progressive and regularized pruning)