# Sequoia Updates
[HackMD document](https://hackmd.io/1o1YNPh-SNORndTlSlgHHA)
**Weekly update meeting / Office Hours: Friday 11am-12pm EST**:
Feel free to join using this link:
- <a target="_blank" href="https://calendar.google.com/event?action=TEMPLATE&tmeid=NzRyNmtvNGExZHRqazNxMWZhcTBhbzFyOHNfMjAyMTA4MTJUMTYwMDAwWiBmYWJyaWNlbm9ybWFuZGluQG0&tmsrc=fabricenormandin%40gmail.com&scp=ALL"><img border="0" src="https://www.google.com/calendar/images/ext/gc_button1_en.gif">Google Calendar Link</a>
## October 22nd, 2021
## October 15th, 2021
## ICLR
- LaTeX formatting
- [ ] color the listings with Python syntax highlighting
- [ ] wrap up listing 1
- [ ] Figure 3 needs to be adapted
- [ ] increase font size
- [X] repair citation rendering
- [ ] Make Figure 1 (cartoon) more beautiful
- [ ] also, remove white space on top
- [ ] Move Table 1 (methods) to Appendix
- [X] Add methods (only their name) in Table 2
- [ ] adapt text in main
- [ ] adapt text in App
- [ ] increase font size in the plots
- Writing
- [ ] big task: Figure 4 (RL backbones in CRL) needs improvement
- we need to think about this
- [ ] Add mentions of the CW10/CW20 benchmarks and the CW methods
- [ ] Add CW methods in Appendix method table
- Experiment related
- CRL experiments
- Task-Incremental Hopper-v3, body parts (size + mass), 5 seeds, 200 steps per episode:
| Method | seeds | Online perf. | Final perf. | notes |
| ----------- | ----- | ------------ | ----------- |------ |
| Random | todo | (todo) | (todo) | |
| SAC | 5 | 339.533 | 238.874 | |
| EWC | 5 | 202.247 | 148.772 | |
| MAS | 6 | 133.033 | 40.869 | |
| L2 reg | 4-ish | 89.173 | 107.469 | |
| AGEM | 5 | 305.786 | 185.442 | |
| PackNet | 5 | (running) | (running) | |
| Perfect mem | 0 | (running) | (running) | Bump memory to 32 GB |
| VCL | 0 | (todo: bug) | (todo: bug) | |
- 1000 steps per episode, hidden sizes of [256, 256]:
| Method | seeds | Online perf. | Final perf. | notes |
| ----------- | ----- | ------------ | ----------- |------ |
| Random | todo | (todo) | (todo) | |
| SAC | 0 | | |queued|
| EWC | 0 | | |queued|
| MAS | 0 | | |queued|
| L2 reg | 0 | | |queued|
| AGEM | 0 | | |queued|
| PackNet (large net) | 5 | 329.095 | 1424.005 | Re-run with hidden_sizes of [256, 256] if time permits. |
| PackNet | 0 | todo | todo | Re-run with hidden_sizes of [256, 256] if time permits. |
| VCL | 0 | (todo: bug) | (todo: bug) | |
- Bonus
- make Table 1 (moved to the Appendix) nicer
## October 7th, 2021
- ICLR deadline: October 5th
## September 30th, 2021
### Ryan
- Created a PR for PettingZoo MARL Sequoia example
## September 23rd, 2021
### Sequoia
- [ ] Launch lots of runs using these new CRL methods
- [ ] LPG-FTW
- [ ] Hopper + Gravity / BodySize (6 algos x 1 env x 2 task types) = 12 "sweeps" (groups of runs, varying only the seed); 5 seeds?
- [ ] IncrementalRL
- [ ] TraditionalRL
- [ ] CW10 / CW20 ?
- [ ] 3 seeds?
- [ ] TaskIncrementalRL
- [ ] MultiTaskRL
- [ ] Move MT10/MT50/CW10/CW20 from IncrementalRL -> DiscreteTaskAgnosticRL.
### Ryan
- Got basic PettingZoo script actually running without error (lots of hacks though)
- Need to fix the different hacks and integrate more seamlessly with Sequoia
## September 16th, 2021
- [X] Add methods from Continual World:
- [X] Get their methods working
- [X] Add wandb logging to their methods
- Example runs: https://wandb.ai/sequoia/cw_debug?workspace=user-sequoia
- [ ] Launch lots of runs using these new CRL methods
- [ ] Move MT10/MT50/CW10/CW20 from IncrementalRL -> DiscreteTaskAgnosticRL.
### Ryan
- Making progress in debugging simple PettingZoo PPO run
- Going to first get it working on barebones Sequoia setting
## September 9th, 2021
- [ ] Add methods from Continual World:
- [X] Get their methods working
- [ ] Move MT10/MT50/CW10/CW20 from IncrementalRL -> DiscreteTaskAgnosticRL.
- [ ] Add wandb logging to their methods
- [X] Continual MuJoCo body-size modification
- [ ] Make PR for it
- [ ] Finish PR for ReplayV2
### Lucas
- D3RLPY Sequoia Integration
- Do algorithms that are able to accept offline data require it, or is it just a feature that they support?
- https://github.com/takuseno/d3rlpy/blob/97cbe62d4abf3437914bb6f117d4b68a321ad888/d3rlpy/algos/base.py
- all algos can fit either offline or online (see the hedged sketch after this list)
- fit() --> fit offline, on a logged dataset
- fit_online() --> fit online, against a live environment
- What do the datasets / training loops look like, and how do they map to supervised learning?
- https://github.com/takuseno/d3rlpy/blob/97cbe62d4abf3437914bb6f117d4b68a321ad888/d3rlpy/dataset.pyx
- are episode terminals necessary?
- difference between terminals and episode terminals?
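To make the fit / fit_online distinction concrete, here is a rough sketch of the two entry points. It assumes the 2021-era d3rlpy API; exact signatures vary between releases.

```python3
# Hedged sketch of the two d3rlpy entry points (2021-era API assumed).
from d3rlpy.algos import DQN
from d3rlpy.datasets import get_cartpole
from d3rlpy.online.buffers import ReplayBuffer

dataset, env = get_cartpole()   # small bundled MDPDataset + matching gym env
algo = DQN()

# Offline: fit() consumes logged episodes from an MDPDataset.
algo.fit(dataset.episodes, n_epochs=1)

# Online: fit_online() interacts with a live gym env through a replay buffer.
buffer = ReplayBuffer(maxlen=100_000, env=env)
algo.fit_online(env, buffer, n_steps=1_000)
```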
----
### Ryan
- SB3 has breaking changes with the ObsDictWrapper - was looking at potential upgrade in Sequoia
- Debugging a simple PettingZoo script, using it as a Sequoia setting
- It seems PettingZoo steps one agent at a time, even in the parallel env; will Sequoia's `setting.apply()` need to be modified?
- Simplest explanation (a code sketch follows this list):
1. PettingZoo Sequential (e.g. PistonBall env)
2. Wrapped with the to_parallel_env class to look like a PettingZoo Parallel MARL environment
3. Wrapped with the SB3VecEnvWrapper (or the "GymVectorEnvWrapper") to look like a vectorized single-agent environment (`VectorEnv`)
4. Sequoia can deal with gym's `VectorEnv`s, but doesn't yet support SB3's `VecEnv`s.
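A rough code sketch of that chain; the env name / version suffix (`pistonball_v4`) and the 2021 PettingZoo API are assumptions, and later releases changed some return values:

```python3
from pettingzoo.butterfly import pistonball_v4

# 1. Sequential (AEC) env: agents act one at a time.
aec_env = pistonball_v4.env()

# 2. Parallel env: every agent acts at each step (same game, parallel API).
parallel_env = pistonball_v4.parallel_env()
obs = parallel_env.reset()  # dict keyed by agent name (2021 API)
actions = {agent: parallel_env.action_spaces[agent].sample()
           for agent in parallel_env.agents}
obs, rewards, dones, infos = parallel_env.step(actions)

# 3. Wrap parallel_env so it looks like a vectorized single-agent env
#    (the SB3VecEnvWrapper / GymVectorEnvWrapper mentioned above).
# 4. Hand the resulting gym.vector.VectorEnv-like object to the Sequoia setting.
```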
## September 2nd, 2021
### Ryan
- Investigated whether PettingZoo envs all have a parallel wrapper, or if some are sequential-only
- All Atari, Butterfly, MAgent, MPE, SISL envs have parallel env wrapper
- Most classical environments do not have parallel env wrapper
- The only one that does is rock-paper-scissors (rps)
- Ray RLlib does not use the parallel environment, even though they have an example training script with a single, shared policy for all agents
- https://github.com/ray-project/ray/blob/master/rllib/examples/pettingzoo_env.py
- https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_parameter_sharing.py
- There was one arXiv preprint by the PettingZoo authors (since withdrawn) that used PettingZoo within the RLlib framework, and it didn't seem to use the parallel env either
- https://arxiv.org/abs/2005.13625
- Still have to check out other repos/papers that depend on/leverage PettingZoo
- https://github.com/PettingZoo-Team/PettingZoo/network/dependents?package_id=UGFja2FnZS0xMTQ4MzIxNzgy
- PettingZoo has third-party envs that might have wrappers around PettingZoo (the StarCraft repo, for example, has a PettingZoo env class)
## August 26th, 2021
### Fabrice
- [ ] Add methods from Continual World:
- [X] Make our envs look exactly like theirs
- [X] Make their training code work with our environments
- [X] extract their SAC training code into a new Method base class
- [x] Create a new Method class for each of the CL methods in that repo
- [x] Modify their models so they work on other envs than just mujoco
- [ ] Move MT10/MT50/CW10/CW20 from IncrementalRL -> DiscreteTaskAgnosticRL.
- [ ] Create a "mini-sequoia" to help others understand its design:
Recreate a smaller version of Sequoia from scratch, with lots of comments, with as few lines of code as possible.
This could be useful for the PL folks to take a look at, and for people who would want to contribute to Sequoia.
- [x] `basics.py`: describe the Setting and Method APIs.
- [ ] `concrete_example.py`: give an example of a "concrete" setting and method, e.g. Task-Incremental SL.
- [ ] `multiple_settings.py`: Start from something more general than Task-Incremental Setting, and create a small hierarchy of SL settings.
- [ ] Finish PR for ReplayV2
- [ ] Add [Brax](https://www.github.com/google/brax) support
- [X] Make it PyTorch-Compatible
- [ ] Adding non-stationarities to the envs
- [ ] Batched environments with SB3 (SB3 VecEnv vs. Gym VectorEnv)
- [ ] add PackNet to hparam sweep
- [ ] test it
- [~] Update the README.md:
- [ ] Add some images maybe?
- [ ] Logo?
- [ ] Look into SB3 VectorEnv compatibility
- [ ] Design way to store hparam configs for each method
- [X] Continual MuJoCo body-size modification
### Lucas
- End goal: Something like CN-DPM on Offline RL (with multiple games)
- [ ] Play around with d3rlpy
- https://github.com/takuseno/d3rlpy
- [ ] Look at Conservative Q-Learning and other Offline-RL algorithms (via code or papers)
- https://sites.google.com/view/cql-offline-rl
- [ ] Read about forward dynamics models, can they work like VAE in NDPM?
### Ryan
- [ ] Add/Integrate PettingZoo to Sequoia
- Benchmarking performance on SB3 to see how it compares with Sequoia
- Trying to get simple runs with RLSetting to make sure performance is approx. equivalent
- Plan is to next run Sequoia's CustomPPOMethod on a simple MARL setup with PettingZoo, to get an understanding of how the library will hook in
- Then think about how to structure the setting file once there is a working implementation of a Sequoia method functioning within PettingZoo
## August 19th, 2021
### Updates:
- Look into importing Continual World CRL algorithms as Methods?
### Fabrice
- [X] Look at Continual World code
- [ ] Finish PR for ReplayV2
- [ ] Add brax support [Brax](https://www.github.com/google/brax)
- [ ] add PackNet to hparam sweep
- [ ] test it
- [~] Update the README.md:
- [ ] Add some images maybe?
- [ ] Logo?
- [ ] Look into SB3 VectorEnv compatibility
- [ ] Design way to store hparam configs for each method
### Lucas
- [ ] Check out garage toolkit for MetaRL
- [ ] Add PackNet CRL Example
- [ ] test it
### Ryan
- [ ] Add/Integrate PettingZoo to Sequoia
- Dug into SB3 a bit more to understand how we approach continual RL settings and leverage SB3 to achieve this
- Questions:
- Do we add MARL under RLSetting? Do we add non-stationarity later?
- Yes!
- Why do we wrap all the SB3 algos, e.g. `A2CModel(A2C, OnPolicyModel)`?
- So that we are able to "patch" any method by overriding them
- So that we can create dataclasses for their hyper-parameters and avoid duplicating the constructor arguments across 6 different files (a rough illustration follows this Q&A block).
- What is the role of the `GymDataLoader`? Is it something that I would need to use when creating the MARL Setting?
- The GymDataLoader is used so that methods that use PyTorch-Lightning can treat the environments as dataloaders. (It implements the gym.Env API as well as the DataLoader API from PyTorch.)
- No, you don't need to worry about PyTorch-Lightning compatibility for now. I'll take care of that later. For now, just return whatever environments a Method for MARL would expect (ideally something close to the PettingZoo API).
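For illustration, a minimal sketch of that wrapping-plus-dataclass pattern; the class and field names here are invented for the example, not Sequoia's actual definitions:

```python3
from dataclasses import dataclass
from stable_baselines3 import A2C

@dataclass
class A2CHParams:
    """Hyper-parameters declared once, instead of repeating constructor args across files."""
    learning_rate: float = 7e-4
    n_steps: int = 5

class A2CModel(A2C):
    """Thin subclass of the SB3 algo, so any of its methods can be patched by overriding."""
    def __init__(self, policy, env, hparams: A2CHParams = None, **kwargs):
        hparams = hparams or A2CHParams()
        super().__init__(policy, env, learning_rate=hparams.learning_rate,
                         n_steps=hparams.n_steps, **kwargs)

# Usage: model = A2CModel("MlpPolicy", "CartPole-v1", hparams=A2CHParams(n_steps=16))
```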
### Massimo
- [ ]
## August 12th, 2021
### Updates:
- adding a potential collaborator (Zeyuan, intern at MILA)
- 1-hour technical meeting with the PyTorch Lightning folks coming soon (yay)
### Todos:
- [ ] add LaMAML to hparam sweep
- [ ] test it
### Fabrice
- [ ] Finish PR for ReplayV2
- [ ] Look into using [Brax](https://www.github.com/google/brax) for RL
- [ ] add PackNet to hparam sweep
- [ ] test it
- [~] Update the README.md:
- [ ] Add some images maybe?
- [ ] Logo?
- [ ] Look into SB3 VectorEnv compatibility
- [ ] Design way to store hparam configs for each method
### Massimo
- [ ] Get familiar with SB3
- [ ] Add Hindsight Experience Replay to Sequoia
### Ryan
- [ ] Add/Integrate PettingZoo to Sequoia
- From last week:
- Reviewed Stable Baselines (already integrated) and PettingZoo APIs in preparation for integrating PettingZoo
## August 5th, 2021
### Updates:
- Brax compatibility is almost here!
- PyTorch-Lightning folks are interested! (yay!)
- Look into adding [CTrlBenchmark for CSL](https://github.com/facebookresearch/CTrLBenchmark)!
### Todos:
Fabrice:
- [X] Create an issue for PyTorch 1.9 compatibility
- [X] Refactoring Replay method
- [X] Write the tests
- [ ] Make them all pass
- [ ] Make a PR
- [~] Add PackNet to the PL example
- [ ] Create a notebook version?
- [~] Update the README.md:
- [X] Guide users more directly to the examples
- [X] Remove extra stuff
- [ ] Add some images maybe?
- [ ] Logo?
Lucas:
- [~] PackNet:
- [X] Create a fork of Sequoia
- [X] Create a PR to add PackNet (single file)
- [ ] (optional) Add PackNet Callback to the PL Example
- (Fabrice: Actually, it might make more sense to do EWC (to match quick_demo_ewc.py))
- RL:
- Read some papers
Ryan:
- [~] CN-DPM:
- [X] Refactor the configs yaml files into dataclasses
- [X] Create PR
- [X] Add tests
- [~] Make tests pass
- (Need to tweak the configs so the tests are quicker to run)
- [X] Merge PR
- [ ] Test it out on datasets other than MNIST (dynamic input size?)
- RL:
- Read some papers:
- Berkeley research (Sergey Levine and others)
- NeurIPS Deep RL workshop: AVID
- GAN for domain transfer between real human demonstrations and robot world
- Train robot on generated demonstrations
- Assign reward to intermediate steps?
Massimo:
- [X] Submit the Arxiv version
- [X] Fix the little typos on first version
- [ ] Get familiar with SB3
- [ ] Add Hindsight Experience Replay to Sequoia
## July 29th, 2021:
Updates:
- Chat with PyTorch-Lightning Flash maintainers
- PyTorch-Lightning's `Callback` is an easy way to add a "plugin" to any algo! (minimal sketch below)
- Probably a good idea to retire this "auxiliary task" API.
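A minimal sketch of the Callback-as-plugin idea (a hypothetical callback, not an existing Sequoia auxiliary task):

```python3
from pytorch_lightning import Callback, Trainer

class AuxiliaryTaskCallback(Callback):
    """Hypothetical plugin: hooks into training without touching the algorithm's code."""

    def on_train_start(self, trainer, pl_module):
        # e.g. attach extra losses or buffers to pl_module here
        print(f"Attaching auxiliary task to {type(pl_module).__name__}")

# trainer = Trainer(callbacks=[AuxiliaryTaskCallback()])
```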
Todos:
- [ ] Push it to Arxiv
- [ ] Finish empirical section
- [ ] SL (EWC)
- [ ] Look into BRAX for massively parallel RL environments
- [ ] Make SB3 methods work w/ batched envs
- [ ] Refactor Replay (based on BaseMethod)
## July 22nd 2021:
Updates:
1. New way to add methods to Sequoia!
2. CN-DPM is now available as a Method!
3. Integration of Mostafa's submission in the examples!
New Issues:
- [ ] Look into using [Brax](https://www.github.com/google/brax) for RL
- [X] Add pytorch-lightning example
## July 7th, 2021
### Before Arxiv:
- [ ] Holes in CSL study
- [x] launch Experience_replay on CIFAR-10
- [ ] Holes in CRL study
- [x] No online performance anywhere (except MonsterKong).
- [X] Set `monitor_training_performance` to `True` by default in RL
- [X] Relaunch everything?
- [ ] if so, start a new workspace and copy Monsterkong runs
- (Not sure this is needed)
- [x] Replace '0' with None in Wandb so the plotted average correctly reflects the online performance
- [ ] HalfCheetah
- [X] ~~baseline not launched (maybe because it's not continuous?)~~ (BaseMethod doesn't support continuous action spaces yet)
- [x] ~~no successful DQN anywhere~~ Missing DQN runs (DQN doesn't support continuous action spaces):
- [ ] CartPole
- [ ] MonsterKong
- [ ] no successful SAC runs in multi-task and incremental RL:
- [x] MountainCar
- ~~[ ] baseline method not launched (maybe because it's not continuous?)~~ (same as above)
- [ ] MonsterKong
- [ ] base method not launched
- [ ] DQN
- [x] accept/reject updates in the overleaf
- [ ] Refactor Replay (based on BaseMethod)
### After Arxiv:
- [ ] Improved Command-Line API
- [ ] Debugging MetaWorld:
- [ ] CW10 / MT10 / CW20 only have one run per algo?
- Q: Are some properties persisting between runs in an hparam sweep? (e.g. train_env?)
- [ ] SAC
- [ ] IncrementalRL doesn't have runs
- [ ] Step limit doesn't seem to be working
- Q: is `max_episode_steps` being set on the Setting when using a MetaWorld env?
- Do all MetaWorld envs have the same episode length limit? (500?)
- [ ] Add SAC Output Head to BaseMethod
- [ ] Choose the best name for 'Model' below:
```python3
from pytorch_lightning import LightningModule

class BaseMethod:
    # `BaselineModel` is the current name in Sequoia; the classes below sketch
    # the candidate replacements.
    model: "BaselineModel"

class Model(LightningModule):  # <--- this is the name in question
    ...

class MultiHeadModelMixin(Model):
    ...

class SelfSupervisedModelMixin(Model):
    ...

class SemiSupervisedModelMixin(Model):
    ...

class BaseModel(
    MultiHeadModelMixin,
    SelfSupervisedModelMixin,
    SemiSupervisedModelMixin,
):
    ...
```
- [x] Add 'avalanche' prefix to all avalanche methods, not just conflicting ones.
- [x] Same, but for SB3 (or all Methods who have a 'family' field)
- [ ] Convert older runs in W&B:
- [ ] Renamed settings
- [ ] Renamed methods
- [ ] [reproducibility](https://github.com/lebrice/Sequoia/projects/12#card-64649672)
## June 7th, 2021
Things to read:
- [ ] BabyAI: GridWorld + text
- [ ] Paper https://openreview.net/pdf?id=rJeXCo0cYX
- [ ] Code
# Massimo's Sequoia TODOs
### Before Arxiv
- finish CSL analysis
- maybe add a multiple-setting figure, e.g. with BaseMethod
- finish CRL analysis
- maybe add a multiple-setting figure, e.g. with BaseMethod