# Sequoia Updates

[![hackmd-github-sync-badge](https://hackmd.io/1o1YNPh-SNORndTlSlgHHA/badge)](https://hackmd.io/1o1YNPh-SNORndTlSlgHHA)

**Weekly update meeting / Office Hours: Friday 11am-12pm EST**: Feel free to join using these links:

- <a target="_blank" href="https://calendar.google.com/event?action=TEMPLATE&amp;tmeid=NzRyNmtvNGExZHRqazNxMWZhcTBhbzFyOHNfMjAyMTA4MTJUMTYwMDAwWiBmYWJyaWNlbm9ybWFuZGluQG0&amp;tmsrc=fabricenormandin%40gmail.com&amp;scp=ALL"><img border="0" src="https://www.google.com/calendar/images/ext/gc_button1_en.gif">Google Calendar Link</a>

## October 22nd, 2021

## October 15th, 2021

## ICLR

- LaTeX formatting
    - [ ] Make the listings use Python syntax-highlighting colors
    - [ ] Wrap up Listing 1
    - [ ] Figure 3 needs to be adapted
        - [ ] Increase the font size
    - [X] Repair the citation rendering
    - [ ] Make Figure 1 (cartoon) more beautiful
        - [ ] Also remove the white space on top
    - [ ] Move Table 1 (methods) to the Appendix
    - [X] Add the methods (only their names) to Table 2
        - [ ] Adapt the text in the main paper
        - [ ] Adapt the text in the Appendix
    - [ ] Increase the font size in the plots
- Writing
    - [ ] Big task: Figure 4 (RL backbones in CRL) needs improvement - we need to think about this
    - [ ] Mention the CW10/CW20 benchmarks and the CW methods
    - [ ] Add the CW methods to the Appendix method table
- Experiment-related
    - CRL experiments
        - Task-Incremental Hopper-v3, body parts (size + mass), 5 seeds, 200 steps per episode:

| Method      | Seeds | Online perf. | Final perf. | Notes            |
| ----------- | ----- | ------------ | ----------- | ---------------- |
| Random      | todo  | (todo)       | (todo)      |                  |
| SAC         | 5     | 339.533      | 238.874     |                  |
| EWC         | 5     | 202.247      | 148.772     |                  |
| MAS         | 6     | 133.033      | 40.869      |                  |
| L2 reg      | 4-ish | 89.173       | 107.469     |                  |
| AGEM        | 5     | 305.786      | 185.442     |                  |
| PackNet     | 5     | (running)    | (running)   |                  |
| Perfect mem | 0     | (running)    | (running)   | Bump mem to 32Gb |
| VCL         | 0     | (todo: bug)  | (todo: bug) |                  |

- 1000 steps per episode, hidden sizes of [256, 256]:

| Method              | Seeds | Online perf. | Final perf. | Notes  |
| ------------------- | ----- | ------------ | ----------- | ------ |
| Random              | todo  | (todo)       | (todo)      |        |
| SAC                 | 0     |              |             | queued |
| EWC                 | 0     |              |             | queued |
| MAS                 | 0     |              |             | queued |
| L2 reg              | 0     |              |             | queued |
| AGEM                | 0     |              |             | queued |
| PackNet (large net) | 5     | 329.095      | 1424.005    | Re-run with hidden sizes of [256, 256] if time permits. |
| PackNet             | 0     | todo         | todo        | Re-run with hidden sizes of [256, 256] if time permits. |
| VCL                 | 0     | (todo: bug)  | (todo: bug) |        |

- Bonus
    - Make Table 1 (moved to the Appendix) nicer

## October 7th, 2021

- ICLR deadline: October 5th

## September 30th, 2021

### Ryan

- Created a PR for a PettingZoo MARL Sequoia example

## September 23rd, 2021

### Sequoia

- [ ] Launch lots of runs using these new CRL methods
    - [ ] LPG-FTW
    - [ ] Hopper + Gravity / BodySize (6 algos x 1 env x 2 task types) = 12 "sweeps" (groups of runs, only changing the seed), 5 seeds?
        - [ ] IncrementalRL
        - [ ] TraditionalRL
    - [ ] CW10 / CW20?
        - [ ] 3 seeds?
        - [ ] TaskIncrementalRL
        - [ ] MultiTaskRL
- [ ] Move MT10/MT50/CW10/CW20 from IncrementalRL -> DiscreteTaskAgnosticRL.

### Ryan

- Got a basic PettingZoo script actually running without errors (lots of hacks though)
- Need to fix the various hacks and integrate more seamlessly with Sequoia

## September 16th, 2021

- [X] Add methods from Continual World:
    - [X] Get their methods working
    - [X] Add wandb logging to their methods
        - Example runs: https://wandb.ai/sequoia/cw_debug?workspace=user-sequoia
- [ ] Launch lots of runs using these new CRL methods
- [ ] Move MT10/MT50/CW10/CW20 from IncrementalRL -> DiscreteTaskAgnosticRL.
### Ryan

- Making progress on debugging a simple PettingZoo PPO run
- Going to first get it working on a barebones Sequoia setting

## September 9th, 2021

- [ ] Add methods from Continual World:
    - [X] Get their methods working
    - [ ] Add wandb logging to their methods
- [ ] Move MT10/MT50/CW10/CW20 from IncrementalRL -> DiscreteTaskAgnosticRL.
- [X] Continual Mujoco body-size modification
    - [ ] Make a PR for it
- [ ] Finish the PR for ReplayV2

### Lucas

- D3RLPY Sequoia integration (a rough usage sketch follows below, after the September 2nd notes)
    - Do algorithms that are able to accept offline data *require* it, or is it just a feature that they support?
        - https://github.com/takuseno/d3rlpy/blob/97cbe62d4abf3437914bb6f117d4b68a321ad888/d3rlpy/algos/base.py
        - All algos can fit online or offline:
            - fit() --> fit offline
            - fit_online() --> fit online
    - What do the datasets and training loops look like, and how do they map to supervised learning?
        - https://github.com/takuseno/d3rlpy/blob/97cbe62d4abf3437914bb6f117d4b68a321ad888/d3rlpy/dataset.pyx
        - Are episode terminals necessary?
        - What is the difference between terminals and episode terminals?

----

### Ryan

- SB3 has breaking changes with the ObsDictWrapper - was looking at a potential upgrade in Sequoia
- Debugging a simple PettingZoo script, using it as a Sequoia setting
    - It seems PettingZoo does one agent step at a time, even in a parallel env - will Sequoia's `Setting.apply()` need to be modified?
- Simplest explanation (a code sketch follows below, after the September 2nd notes):
    1. PettingZoo Sequential (e.g. the PistonBall env)
    2. Wrapped with the to_parallel_env class to look like a PettingZoo Parallel MARL environment
    3. Wrapped with the SB3VecEnvWrapper (or the "GymVectorEnvWrapper") to look like a vectorized single-agent environment (VectorEnv)
    4. Sequoia can deal with gym's `VectorEnv`s, but doesn't yet support the SB3 "VecEnv"s.

## September 2nd, 2021

### Ryan

- Investigated whether PettingZoo envs all have a parallel wrapper, or if some are only sequential
    - All Atari, Butterfly, MAgent, MPE, and SISL envs have a parallel env wrapper
    - Most classic environments do not have a parallel env wrapper
        - The only one that does is rock-paper-scissors (rps)
- Ray RLlib does not use the parallel environment, even though it has an example training script with a single, shared policy for all agents
    - https://github.com/ray-project/ray/blob/master/rllib/examples/pettingzoo_env.py
    - https://github.com/ray-project/ray/blob/master/rllib/examples/multi_agent_parameter_sharing.py
- There was one arXiv preprint by the PettingZoo authors (since withdrawn) that used PettingZoo within the RLlib framework - it didn't seem to use the parallel env either
    - https://arxiv.org/abs/2005.13625
- Still have to check out other repos/papers that depend on/leverage PettingZoo
    - https://github.com/PettingZoo-Team/PettingZoo/network/dependents?package_id=UGFja2FnZS0xMTQ4MzIxNzgy
- PettingZoo has 3rd-party envs that might have wrappers around PettingZoo (the StarCraft repo, for example, has a PettingZoo env class)
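To make the wrapping chain from the September 9th notes concrete, here is a minimal sketch using SuperSuit. It is only illustrative: the wrapper names and their `_vN` version suffixes come from the PettingZoo / SuperSuit releases available at the time and may differ in other versions, and the image preprocessing that SB3 would need for PistonBall is omitted.

```python
# Rough sketch of the PettingZoo -> vectorized env chain described in the September 9th
# notes above. Illustrative only: wrapper names / version suffixes depend on the
# PettingZoo and SuperSuit versions, and observation preprocessing is omitted.
import supersuit as ss
from pettingzoo.butterfly import pistonball_v4  # version current at the time of writing

# Steps 1-2: start directly from the Parallel API (equivalent to converting the
# sequential/AEC env so that all agents step at once).
parallel_env = pistonball_v4.parallel_env()

# Step 3: treat the N agents as N copies of a single-agent env, i.e. a vectorized env.
vec_env = ss.pettingzoo_env_to_vec_env_v0(parallel_env)

# Step 4: expose it either as an SB3 "VecEnv" (below) or as a gym-style vectorized env,
# since Sequoia currently only handles gym's VectorEnv.
sb3_vec_env = ss.concat_vec_envs_v0(vec_env, num_vec_envs=1, base_class="stable_baselines3")
```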
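And for Lucas's d3rlpy questions above, a rough usage sketch of the offline/online split and of `terminals` vs. `episode_terminals`. Signatures vary between d3rlpy versions, so treat this as illustrative rather than authoritative; the arrays are random placeholders that only show the expected shapes.

```python
# Illustrative d3rlpy sketch (exact signatures depend on the d3rlpy version).
import numpy as np
from d3rlpy.dataset import MDPDataset
from d3rlpy.algos import DQN

# An offline dataset is just flat arrays of transitions (random placeholders here).
observations = np.random.random((100, 4)).astype(np.float32)
actions = np.random.randint(2, size=100)           # integer actions -> treated as discrete
rewards = np.random.random(100).astype(np.float32)

# `terminals` marks true environment termination (it affects value bootstrapping);
# `episode_terminals` is a separate, optional argument that marks episode boundaries
# such as timeouts, and defaults to `terminals` when omitted.
terminals = np.zeros(100)
terminals[[49, 99]] = 1.0
dataset = MDPDataset(observations, actions, rewards, terminals)

algo = DQN()
# Offline: fit() learns from the fixed dataset, with no environment interaction.
algo.fit(dataset.episodes, n_epochs=1)
# Online: the same algo can instead interact with a gym env, e.g.:
# algo.fit_online(gym.make("CartPole-v0"), n_steps=10_000)
```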
## August 26th, 2021

### Fabrice

- [ ] Add methods from Continual World:
    - [X] Make our envs look exactly like theirs
    - [X] Make their training code work with our environments
    - [X] Extract their SAC training code into a new Method base class
    - [x] Create a new Method class for each of the CL methods in that repo
    - [x] Modify their models so they work on other envs than just MuJoCo
- [ ] Move MT10/MT50/CW10/CW20 from IncrementalRL -> DiscreteTaskAgnosticRL.
- [ ] Create a "mini-sequoia" to help others understand its design: recreate a smaller version of Sequoia from scratch, with lots of comments, in as few lines of code as possible. This could be useful for the PL folks to take a look at, and for people who would want to contribute to Sequoia.
    - [x] `basics.py`: describe the Setting and Method APIs.
    - [ ] `concrete_example.py`: give an example of a "concrete" setting and method, e.g. Task-Incremental SL.
    - [ ] `multiple_settings.py`: start from something more general than the Task-Incremental Setting, and create a small hierarchy of SL settings.
- [ ] Finish the PR for ReplayV2
- [ ] Add [Brax](https://www.github.com/google/brax) support
    - [X] Make it PyTorch-compatible
    - [ ] Add non-stationarities to the envs
    - [ ] Batched environments with SB3 (SB3 VecEnv vs gym VectorEnv)
- [ ] Add PackNet to the hparam sweep
    - [ ] Test it
- [~] Update the README.md:
    - [ ] Add some images maybe?
    - [ ] Logo?
- [ ] Look into SB3 VectorEnv compatibility
- [ ] Design a way to store hparam configs for each method
- [X] Continual Mujoco body-size modification

### Lucas

- End goal: something like CN-DPM on offline RL (with multiple games)
- [ ] Play around with d3rlpy
    - https://github.com/takuseno/d3rlpy
- [ ] Look at Conservative Q-Learning and other offline-RL algorithms (via code or papers)
    - https://sites.google.com/view/cql-offline-rl
- [ ] Read about forward dynamics models: can they work like the VAE in NDPM?

### Ryan

- [ ] Add/integrate PettingZoo into Sequoia
    - Benchmarking performance with SB3 to see how it compares with Sequoia
    - Trying to get simple runs with RLSetting to make sure performance is approximately equivalent
    - The plan is to next run Sequoia's CustomPPOMethod on a simple MARL setup with PettingZoo, to get an understanding of how the library will hook in
    - Then think about how to structure the setting file, once there is a working implementation of a Sequoia method functioning within PettingZoo

## August 19th, 2021

### Updates:

- Look into importing the Continual World CRL algorithms as Methods?

### Fabrice

- [X] Look at the Continual World code
- [ ] Finish the PR for ReplayV2
- [ ] Add [Brax](https://www.github.com/google/brax) support
- [ ] Add PackNet to the hparam sweep
    - [ ] Test it
- [~] Update the README.md:
    - [ ] Add some images maybe?
    - [ ] Logo?
- [ ] Look into SB3 VectorEnv compatibility
- [ ] Design a way to store hparam configs for each method

### Lucas

- [ ] Check out the garage toolkit for Meta-RL
- [ ] Add a PackNet CRL example
    - [ ] Test it

### Ryan

- [ ] Add/integrate PettingZoo into Sequoia
- Dug into SB3 a bit more to understand how we approach continual RL settings and leverage SB3 to achieve this
- Questions:
    - Do we add MARL under RLSetting? Do we add non-stationarity later?
        - Yes!
    - Why do we wrap all the SB3 algos, e.g. `A2CModel(A2C, OnPolicyModel)`?
        - So that we are able to "patch" any of them by overriding their methods
        - So that we can create dataclasses for their hyper-parameters and avoid duplicating the constructor arguments across 6 different files (a minimal sketch follows below)
    - What is the role of the `GymDataLoader`? Is it something that I would need to use when creating the MARL Setting?
        - The GymDataLoader is used so that methods that use PyTorch-Lightning can treat the environments as dataloaders. (It implements the gym.Env API as well as the DataLoader API from PyTorch.)
        - No, you don't need to worry about PyTorch-Lightning compatibility for now. I'll take care of that later. For now, just return whatever environments a Method for MARL would expect (ideally something close to the PettingZoo API).
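To make the answer about wrapping the SB3 algos a bit more concrete, here is a minimal sketch of the pattern: declare the hyper-parameters once as a dataclass, and subclass the SB3 algo so that Methods can patch it. The names used here (`A2CHParams`, the constructor signature) are hypothetical, not Sequoia's actual definitions.

```python
# Minimal sketch of the "wrap the SB3 algo + hyper-parameter dataclass" pattern
# discussed above. A2CHParams and the constructor below are illustrative only,
# not Sequoia's actual classes.
from dataclasses import dataclass
from typing import Optional

from stable_baselines3 import A2C


@dataclass
class A2CHParams:
    """Hyper-parameters declared once, instead of duplicating constructor arguments."""
    learning_rate: float = 7e-4
    n_steps: int = 5
    gamma: float = 0.99


class A2CModel(A2C):
    """Thin subclass of the SB3 algo, so that a Method can 'patch' it by overriding methods."""

    def __init__(self, env, hparams: Optional[A2CHParams] = None, **kwargs):
        hparams = hparams or A2CHParams()
        super().__init__(
            policy="MlpPolicy",
            env=env,
            learning_rate=hparams.learning_rate,
            n_steps=hparams.n_steps,
            gamma=hparams.gamma,
            **kwargs,
        )
```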
### Massimo

## August 12th, 2021

### Updates:

- Adding a potential collaborator (Zeyuan, intern at Mila)
- A 1-hour technical meeting with the PyTorch Lightning folks is coming soon (yay)

### Todos:

- [ ] Add LaMAML to the hparam sweep
    - [ ] Test it

### Fabrice

- [ ] Finish the PR for ReplayV2
- [ ] Look into using [Brax](https://www.github.com/google/brax) for RL
- [ ] Add PackNet to the hparam sweep
    - [ ] Test it
- [~] Update the README.md:
    - [ ] Add some images maybe?
    - [ ] Logo?
- [ ] Look into SB3 VectorEnv compatibility
- [ ] Design a way to store hparam configs for each method

### Massimo

- [ ] Get familiar with SB3
- [ ] Add Hindsight Experience Replay to Sequoia

### Ryan

- [ ] Add/integrate PettingZoo into Sequoia
- From last week:
    - Reviewed the Stable Baselines (already integrated) and PettingZoo APIs in preparation for integrating PettingZoo

## August 5th, 2021

### Updates:

- Brax compatibility is almost here!
- The PyTorch-Lightning folks are interested! (yay!)
- Look into adding the [CTrLBenchmark for CSL](https://github.com/facebookresearch/CTrLBenchmark)!

### Todos:

Fabrice:
- [X] Create an issue for PyTorch 1.9 compatibility
- [X] Refactor the Replay method
    - [X] Write the tests
    - [ ] Make them all pass
    - [ ] Make a PR
- [~] Add PackNet to the PL example
    - [ ] Create a notebook version?
- [~] Update the README.md:
    - [X] Guide users more directly to the examples
    - [X] Remove extra stuff
    - [ ] Add some images maybe?
    - [ ] Logo?

Lucas:
- [~] PackNet:
    - [X] Create a fork of Sequoia
    - [X] Create a PR to add PackNet (single file)
    - [ ] (optional) Add a PackNet Callback to the PL example
        - (Fabrice: actually, it might make more sense to do EWC, to match `quick_demo_ewc.py`)
- RL:
    - Read some papers

Ryan:
- [~] CN-DPM:
    - [X] Refactor the config yaml files into dataclasses
    - [X] Create the PR
    - [X] Add tests
    - [~] Make the tests pass
        - (Need to tweak the configs so the tests are quicker to run)
    - [X] Merge the PR
    - [ ] Test it out on datasets other than MNIST (dynamic input size?)
- RL:
    - Read some papers:
        - Berkeley research (Sergey Levine and others)
        - NeurIPS Deep RL workshop: AVID
            - GAN for domain transfer between real human demonstrations and the robot's world
            - Train the robot on generated demonstrations
            - Assign reward to intermediate steps?

Massimo:
- [X] Submit the arXiv version
    - [X] Fix the little typos in the first version
- [ ] Get familiar with SB3
- [ ] Add Hindsight Experience Replay to Sequoia

## July 29th, 2021

### Updates:

- Chat with the PyTorch-Lightning Flash maintainers
- PyTorch-Lightning's `Callback` is an easy way to add a "plugin" to any algo! (a small sketch follows the July 22nd notes below)
- Probably a good idea to retire this "auxiliary task" API.

### Todos:

- [ ] Push it to arXiv
    - [ ] Finish the empirical section
        - [ ] SL (EWC)
- [ ] Look into Brax for massively parallel RL environments
- [ ] Make the SB3 methods work with batched envs
- [ ] Refactor Replay (based on BaseMethod)

## July 22nd, 2021

### Updates:

1. New way to add methods to Sequoia!
2. CN-DPM is now available as a Method!
3. Integration of Mostafa's submission into the examples!

### New Issues:

- [ ] Look into using [Brax](https://www.github.com/google/brax) for RL
- [X] Add a pytorch-lightning example
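As a concrete illustration of the `Callback`-as-plugin idea from the July 29th notes above, here is a minimal, simplified sketch of a PackNet-style callback that zeroes the gradients of weights frozen by previous tasks. It is not Sequoia's actual PackNet implementation, and hook names/signatures may differ between pytorch-lightning versions.

```python
# Minimal sketch of the "Callback as plugin" idea from the July 29th notes:
# a PackNet-style plugin that can be attached to any LightningModule via
# Trainer(callbacks=[...]). Simplified and illustrative, not Sequoia's PackNet.
from pytorch_lightning import Callback, Trainer


class FrozenWeightsCallback(Callback):
    def __init__(self):
        # parameter name -> boolean mask of the entries frozen by previous tasks
        self.frozen_masks = {}

    def on_after_backward(self, trainer, pl_module):
        # Zero the gradients of frozen entries so earlier tasks aren't overwritten.
        for name, param in pl_module.named_parameters():
            mask = self.frozen_masks.get(name)
            if mask is not None and param.grad is not None:
                param.grad[mask] = 0.0


# Usage: any existing LightningModule gets this behaviour without being modified:
# trainer = Trainer(callbacks=[FrozenWeightsCallback()])
```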
## July 7th

### Before Arxiv:

- [ ] Holes in the CSL study
    - [x] Launch Experience_replay on CIFAR-10
- [ ] Holes in the CRL study
    - [x] No online performance anywhere (except MonsterKong).
        - [X] Set `monitor_training_performance` to `True` by default in RL
        - [X] Relaunch everything?
            - [ ] If so, start a new workspace and copy the MonsterKong runs
                - (Not sure this is needed)
        - [x] Replace '0' with None in Wandb so that the average gives a good picture of the online performance
    - [ ] Half-Cheetah
        - [X] ~~baseline not launched (maybe because it's not continuous?)~~ (BaseMethod doesn't support continuous action spaces yet)
    - [x] ~~No successful DQN anywhere~~ Missing DQN runs (DQN doesn't support continuous action spaces):
        - [ ] CartPole
        - [ ] MonsterKong
    - [ ] No successful SAC runs in multi-task and incremental RL:
        - ![](https://i.imgur.com/YLxeGKW.png)
    - [x] MountainCar
        - ~~[ ] baseline method not launched (maybe because it's not continuous?)~~ (same as above)
    - [ ] MonsterKong
        - [ ] base method not launched
        - [ ] DQN
- [x] Accept/reject the updates in the Overleaf
- [ ] Refactor Replay (based on BaseMethod)

### After Arxiv:

- [ ] Improved command-line API
- [ ] Debugging MetaWorld:
    - [ ] CW10 / MT10 / CW20 only have one run per algo?
        - Q: Are some properties persisting between runs in an hparam sweep? (e.g. `train_env`?)
    - [ ] SAC
    - [ ] IncrementalRL doesn't have runs
    - [ ] The step limit doesn't seem to be working
        - Q: Is `max_episode_steps` being set on the Setting when using a MetaWorld env?
        - Do all MetaWorld envs have the same episode length limit? (500?)
- [ ] Add a SAC output head to BaseMethod
- [ ] Choose the best name for 'Model' below:

    ```python
    from pytorch_lightning import LightningModule

    class BaseMethod:
        model: "BaseModel"

    class Model(LightningModule):  # <--- this
        ...

    class MultiHeadModelMixin(Model):
        ...

    class SelfSupervisedModelMixin(Model):
        ...

    class SemiSupervisedModelMixin(Model):
        ...

    class BaseModel(
        MultiHeadModelMixin,
        SelfSupervisedModelMixin,
        SemiSupervisedModelMixin,
    ):
        ...
    ```

- [x] Add the 'avalanche' prefix to all Avalanche methods, not just the conflicting ones.
- [x] Same, but for SB3 (or for all Methods that have a 'family' field)
- [ ] Convert the older runs in W&B:
    - [ ] Renamed settings
    - [ ] Renamed methods
- [ ] [Reproducibility](https://github.com/lebrice/Sequoia/projects/12#card-64649672)

## June 7th

Things to read:
- [ ] BabyAI: GridWorld + text
    - [ ] Paper: https://openreview.net/pdf?id=rJeXCo0cYX
    - [ ] Code

# Massimo's Sequoia TODOs

### Before Arxiv

- Finish the CSL analysis
    - Maybe add a multiple-setting figure, e.g. with BaseMethod
- Finish the CRL analysis
    - Maybe add a multiple-setting figure, e.g. with BaseMethod