# To-Do DeepPlants
## Notes
* Since switching to the pixel system [1,84], the 1/distance reward has become very small. Example from the hierarchy general setting:

* Before, when the environment coordinates were in the [0,1] range, 1/distance gave rewards > 1.
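A minimal sketch of the scale difference (the positions below are made up for illustration; this is not the actual reward code):
```python
# Illustrative only: how a 1/distance reward shrinks when coordinates move
# from the normalized [0,1] range to the [1,84] pixel range.
def inverse_distance_reward(tip, target):
    return 1.0 / max(abs(tip - target), 1e-8)  # epsilon avoids division by zero

# Same relative separation expressed in both coordinate systems (made-up numbers).
print(inverse_distance_reward(0.30, 0.70))            # 2.5   -> rewards > 1 were common
print(inverse_distance_reward(0.30 * 84, 0.70 * 84))  # ~0.03 -> shrinks by the 84x scale factor
```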
## Meeting notes:
### to-do May 26:
- [ ] Rainbow, complete by this week
- [ ] Launch small tests while refactoring
- [ ] implement comments
- [ ] decide on a license: Creative Commons (CC) or MIT
### to-do May 19:
- [ ] contextual bandits https://github.com/RonyAbecidan/NeuralTS
- [ ] DEAP, CMA-ES specifically https://github.com/DEAP/deap (will not do?)
- [x] fix branching assert for out of boundary branching - temporary fix, need to ask Manuel
- [ ] look at Atari hyperparameters for other baselines as a starting point
### to-do May 12:
- [x] change reward for control, hierarchy and fairness
- [ ] change reward mnist:
    - [x] Sigmoid (launched on cluster, having issues)
    - [x] Log (tried but still gave negative values)
    - [x] tried (could be promising but very long to run)
    - [x] tried sqrt(x)
- [ ] make the mnist challenge load directly from the PyTorch MNIST dataset instead of loading pictures (see the sketch after this list)
- [ ] rerun experiments:
    - [x] control
    - [x] hierarchy
    - [x] fairness
    - [ ] mnist
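For the MNIST loading item above, a minimal sketch of pulling digits straight from the torchvision MNIST dataset instead of reading pre-rendered picture files (`digits_for` and the `root` path are hypothetical, not names from the repo):
```python
from torchvision import datasets

# Load MNIST once from torchvision; no per-digit image files on disk needed.
mnist = datasets.MNIST(root="./data", train=True, download=True)
images = mnist.data.numpy()      # shape (60000, 28, 28), uint8
labels = mnist.targets.numpy()   # shape (60000,)

def digits_for(label):
    """Return every 28x28 image of a given digit (hypothetical helper)."""
    return images[labels == label]

threes = digits_for(3)  # e.g. sample one of these at env.reset()
```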
### to-do May 5:
- [ ] Make a config for the people testing the repo
### to-do April 28
- [x] Add appendix to explain hyperparameters
- [x] IoU with and without discount (% multiplied on plant pixels)
- [x] Try bigger resolution only for human rendering
### to-do April 22 (meeting minutes)
- [ ] Use stable-baselines to run all other baselines (I will )
- [x] Mean variation of run time (pixel vs [0,1]); a measurement sketch follows this list
* Matrix: reset times 0.000573272705078125 +/- 0.00014614290525266836, step times 0.01752043294906616 +/- 0.012189358080591066
* Old version: reset times 0.00019029617309570312 +/- 4.6077630014265054e-05, step times 0.009691695785522461 +/- 0.008425047466218057
* (where 1 = matrix, 2 = old version)
- [x] Pick generalized setting of PPO for all Environments
- [x] add plant pixel (will do in current branch YH)
- [x] matrix implementation across environments (started branch YH)
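The reset/step timings above were presumably gathered with something along these lines (the env id, the `growspace` import, and the trial counts are assumptions, not the exact benchmarking script):
```python
import time
import numpy as np
import gym
import growspace  # noqa: F401 -- assumed to register the GrowSpace envs

env = gym.make("GrowSpaceEnv-Control-v0")  # placeholder env id

reset_times, step_times = [], []
for _ in range(100):                       # arbitrary number of trials
    start = time.time()
    obs = env.reset()
    reset_times.append(time.time() - start)
    for _ in range(50):                    # arbitrary number of steps per episode
        start = time.time()
        obs, reward, done, info = env.step(env.action_space.sample())
        step_times.append(time.time() - start)
        if done:
            break

print(f"reset times: {np.mean(reset_times)} +/- {np.std(reset_times)}")
print(f"step times: {np.mean(step_times)} +/- {np.std(step_times)}")
```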
## "Timeline"
### Week April 19-26:
- [x] Finish matrix implementation, merge into master
- [x] Pick Hyperparameters for all environments
- [x] Finalize paper transition into NeurIPS 2021 format
### Week April 26 - May 3:
*All experiments run 3 times*
Control:
- [x] Control run experiments on easy case study PPO **(Launched)**
- [x] Control run experiments on hard case study PPO **(Launched)**
- [ ] Control run on standard setting PPO, stable-baselines, plot with oracle and random **(Launched PPO general, oracle and random)**
Hierarchy:
- [x] Hierarchy run experiments on easy case study PPO **(Launched)**
- [x] Hierarchy run experiments on hard case study PPO **(Launched)**
- [ ] Hierarchy run on standard setting PPO, stable-baselines, plot with oracle and random **(Launched PPO general, oracle and random)**
Fairness:
- [x] Fairness run experiments on target middle case study **(Launched)**
- [x] Fairness run experiments on target above on case study **(Launched)**
- [x] Fairness run experiments on easy plants case study **(Launched)**
- [ ] Fairness run on standard setting PPO, stable-baselines, plot with oracle and random **(Launched PPO general and random, need to run oracle)**
Mnist: **(Currently running)**
- [ ] Mnist run on standard setting (MnistMix) PPO, stable-baselines, plot with oracle and random **(Launched PPO general, oracle and random)**
- [x] Mnist compare difficulties of numbers 0-9, PPO **(Launched `run*_mnist*`)**
- [x] Boxplot and whiskers to see Mnist numbers and difficulties (plotting sketch after this list)
- [x] Curriculum ordering of numbers from easy to hard, compare to standard? (Really not sure we will pull this off but putting this here)
**(made first curriculum)**
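For the box-and-whisker item, a minimal matplotlib sketch (the reward arrays are random placeholders standing in for the logged per-digit results):
```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder data: swap in the episode rewards logged per digit (run*_mnist* experiments).
rng = np.random.default_rng(0)
rewards_per_digit = {d: rng.normal(loc=5.0, scale=1.0, size=30) for d in range(10)}

fig, ax = plt.subplots()
ax.boxplot([rewards_per_digit[d] for d in range(10)], labels=[str(d) for d in range(10)])
ax.set_xlabel("MNIST digit")
ax.set_ylabel("episode reward")
ax.set_title("Per-digit difficulty (PPO)")
fig.savefig("mnist_digit_boxplot.png")
```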
### Week May 3-10:
*Experiments from the previous week may still be ongoing*
- [ ] Write up Result section with results from experiments
**Control:**

**Hierarchy:**

* Branches:

**Fairness:**

**Mnist:**
* comparison of digits
* This results in an order of 3,6,2,1,4,5,7,8,9,0
* First curriculum is launched
```python
# Curriculum schedule for the Mnist challenge: a new digit is added every
# 2000 episodes, following the easy-to-hard order 3,6,2,1,4,5,7,8,9,0,
# then the full mix ('partymix') is used from episode 14000 onward.
if self.episode <= 2000:
    self.shape = self.path + '36' + '/'
elif self.episode <= 4000:
    self.shape = self.path + '362' + '/'
elif self.episode <= 6000:
    self.shape = self.path + '3621' + '/'
elif self.episode <= 8000:
    self.shape = self.path + '36214' + '/'
elif self.episode <= 10000:
    self.shape = self.path + '362145' + '/'
elif self.episode <= 12000:
    self.shape = self.path + '3621457' + '/'
elif self.episode <= 14000:
    self.shape = self.path + '36214578' + '/'
elif self.episode <= 20000:
    self.shape = self.path + 'partymix' + '/'
```

- [ ] Have growspace repo ready for sharing (May 7th)
### Week May 10-17:
- [ ] If curriculum learning was not completed, do so in this week
- [ ] If other baselines (not PPO) were not completed for the first draft, do so
### Week May 17-24:
- [ ] Implement feedback from beta testers
- [ ] ....
### Week May 24-31:
## Hyperparameter tuning
### Best hyperparams (3 seeds - w.r.t. Reward Mean)
Final results: plots comparing all hyperparameters across all envs


#### Control (reward mean: 21.319, name: `best_sweep_control_Apr17_02-06-27846077`)
```
lr = 0.006697
eps = 0.03068
gamma = 0.8964
use_gae = False
gae_lambda = 0.3429
entropy_coef = 0.1316
value_loss_coef = 0.3638
max_grad_norm = 0.3406
num_steps = 4240
optimizer = "adam"
ppo_epoch = 15
num_mini_batch = 25
clip_param = 0.3758
use_linear_lr_decay = True
```
#### Hierarchy (reward mean: 18.588, name: `hierarchy333_Apr20_18-05-09847823`)
```
lr = 0.02214
eps = 0.04762
gamma = 0.9452
use_gae = True
gae_lambda = 0.6908
entropy_coef = 0.08532
value_loss_coef = 0.7231
max_grad_norm = 0.2814
num_steps = 2592
optimizer = "adam"
ppo_epoch = 11
num_mini_batch = 65
clip_param = 0.211
use_linear_lr_decay = False
```
#### Fairness (reward mean: 3.837, name: `Fairness3seeds_Apr20_09-40-58847614`)
```
lr = 0.02944
eps = 0.0444
gamma = 0.2065
use_gae = False
gae_lambda = 0.7383
entropy_coef = 0.2854
value_loss_coef = 0.2857
max_grad_norm = 0.1301
num_steps = 3480
optimizer = "adam"
ppo_epoch = 6
num_mini_batch = 55
clip_param = 0.2582
use_linear_lr_decay = False
```
#### Mnist (reward mean: 9.865, name: `SpotlightSweep_Apr22_05-06-18847760`)
```
lr = 0.03955
eps = 0.03748
gamma = 0.9013
use_gae = True
gae_lambda = 0.412
entropy_coef = 0.04617
value_loss_coef = 0.4661
max_grad_norm = 0.5232
num_steps = 2286
optimizer = "adam"
ppo_epoch = 4
num_mini_batch = 31
clip_param = 0.08368
use_linear_lr_decay = True
```
#### Control vs Continuous

```
lr = 0.02405
eps = 0.03392
gamma = 0.9325
use_gae = True
gae_lambda = 0.7988
entropy_coef = 0.02783
value_loss_coef = 0.4377
max_grad_norm = 0.3353
num_steps = 3846
optimizer = "adam"
ppo_epoch = 16
num_mini_batch = 27
clip_param = 0.09977
use_linear_lr_decay = True
```
### Best hyperparams (3 seeds - w.r.t. Episode_Reward)
#### Control
```
lr = 0.04747
eps = 0.003255
gamma = 0.2597
use_gae = False
gae_lambda = 0.3391
entropy_coef = 0.3466
value_loss_coef = 0.3693
max_grad_norm = 0.1471
num_steps = 3463
optimizer = "adam"
ppo_epoch = 15
num_mini_batch = 46
clip_param = 0.4327
use_linear_lr_decay = True
```
#### Hierarchy
```
lr = 0.06348
eps = 0.03238
gamma = 0.9805
use_gae = True
gae_lambda = 0.7463
entropy_coef = 0.178
value_loss_coef = 0.563
max_grad_norm = 0.3398
num_steps = 3244
optimizer = "adam"
ppo_epoch = 19
num_mini_batch = 32
clip_param = 0.08664
use_linear_lr_decay = False
```
#### Fairness
```
lr = 0.07827
eps = 0.04517
gamma = 0.1388
use_gae = True
gae_lambda = 0.3265
entropy_coef = 0.308
value_loss_coef = 0.2915
max_grad_norm = 0.4387
num_steps = 3285
optimizer = "adam"
ppo_epoch = 18
num_mini_batch = 23
clip_param = 0.09442
use_linear_lr_decay = True
```
#### Mnist
```
lr = 0.05616
eps = 0.00107
gamma = 0.5172
use_gae = False
gae_lambda = 0.8351
entropy_coef = 0.4723
value_loss_coef = 0.3531
max_grad_norm = 0.3382
num_steps = 4793
optimizer = "adam"
ppo_epoch = 7
num_mini_batch = 23
clip_param = 0.1206
use_linear_lr_decay = False
```
#### Control vs Continuous (the control run only)
```
lr = 0.08721
eps = 0.02147
gamma = 0.3667
use_gae = True
gae_lambda = 0.8944
entropy_coef = 0.3404
value_loss_coef = 0.6391
max_grad_norm = 0.3057
num_steps = 1858
optimizer = "adam"
ppo_epoch = 15
num_mini_batch = 84
clip_param = 0.2914
use_linear_lr_decay = True
```
### Chosen Environment Parameters
| Hyperparameters Growspace | Control, Hierarchy & Fairness| Mnist |
| -------- | -------- | -------- | -------- |
|FIRST_BRANCH_HEIGHT| .24 | 0.05| 0.5 |
|BRANCH_THICCNESS| 0.015 | 0.015| 0.05 |
|BRANCH_LENGTH| 1/9 | 1/30| 1/5 |
|MAX_BRANCHING| 10 | 1| 20 |
|LIGHT_WIDTH| .25 | .1| 1|
|LIGHT_DIF | 250 | 100| 400|
### Sweep.yaml for PPO hyperparameters
| Parameter | previous setting | Min |Max |
| -------- | -------- | -------- |-------- |
| lr | 2.5e-4 | 0.003 | 5e-6 |
| clip-param | 0.1 | Text |Text |
|value-loss-coef | 0.5 | Text |Text |
| num-processes | 1 | Text |Text |
| num-steps | 2000 | Text |Text |
| num-mini-batch | 4 | Text |Text |
| log-interval | 1 | Text |Text |
| entropy-coef | 0.01 | Text |Text |
| use-gae | 0.01 | Text |Text |
### Things done
I am thinking we can check the box when done
- [x] Make different growspace environments a hyperparameter
- [x] Run sweep on multi environments
- [x] Make pposweep.yml file
- [x] Pick parameters that best generate a growing plant
- [x] Run sweep on PPO hyperparameters for different reward structures (control & hierarchy)
- [x] Run sweep on PPO hyperparameters for fairness
- [X] Run sweep discrete versus continuous (sweep.yaml + pposweep.yaml)
- [x] Run sweep on PPO hyperparameters for MnistMix
- [X] add action plots for wandb log
- [X] fix episode rewards
- [X] plot for lighting displacement
- [X] Plot continuous action space, how do we do this
- [X] Mancuso email
## List of things (March)
- ~~Add more digits for spotlight challenge~~
- ~~Remove initial stem from the similarity score~~
- Adding curriculum for hierarchical challenge and maybe to MNIST later, look here:
https://lilianweng.github.io/lil-log/2020/01/29/curriculum-for-reinforcement-learning.html
- ~~Confirm shading by tomorrow~~
- ~~Sweep with buddy system~~
- ~~Normalize reward over different digits tested on for MNIST challenge (look at how they did it for atari)~~
- Oracle MNIST
- Oracle review for all
- ~~Add digit 0 to MnistMix~~
## Testing on Spotlight
- ~~Run on larger amount of 1s for reset~~
- ~~Run on number 7s for every reset~~
- ~~alternate between 1 and 7s for every reset~~
- ~~Random digits for every reset~~
- ~~Look into curriculum based off of different tests mentioned above~~
## Add Configs for Buddy:
| Hyperparameters Growspace | Current Value| Min | Max |
| -------- | -------- | -------- | -------- |
|FIRST_BRANCH_HEIGHT| .24 | 0.05| 0.5 |
|BRANCH_THICCNESS| 0.015 | 0.015| 0.05 |
|BRANCH_LENGTH| 1/9 | 1/30| 1/5 |
|MAX_BRANCHING| 10 | 1| 20 |
|LIGHT_WIDTH| .25 | .1| 1|
|LIGHT_DIF | 250 | 100| 400|
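Written out as a plain Python mapping (the values restate the table above; the dictionary layout itself is an assumption for illustration, not the actual experiment-buddy schema):
```python
# Current value plus sweep range for each GrowSpace environment constant.
GROWSPACE_PARAM_RANGES = {
    "FIRST_BRANCH_HEIGHT": {"current": 0.24,  "min": 0.05,   "max": 0.5},
    "BRANCH_THICCNESS":    {"current": 0.015, "min": 0.015,  "max": 0.05},
    "BRANCH_LENGTH":       {"current": 1 / 9, "min": 1 / 30, "max": 1 / 5},
    "MAX_BRANCHING":       {"current": 10,    "min": 1,      "max": 20},
    "LIGHT_WIDTH":         {"current": 0.25,  "min": 0.1,    "max": 1.0},
    "LIGHT_DIF":           {"current": 250,   "min": 100,    "max": 400},
}
```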
### Not sure if we should play around with these ones
- LIGHT_DISPLACEMENT = .1
- LIGHT_W_INCREMENT = .1
- DEFAULT_RES = 84
- MIN_LIGHT_WIDTH = .1
- MAX_LIGHT_WIDTH = .5
# Miscellaneous
## Other researchers who could be interested
* Heiko Hamann is Professor for Service Robotics at the University of Lübeck, Germany. Previous work includes "A robot to shape your natural plant".
# Future of Growspace
## Future Baselines
- Soft Actor-Critic
- TRPO
- https://github.com/hill-a/stable-baselines
## Future Work
- Water Budget
- Root system (same branching pattern)
- DNA
- Different plants? Could refactoring allow for different plant models?
---
<details><summary><mark>Default Sweep (better name here?)</mark></summary>

```yaml
program: main.py
method: random
metric:
  goal: maximize
  name: Episode_Reward
parameters:
  lr:                      # MODIFIED
    distribution: uniform
    min: 1.0e-5
    max: 0.1
  eps:                     # leaving as is
    distribution: uniform
    min: 1e-7
    max: 0.05
  gamma:                   # leaving as is; sampled values ranged from 0.18 to 0.96
    distribution: uniform
    min: 0.1
    max: 0.99
  use_gae:                 # leave as is
    distribution: categorical
    values:
      - True
      - False
  use_linear_lr_decay:     # leave as is
    distribution: categorical
    values:
      - True
      - False
  gae_lambda:              # MODIFIED
    distribution: uniform
    min: 0.3
    max: 0.99
  entropy_coef:            # MODIFIED; do we want to try without it, as Simone proposes (because we have a small action space)?
    distribution: uniform
    min: 0.01
    max: 0.5
  value_loss_coef:         # MODIFIED
    distribution: uniform
    min: 0.25
    max: 0.75
  max_grad_norm:
    distribution: uniform
    min: 0.1
    max: 0.9
  num_steps:               # MODIFIED
    distribution: q_uniform
    min: 1000
    max: 5000
  ppo_epoch:               # leave as is
    distribution: q_uniform
    min: 1
    max: 20
  num_mini_batch:          # leave as is
    distribution: q_uniform
    min: 10
    max: 100
  clip_param:              # MODIFIED
    distribution: uniform
    min: 0.05
    max: 0.5
  seed:
    distribution: categorical
    values: [111, 222, 333]
  optimizer:
    distribution: categorical
    values: ["adam", "sgd"]
  momentum:
    distribution: uniform
    min: 0.95
    max: 0.999
```
</details>
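One way to register and launch the sweep above with the wandb Python API (the project name and yaml filename are assumptions; the CLI `wandb sweep` / `wandb agent` route works just as well):
```python
import yaml
import wandb

# Register the sweep defined above; wandb returns a sweep id to hand to agents.
with open("pposweep.yaml") as f:          # filename assumed from "Make pposweep.yml file"
    sweep_config = yaml.safe_load(f)

sweep_id = wandb.sweep(sweep_config, project="growspace")  # project name is a placeholder
print(f"start workers with: wandb agent <entity>/growspace/{sweep_id}")
```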