# To-Do DeepPlants
## Notes
* Since switching to the pixel system [1,84], the 1/distance reward has become very small. Example from the hierarchy general setting:

* Before, when the environment coordinates were in the [0,1] range, 1/distance gave rewards > 1.
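A minimal sketch of the scale difference (the positions below are made up for illustration; this is not the actual reward code):
```python
# Illustrative only: how a 1/distance reward shrinks when coordinates move
# from the normalized [0,1] range to the [1,84] pixel range.
def inverse_distance_reward(tip, target):
    return 1.0 / max(abs(tip - target), 1e-8)  # epsilon avoids division by zero

# Same relative separation expressed in both coordinate systems (made-up numbers).
print(inverse_distance_reward(0.30, 0.70))            # 2.5   -> rewards > 1 were common
print(inverse_distance_reward(0.30 * 84, 0.70 * 84))  # ~0.03 -> shrinks by the 84x scale factor
```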
## Meeting notes:
### to-do May 26:
- [ ] Rainbow, complete by this week
- [ ] Launch small tests while refactoring
- [ ] implement comments
- [ ] decide on a license: Creative Commons (CC) or MIT
### to-do May 19:
- [ ] contextual bandits https://github.com/RonyAbecidan/NeuralTS
- [ ] DEAP, CMA-ES specifically https://github.com/DEAP/deap (will not do?)
- [x] fix branching assert for out of boundary branching - temporary fix, need to ask Manuel
- [ ] look at Atari hyperparameters for other baselines as a starting point
### to-do May 12:
- [x] change reward for control, hierarchy and fairness
- [ ] change reward mnist:
    - [x] Sigmoid (launched on cluster, having issues)
    - [x] Log (tried but still gave negative values)
    - [x] tried (could be promising but very long to run)
    - [x] tried sqrt(x)
- [ ] make the mnist challenge load directly from the PyTorch MNIST dataset instead of loading pictures (see the sketch after this list)
- [ ] rerun experiments:
    - [x] control
    - [x] hierarchy
    - [x] fairness
    - [ ] mnist
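For the MNIST loading item above, a minimal sketch of pulling digits straight from the torchvision MNIST dataset instead of reading pre-rendered picture files (`digits_for` and the `root` path are hypothetical, not names from the repo):
```python
from torchvision import datasets

# Load MNIST once from torchvision; no per-digit image files on disk needed.
mnist = datasets.MNIST(root="./data", train=True, download=True)
images = mnist.data.numpy()      # shape (60000, 28, 28), uint8
labels = mnist.targets.numpy()   # shape (60000,)

def digits_for(label):
    """Return every 28x28 image of a given digit (hypothetical helper)."""
    return images[labels == label]

threes = digits_for(3)  # e.g. sample one of these at env.reset()
```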
### to-do May 5:
- [ ] Make a config for the people testing the repo
### to-do April 28
- [x] Add appendix to explain hyperparameters
- [x] IoU with and without discount (% multiplied on plant pixels)
- [x] Try bigger resolution only for human rendering
### to-do April 22 (meeting minutes)
- [ ] Use stable-baselines to run all other baselines (I will )
- [x] Mean variation of run time (pixel vs [0,1]); a measurement sketch follows this list
* Matrix: reset times 0.000573272705078125 +/- 0.00014614290525266836, step times 0.01752043294906616 +/- 0.012189358080591066
* Old version: reset times 0.00019029617309570312 +/- 4.6077630014265054e-05, step times 0.009691695785522461 +/- 0.008425047466218057
* (where 1 = matrix, 2 = old version)
- [x] Pick generalized setting of PPO for all Environments
- [x] add plant pixel (will do in current branch YH)
- [x] matrix implementation across environments (started branch YH)
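The reset/step timings above were presumably gathered with something along these lines (the env id, the `growspace` import, and the trial counts are assumptions, not the exact benchmarking script):
```python
import time
import numpy as np
import gym
import growspace  # noqa: F401 -- assumed to register the GrowSpace envs

env = gym.make("GrowSpaceEnv-Control-v0")  # placeholder env id

reset_times, step_times = [], []
for _ in range(100):                       # arbitrary number of trials
    start = time.time()
    obs = env.reset()
    reset_times.append(time.time() - start)
    for _ in range(50):                    # arbitrary number of steps per episode
        start = time.time()
        obs, reward, done, info = env.step(env.action_space.sample())
        step_times.append(time.time() - start)
        if done:
            break

print(f"reset times: {np.mean(reset_times)} +/- {np.std(reset_times)}")
print(f"step times: {np.mean(step_times)} +/- {np.std(step_times)}")
```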
## "Timeline"
### Week April 19-26:
- [x] Finish matrix implementation, merge into master
- [x] Pick Hyperparameters for all environments
- [x] Finalize paper transition into NeurIPS 2021 format
### Week April 26 - May 3:
*All experiments run 3 times*
Control:
- [x] Control run experiments on easy case study PPO **(Launched)**
- [x] Control run experiments on hard case study PPO **(Launched)**
- [ ] Control run on standard setting PPO, stable-baselines, plot with oracle and random **(Launched PPO general, oracle and random)**
Hierarchy:
- [x] Hierarchy run experiments on easy case study PPO **(Launched)**
- [x] Hierarchy run experiments on hard case study PPO **(Launched)**
- [ ] Hierarchy run on standard setting PPO, stable-baselines, plot with oracle and random **(Launched PPO general, oracle and random)**
Fairness:
- [x] Fairness run experiments on target middle case study **(Launched)**
- [x] Fairness run experiments on target above on case study **(Launched)**
- [x] Fairness run experiments on easy plants case study **(Launched)**
- [ ] Fairness run on standard setting PPO, stable-baselines, plot with oracle and random **(Launched PPO general and random, need to run oracle)**
Mnist: **(Currently running)**
- [ ] Mnist run on standard setting (MnistMix) PPO, stable-baselines, plot with oracle and random **(Launched PPO general, oracle and random)**
- [x] Mnist compare difficulties of numbers 0-9, PPO **(Launched `run*_mnist*`)**
- [x] Boxplot and whiskers to see Mnist numbers and difficulties (plotting sketch after this list)
- [x] Curriculum ordering of numbers from easy to hard, compare to standard? (Really not sure we will pull this off but putting this here)
**(made first curriculum)**
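For the box-and-whisker item, a minimal matplotlib sketch (the reward arrays are random placeholders standing in for the logged per-digit results):
```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder data: swap in the episode rewards logged per digit (run*_mnist* experiments).
rng = np.random.default_rng(0)
rewards_per_digit = {d: rng.normal(loc=5.0, scale=1.0, size=30) for d in range(10)}

fig, ax = plt.subplots()
ax.boxplot([rewards_per_digit[d] for d in range(10)], labels=[str(d) for d in range(10)])
ax.set_xlabel("MNIST digit")
ax.set_ylabel("episode reward")
ax.set_title("Per-digit difficulty (PPO)")
fig.savefig("mnist_digit_boxplot.png")
```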
### Week May 3-10:
*Experiments from the previous week may still be ongoing*
- [ ] Write up Result section with results from experiments
**Control:**

**Hierarchy:**

* Branches:

**Fairness:**

**Mnist:**
* comparison of digits
* This results in an order of 3,6,2,1,4,5,7,8,9,0
* First curriculum is launched
```python
# Curriculum schedule for the Mnist challenge: a new digit is added every
# 2000 episodes, following the easy-to-hard order 3,6,2,1,4,5,7,8,9,0,
# then the full mix ('partymix') is used from episode 14000 onward.
if self.episode <= 2000:
    self.shape = self.path + '36' + '/'
elif self.episode <= 4000:
    self.shape = self.path + '362' + '/'
elif self.episode <= 6000:
    self.shape = self.path + '3621' + '/'
elif self.episode <= 8000:
    self.shape = self.path + '36214' + '/'
elif self.episode <= 10000:
    self.shape = self.path + '362145' + '/'
elif self.episode <= 12000:
    self.shape = self.path + '3621457' + '/'
elif self.episode <= 14000:
    self.shape = self.path + '36214578' + '/'
elif self.episode <= 20000:
    self.shape = self.path + 'partymix' + '/'
```

- [ ] Have growspace repo ready for sharing (May 7th)
### Week May 10-17:
- [ ] If curriculum learning was not completed, do so in this week
- [ ] If other baselines (not PPO) were not completed for the first draft, do so
### Week May 17-24:
- [ ] Implement feedback from beta testers
- [ ] ....
### Week May 24-31:
## Hyperparameter tuning
### Best hyperparams (3 seeds - w.r.t. Reward Mean)
Final results: plots comparing all hyperparameters across all envs


#### Control (reward mean: 21.319, name: `best_sweep_control_Apr17_02-06-27846077`)
```
lr = 0.006697
eps = 0.03068
gamma = 0.8964
use_gae = False
gae_lambda = 0.3429
entropy_coef = 0.1316
value_loss_coef = 0.3638
max_grad_norm = 0.3406
num_steps = 4240
optimizer = "adam"
ppo_epoch = 15
num_mini_batch = 25
clip_param = 0.3758
use_linear_lr_decay = True
```
#### Hierarchy (reward mean: 18.588, name: `hierarchy333_Apr20_18-05-09847823`)
```
lr = 0.02214
eps = 0.04762
gamma = 0.9452
use_gae = True
gae_lambda = 0.6908
entropy_coef = 0.08532
value_loss_coef = 0.7231
max_grad_norm = 0.2814
num_steps = 2592
optimizer = "adam"
ppo_epoch = 11
num_mini_batch = 65
clip_param = 0.211
use_linear_lr_decay = False
```
#### Fairness (reward mean: 3.837, name: `Fairness3seeds_Apr20_09-40-58847614`)
```
lr = 0.02944
eps = 0.0444
gamma = 0.2065
use_gae = False
gae_lambda = 0.7383
entropy_coef = 0.2854
value_loss_coef = 0.2857
max_grad_norm = 0.1301
num_steps = 3480
optimizer = "adam"
ppo_epoch = 6
num_mini_batch = 55
clip_param = 0.2582
use_linear_lr_decay = False
```
#### Mnist (reward mean: 9.865, name: `SpotlightSweep_Apr22_05-06-18847760`)
```
lr = 0.03955
eps = 0.03748
gamma = 0.9013
use_gae = True
gae_lambda = 0.412
entropy_coef = 0.04617
value_loss_coef = 0.4661
max_grad_norm = 0.5232
num_steps = 2286
optimizer = "adam"
ppo_epoch = 4
num_mini_batch = 31
clip_param = 0.08368
use_linear_lr_decay = True
```
#### Control vs Continuous

```
lr = 0.02405
eps = 0.03392
gamma = 0.9325
use_gae = True
gae_lambda = 0.7988
entropy_coef = 0.02783
value_loss_coef = 0.4377
max_grad_norm = 0.3353
num_steps = 3846
optimizer = "adam"
ppo_epoch = 16
num_mini_batch = 27
clip_param = 0.09977
use_linear_lr_decay = True
```
### Best hyperparams (3 seeds - w.r.t. Episode_Reward)
#### Control
```
lr = 0.04747
eps = 0.003255
gamma = 0.2597
use_gae = False
gae_lambda = 0.3391
entropy_coef = 0.3466
value_loss_coef = 0.3693
max_grad_norm = 0.1471
num_steps = 3463
optimizer = "adam"
ppo_epoch = 15
num_mini_batch = 46
clip_param = 0.4327
use_linear_lr_decay = True
```
#### Hierarchy
```
lr = 0.06348
eps = 0.03238
gamma = 0.9805
use_gae = True
gae_lambda = 0.7463
entropy_coef = 0.178
value_loss_coef = 0.563
max_grad_norm = 0.3398
num_steps = 3244
optimizer = "adam"
ppo_epoch = 19
num_mini_batch = 32
clip_param = 0.08664
use_linear_lr_decay = False
```
#### Fairness
```
lr = 0.07827
eps = 0.04517
gamma = 0.1388
use_gae = True
gae_lambda = 0.3265
entropy_coef = 0.308
value_loss_coef = 0.2915
max_grad_norm = 0.4387
num_steps = 3285
optimizer = "adam"
ppo_epoch = 18
num_mini_batch = 23
clip_param = 0.09442
use_linear_lr_decay = True
```
#### Mnist
```
lr = 0.05616
eps = 0.00107
gamma = 0.5172
use_gae = False
gae_lambda = 0.8351
entropy_coef = 0.4723
value_loss_coef = 0.3531
max_grad_norm = 0.3382
num_steps = 4793
optimizer = "adam"
ppo_epoch = 7
num_mini_batch = 23
clip_param = 0.1206
use_linear_lr_decay = False
```
#### Control vs Continuous (the control run only)
```
lr = 0.08721
eps = 0.02147
gamma = 0.3667
use_gae = True
gae_lambda = 0.8944
entropy_coef = 0.3404
value_loss_coef = 0.6391
max_grad_norm = 0.3057
num_steps = 1858
optimizer = "adam"
ppo_epoch = 15
num_mini_batch = 84
clip_param = 0.2914
use_linear_lr_decay = True
```
### Chosen Environment Parameters
| Hyperparameters Growspace | Control, Hierarchy & Fairness| Mnist |
| -------- | -------- | -------- | -------- |
|FIRST_BRANCH_HEIGHT| .24 | 0.05| 0.5 |
|BRANCH_THICCNESS| 0.015 | 0.015| 0.05 |
|BRANCH_LENGTH| 1/9 | 1/30| 1/5 |
|MAX_BRANCHING| 10 | 1| 20 |
|LIGHT_WIDTH| .25 | .1| 1|
|LIGHT_DIF | 250 | 100| 400|
### Sweep.yaml for PPO hyperparameters
| Parameter | previous setting | Min |Max |
| -------- | -------- | -------- |-------- |
| lr | 2.5e-4 | 0.003 | 5e-6 |
| clip-param | 0.1 | Text |Text |
|value-loss-coef | 0.5 | Text |Text |
| num-processes | 1 | Text |Text |
| num-steps | 2000 | Text |Text |
| num-mini-batch | 4 | Text |Text |
| log-interval | 1 | Text |Text |
| entropy-coef | 0.01 | Text |Text |
| use-gae | 0.01 | Text |Text |
### Things done
I am thinking we can check the box when done
- [x] Make different growspace environments a hyperparameter
- [x] Run sweep on multi environments
- [x] Make pposweep.yml file
- [x] Pick parameters that best generate a growing plant
- [x] Run sweep on PPO hyperparameters for different reward structures (control & hierarchy)
- [x] Run sweep on PPO hyperparameters for fairness
- [X] Run sweep discrete versus continuous (sweep.yaml + pposweep.yaml)
- [x] Run sweep on PPO hyperparameters for MnistMix
- [X] add action plots for wandb log
- [X] fix episode rewards
- [X] plot for lighting displacement
- [X] Plot continuous action space, how do we do this
- [X] Mancuso email
## List of things (March)
- ~~Add more digits for spotlight challenge~~
- ~~Remove initial stem from the similarity score~~
- Adding curriculum for hierarchical challenge and maybe to MNIST later, look here:
https://lilianweng.github.io/lil-log/2020/01/29/curriculum-for-reinforcement-learning.html
- ~~Confirm shading by tomorrow~~
- ~~Sweep with buddy system~~
- ~~Normalize reward over different digits tested on for MNIST challenge (look at how they did it for atari)~~
- Oracle MNIST
- Oracle review for all
- ~~Add digit 0 to MnistMix~~
## Testing on Spotlight
- ~~Run on larger amount of 1s for reset~~
- ~~Run on number 7s for every reset~~
- ~~alternate between 1 and 7s for every reset~~
- ~~Random digits for every reset~~
- ~~Look into curriculum based off of different tests mentioned above~~
## Add Configs for Buddy:
| Hyperparameters Growspace | Current Value| Min | Max |
| -------- | -------- | -------- | -------- |
|FIRST_BRANCH_HEIGHT| .24 | 0.05| 0.5 |
|BRANCH_THICCNESS| 0.015 | 0.015| 0.05 |
|BRANCH_LENGTH| 1/9 | 1/30| 1/5 |
|MAX_BRANCHING| 10 | 1| 20 |
|LIGHT_WIDTH| .25 | .1| 1|
|LIGHT_DIF | 250 | 100| 400|
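Written out as a plain Python mapping (the values restate the table above; the dictionary layout itself is an assumption for illustration, not the actual experiment-buddy schema):
```python
# Current value plus sweep range for each GrowSpace environment constant.
GROWSPACE_PARAM_RANGES = {
    "FIRST_BRANCH_HEIGHT": {"current": 0.24,  "min": 0.05,   "max": 0.5},
    "BRANCH_THICCNESS":    {"current": 0.015, "min": 0.015,  "max": 0.05},
    "BRANCH_LENGTH":       {"current": 1 / 9, "min": 1 / 30, "max": 1 / 5},
    "MAX_BRANCHING":       {"current": 10,    "min": 1,      "max": 20},
    "LIGHT_WIDTH":         {"current": 0.25,  "min": 0.1,    "max": 1.0},
    "LIGHT_DIF":           {"current": 250,   "min": 100,    "max": 400},
}
```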
### Not sure if we should play around with these ones
- LIGHT_DISPLACEMENT = .1
- LIGHT_W_INCREMENT = .1
- DEFAULT_RES = 84
- MIN_LIGHT_WIDTH = .1
- MAX_LIGHT_WIDTH = .5
# Miscellaneous
## Other researchers who could be interested
* Heiko Hamann is Professor for Service Robotics at the University of Lübeck, Germany. Previous work includes "A robot to shape your natural plant".
# Future of Growspace
## Future Baselines
- Soft Actor-Critic
- TRPO
- https://github.com/hill-a/stable-baselines
## Future Work
- Water Budget
- Root system (same branching pattern)
- DNA
- Different plants? Could refactoring allow for different plant models?
---
<details><summary><mark>Default Sweep (better name here?)</mark></summary>

```yaml
program: main.py
method: random
metric:
  goal: maximize
  name: Episode_Reward
parameters:
  lr:                      # MODIFIED
    distribution: uniform
    min: 1.0e-5
    max: 0.1
  eps:                     # leaving as is
    distribution: uniform
    min: 1e-7
    max: 0.05
  gamma:                   # leaving as is; sampled values ranged from 0.18 to 0.96
    distribution: uniform
    min: 0.1
    max: 0.99
  use_gae:                 # leave as is
    distribution: categorical
    values:
      - True
      - False
  use_linear_lr_decay:     # leave as is
    distribution: categorical
    values:
      - True
      - False
  gae_lambda:              # MODIFIED
    distribution: uniform
    min: 0.3
    max: 0.99
  entropy_coef:            # MODIFIED; do we want to try without it, as Simone proposes (because we have a small action space)?
    distribution: uniform
    min: 0.01
    max: 0.5
  value_loss_coef:         # MODIFIED
    distribution: uniform
    min: 0.25
    max: 0.75
  max_grad_norm:
    distribution: uniform
    min: 0.1
    max: 0.9
  num_steps:               # MODIFIED
    distribution: q_uniform
    min: 1000
    max: 5000
  ppo_epoch:               # leave as is
    distribution: q_uniform
    min: 1
    max: 20
  num_mini_batch:          # leave as is
    distribution: q_uniform
    min: 10
    max: 100
  clip_param:              # MODIFIED
    distribution: uniform
    min: 0.05
    max: 0.5
  seed:
    distribution: categorical
    values: [111, 222, 333]
  optimizer:
    distribution: categorical
    values: ["adam", "sgd"]
  momentum:
    distribution: uniform
    min: 0.95
    max: 0.999
```
</details>
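One way to register and launch the sweep above with the wandb Python API (the project name and yaml filename are assumptions; the CLI `wandb sweep` / `wandb agent` route works just as well):
```python
import yaml
import wandb

# Register the sweep defined above; wandb returns a sweep id to hand to agents.
with open("pposweep.yaml") as f:          # filename assumed from "Make pposweep.yml file"
    sweep_config = yaml.safe_load(f)

sweep_id = wandb.sweep(sweep_config, project="growspace")  # project name is a placeholder
print(f"start workers with: wandb agent <entity>/growspace/{sweep_id}")
```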