# [365.207] Practical Work in AI
###### 04.03.2021
<br/>
##### Ionelia Buzatu
###### k12008243@students.jku.at
<br/>
##### Supervisors:
###### Vihang Patil patil@ml.jku.at
###### Marius-Constantin Dinu marius-constantin.dinu@jku.at
###### \+ Collaboration with Manuel Del Verme delvermm@mila.quebec
<img style="float:" src=https://i.imgur.com/G65bSpT.png width=100>
<img style="float:" src=https://i.imgur.com/YtbsJKT.png width=100>
---
###### To view the animations in the slides, please visit https://hackmd.io/@92tLxRFMRF-iTbw8_IQU6Q/HkmpLUazO#/
---
### Motivation
* Gain experience with RL agents
* Move towards modelling epigenetic systems with RL
---
### Overview:
1) Benchmarks
2) A plant simulator with continuous actions
3) Apply the agents from (1) to the plant environment
---
### 1. Benchmarks
---
| Environments | Models |
|-|-|
| Pendulum-v0 <br> Hopper-v2 <br> HalfCheetah-v2 <br> Humanoid-v2 <br> HumanoidStandup-v2 <br> ReacherBulletEnv-v0 | DDPG <br> TD3 <br> PPO2 <br> SAC <br> TRPO |
---
### Benchmarks plotting
[https://wandb.ai/ionelia/rl-benchmarks](https://wandb.ai/ionelia/rl-benchmarks)
---
Pendulum

---
Hopper

---
Cheetah

---
Walker (Humanoid)

---
Reacher

---
HumanoidStandup

---
### Benchmarks rendering
---
||DDPG|
|-| -|
|Pendulum| 
|Humanoid & Hopper|
|HalfCheetah & HumanoidStandup| 
|Reacher |
---
| | PPO2 |
|------|--|
|Pendulum | 
|Humanoid & Hopper| 
|HalfCheetah & HumanoidStandup |  
|Reacher | 
---
| | TD3
| -- | -
|Pendulum| 
|Humanoid & Hopper|
|HalfCheetah & HumanoidStandup |  
---
|| SAC
|-| -
|Pendulum | 
|Humanoid & Hopper | 
|HalfCheetah & HumanoidStandup |  
|Reacher |
---
| | TRPO
|-|- |
|Pendulum| 
|Humanoid & Hopper| 
|HalfCheetah & HumanoidStandup|  
|Reacher |
---
### 2. Plant control
---
| Plant Control environment | |
| -------- | -------- |
| Initial state | 
| Success|
| Exploration failure|
---
Goal: a plant growth simulator and a policy that learns to grow plants.
Reward: inverse Euclidean distance
\begin{align}
R = \frac{1}{\sqrt{(x_b - x_t)^2 + (y_b - y_t)^2}}
\end{align}
where $b$ is the plant tip closest to the marker $t$.
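
The reward above can be sketched in a few lines of Python. This is a minimal illustration, not the simulator's actual code: the function names, the representation of tips as `(x, y)` tuples, and the small `eps` guard against division by zero are all assumptions made for the example.

```python
import math


def closest_tip(tips, marker):
    """Return the tip with the smallest Euclidean distance to the marker."""
    return min(tips, key=lambda tip: math.hypot(tip[0] - marker[0], tip[1] - marker[1]))


def distance_reward(tip, marker, eps=1e-8):
    """Inverse Euclidean distance between tip b and marker t.

    eps is an assumed guard so the reward stays finite when the tip
    reaches the marker exactly; the slides do not specify this case.
    """
    distance = math.hypot(tip[0] - marker[0], tip[1] - marker[1])
    return 1.0 / max(distance, eps)


# Hypothetical plant with two tips and one marker.
tips = [(0.0, 0.0), (3.0, 0.0)]
marker = (3.0, 4.0)

b = closest_tip(tips, marker)       # tip at distance 4 beats tip at distance 5
reward = distance_reward(b, marker)
```

Because only the closest tip counts, the reward grows as any single branch approaches the marker, regardless of where the other branches are.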
---
## Some policies:

---
## Adding water
$$R = \frac{1}{d_{b,t}} - 1$$
With watering, the reward becomes:
$$\begin{cases}
-1 & \text{if the plant was watered}\\
R & \text{otherwise}
\end{cases}$$
10% of the water is used to create a new branch.
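
One possible reading of the watering reward, sketched in Python. The function name, the `watered` flag, and the `eps` guard are assumptions for illustration; the slides do not state how the environment signals a watering action.

```python
def water_reward(distance, watered, eps=1e-8):
    """Reward with a watering penalty (assumed interpretation of the slide).

    distance: Euclidean distance d_{b,t} between the closest tip and the marker.
    watered:  True if the agent watered the plant on this step.
    """
    if watered:
        # Watering costs a fixed penalty instead of the distance-based reward.
        return -1.0
    # Otherwise: inverse distance, shifted down by one.
    return 1.0 / max(distance, eps) - 1.0


near = water_reward(0.5, watered=False)       # 1 / 0.5 - 1
at_unit = water_reward(1.0, watered=False)    # 1 / 1.0 - 1
penalised = water_reward(0.5, watered=True)   # fixed penalty
```

Under this reading the unwatered reward is positive only when the tip is closer than one unit to the marker, so watering is never better than a step that leaves the tip within that range.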
---
#### Growing multiple plants simultaneously
$R = \min(r_1, r_2)$
| Easy | 
| - | -
| **Medium** |
|**Hard** | 
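
A short sketch of the joint reward $R = \min(r_1, r_2)$. It assumes each $r_i$ is a per-plant reward computed separately (for example, from that plant's distance to its own marker); the function name and the sample values are hypothetical.

```python
def joint_reward(plant_rewards):
    """Joint reward for several plants: the worst individual reward."""
    if not plant_rewards:
        raise ValueError("at least one plant reward is required")
    return min(plant_rewards)


baseline = joint_reward([0.25, 0.5])        # limited by the first plant
better_leader = joint_reward([0.25, 1.0])   # improving the better plant changes nothing
better_laggard = joint_reward([0.4, 0.5])   # improving the worse plant raises the reward
```

Taking the minimum means the agent is only rewarded for progress on the plant that is currently doing worst, which discourages growing one plant at the expense of the other.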
---
#### PPO2 continuous control training [plots on comet.ml](https://www.comet.ml/yasmeenvh/growspace2021/7cdba39926894314a87f3f523218de16?experiment-tab=chart&showOutliers=true&smoothing=0.585&transformY=smoothing&xAxis=step)

---
#### Ongoing: MNIST challenge
Shape the plant into an MNIST digit.

---
#### Future work
1) Root system
2) Plant genome/epigenome
3) Adapt reward for the above cases
---
**Can we model epigenetic systems as an RL continuous-control problem, and use that model to ask questions such as:**
* Can we optimize the environment's resources to make a plant grow?
* Can we use the CRISPR-Cas9 technique with a reward that encourages beneficial changes?
---
References:
* [OpenAI Gym MuJoCo environments](https://gym.openai.com/envs/#mujoco)
* [Stable Baselines models](https://stable-baselines.readthedocs.io/en/master/guide/examples.html)
* [Weights & Biases (wandb) for plotting](https://wandb.ai)
* [Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control](https://arxiv.org/pdf/1708.04133.pdf)
* [An Evolutionary Robotics Approach to the Control of Plant Growth and Motion](https://sci-hub.do/https://ieeexplore.ieee.org/document/7774383)
* [Autonomously shaping natural climbing plants: a bio-hybrid approach](https://royalsocietypublishing.org/doi/full/10.1098/rsos.180296)
---
### Thank you