# [365.207] Practical Work in AI
###### 04.03.2021
<br/>

##### Ionelia Buzatu
###### k12008243@students.jku.at
<br/>

##### Supervisors:
###### Vihang Patil patil@ml.jku.at
###### Marius-Constantin Dinu marius-constantin.dinu@jku.at
###### \+ Collaboration with Manuel Del Verme delvermm@mila.quebec

<img style="float:" src=https://i.imgur.com/G65bSpT.png width=100> <img style="float:" src=https://i.imgur.com/YtbsJKT.png width=100>

---

###### Please visit this link to view the animations in the slides: https://hackmd.io/@92tLxRFMRF-iTbw8_IQU6Q/HkmpLUazO#/

---

### Motivation
* Getting experience with RL agents
* Moving towards epigenetic systems with RL

---

### Overview:
1) Benchmarks
2) A plant simulator with continuous actions
3) Use agents from (1) on the plant environment

---

### 1. Benchmarks

---

| Environments | Models |
|-|-|
| Pendulum-v0 <br/> Hopper-v2 <br/> HalfCheetah-v2 <br/> Humanoid-v2 <br/> HumanoidStandup-v2 <br/> ReacherBulletEnv-v0 | DDPG <br/> TD3 <br/> PPO2 <br/> SAC <br/> TRPO |

---

### Benchmarks plotting
[https://wandb.ai/ionelia/rl-benchmarks](https://wandb.ai/ionelia/rl-benchmarks)

---

Pendulum
![](https://i.imgur.com/8Uu04hl.png)

---

Hopper
![](https://i.imgur.com/utRzMRL.png)

---

Cheetah
![](https://i.imgur.com/TUYPWpA.png)

---

Walker (Humanoid)
![](https://i.imgur.com/zejqW3K.png)

---

Reacher
![](https://i.imgur.com/MOIMzug.png)

---

HumanoidStandup
![](https://i.imgur.com/MPcHCFI.png)

---

### Benchmarks rendering

---

| | DDPG |
|-|-|
| Pendulum | ![](https://i.imgur.com/dVcPNPi.gif =100x) |
| Humanoid & Hopper | ![](https://i.imgur.com/1lqjaR8.gif =100x)![](https://i.imgur.com/wCBw7dK.gif =100x) |
| HalfCheetah & HumanoidStandup | ![](https://imgur.com/JsI9ZLy.gif =100x)![](https://imgur.com/dk3oQ9y.gif =100x) |
| Reacher | ![](https://i.imgur.com/DkGIS9s.gif =200x) |

---

| | PPO2 |
|-|-|
| Pendulum | ![](https://i.imgur.com/65oIrs5.gif =100x) |
| Humanoid & Hopper | ![](https://i.imgur.com/U6lqjg0.gif =100x)![](https://imgur.com/ATc0E7d.gif =100x) |
| HalfCheetah & HumanoidStandup | ![](https://imgur.com/pvzM6GD.gif =100x) ![](https://imgur.com/Jc5oDJg.gif =100x) |
| Reacher | ![](https://i.imgur.com/wt2fA3o.gif =150x) |

---

| | TD3 |
|-|-|
| Pendulum | ![](https://i.imgur.com/Lf8e58H.gif =100x) |
| Humanoid & Hopper | ![](https://i.imgur.com/F5HQL5k.gif =100x)![](https://i.imgur.com/jYf2BUn.gif =100x) |
| HalfCheetah & HumanoidStandup | ![](https://imgur.com/WzcyrtT.gif =100x) ![](https://imgur.com/XK13tcZ.gif =100x) |

---

| | SAC |
|-|-|
| Pendulum | ![](https://i.imgur.com/ybpJutJ.gif =100x) |
| Humanoid & Hopper | ![](https://i.imgur.com/A1OcJ7R.gif =100x) ![](https://imgur.com/BqAG14R.gif =100x) |
| HalfCheetah & HumanoidStandup | ![](https://imgur.com/ckBQEdN.gif =100x) ![](https://imgur.com/7QUXIWW.gif =100x) |
| Reacher | ![](https://i.imgur.com/GYD3emj.gif =200x) |

---

| | TRPO |
|-|-|
| Pendulum | ![](https://i.imgur.com/i3ew9Bq.gif =100x) |
| Humanoid & Hopper | ![](https://i.imgur.com/PLl9GH1.gif =100x) ![](https://imgur.com/HEBzPUc.gif =100x) |
| HalfCheetah & HumanoidStandup | ![](https://imgur.com/UKvkgq8.gif =100x) ![](https://imgur.com/mPBKHpd.gif =100x) |
| Reacher | ![](https://i.imgur.com/ZpSWr2D.gif =200x) |

---

### 2. Plant control

---

| Plant Control environment | |
| -------- | -------- |
| Initial state | ![](https://i.imgur.com/oW7x3GV.png =100x) |
| Success | ![](https://i.imgur.com/6bh81Lv.png =100x) |
| Exploration failure | ![](https://i.imgur.com/pisyvc1.png =100x) |

---

Goal: a plant growth simulator and a policy that learns to grow plants.
Reward: inverse Euclidean distance

\begin{align}
R = \frac{1}{\sqrt{(x_b - x_t)^2 + (y_b - y_t)^2}}
\end{align}

where $b$ is the plant tip closest to the marker $t$.

---

## Some policies:
![](https://i.imgur.com/iC0FSUu.gif =300x)

---

## Adding water

$$R = \frac{1}{d_{b,t}} - 1$$

$$\begin{cases} -1 & \text{if the plant was watered}\\ R & \text{otherwise} \end{cases}$$

where $d_{b,t}$ is the Euclidean distance between the tip $b$ and the marker $t$.

10% of the water is used to create a new branch.

---

#### Growing multiple plants simultaneously

$R = \min(r_1, r_2)$

| Easy | ![](https://i.imgur.com/dGmcpn2.gif) |
| - | - |
| **Medium** | ![](https://i.imgur.com/neIgR6g.gif) |
| **Hard** | ![](https://i.imgur.com/yuhTASH.gif) |

---

#### PPO2 continuous control training
[plots on comet.ml](https://www.comet.ml/yasmeenvh/growspace2021/7cdba39926894314a87f3f523218de16?experiment-tab=chart&showOutliers=true&smoothing=0.585&transformY=smoothing&xAxis=step)
![](https://i.imgur.com/5TXr1o9.png)

---

#### Ongoing: MNIST challenge
Shape the plant into an MNIST digit.
![](https://i.imgur.com/B4y5rFQ.png =200x)

---

#### Future work
1) Root system
2) Plant genome/epigenome
3) Adapt the reward for the above cases

---

**Can we model epigenetic systems as an RL continuous control problem, and use that model to ask questions such as:**
* Can we optimize the environment's resources to make a plant grow?
* Can we apply a CRISPR-Cas9-style technique, with a reward that encourages beneficial changes?

---

References:
* [OpenAI Gym MuJoCo environments](https://gym.openai.com/envs/#mujoco)
* [Stable Baselines models](https://stable-baselines.readthedocs.io/en/master/guide/examples.html)
* [Wandb for plotting](https://wandb.ai)
* [Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control](https://arxiv.org/pdf/1708.04133.pdf)
* [An Evolutionary Robotics Approach to the Control of Plant Growth and Motion](https://sci-hub.do/https://ieeexplore.ieee.org/document/7774383)
* [Autonomously shaping natural climbing plants: a bio-hybrid approach](https://royalsocietypublishing.org/doi/full/10.1098/rsos.180296)

---

### Thank you
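
---

#### Backup: reward sketch

The inverse-distance reward and the multi-plant reward $R = \min(r_1, r_2)$ from the plant control slides can be sketched in a few lines of Python. This is only an illustration, not the simulator's actual code: the function names, the `(x, y)` tuple representation of tips and markers, and the `eps` guard against a zero distance are all assumptions made for this example.

```python
import math


def closest_tip(tips, target):
    """Return the tip b that is closest to the marker t (hypothetical helper)."""
    return min(tips, key=lambda p: math.hypot(p[0] - target[0], p[1] - target[1]))


def distance_reward(tips, target, eps=1e-8):
    """R = 1 / sqrt((x_b - x_t)^2 + (y_b - y_t)^2) for the closest tip b.

    eps is an assumed guard for the case where a tip reaches the marker;
    the slides do not say how a zero distance is handled.
    """
    x_b, y_b = closest_tip(tips, target)
    d = math.hypot(x_b - target[0], y_b - target[1])
    return 1.0 / max(d, eps)


def multi_plant_reward(per_plant_rewards):
    """R = min(r_1, r_2, ...): the worst-performing plant sets the reward."""
    return min(per_plant_rewards)


# Usage: two plants, each with its own tips and marker.
r_1 = distance_reward([(0.0, 0.0), (1.0, 1.0)], target=(4.0, 5.0))
r_2 = distance_reward([(0.0, 0.0)], target=(0.0, 2.0))
print(multi_plant_reward([r_1, r_2]))
```

Taking the minimum means the policy cannot score well by growing one plant while neglecting the other.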