# [Pipeline of paper](https://arxiv.org/pdf/1912.01603.pdf)
- [name=Author: Jeff]
- [time=Sat, August 1, 2020 17:30]
## Outline
[TOC]
## Important keywords
1. **Continuous Control**
2. **Robotic arm** (FetchReach-v1)
3. **Goal-based missions**
    - Unlike the scenarios used in the D2C (Dream to Control) or PlaNet papers
    - For example, the "Walker-walk" and "Ant" agents learn from rewards defined on their own bodies, whereas the Fetch-series agents learn from rewards defined between the arm and the objects.
4. **Model-based Deep Reinforcement Learning**
    - Builds a model of the environment
5. **Learning behaviors in latent space**
    - Image input encoded via an autoencoder
6. **"Planning" or "imagination"**
    - Via an RNN, LSTM, or GRU (toy sketch below)
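As a toy illustration of "imagination" with a recurrent cell (this is only the deterministic path, not the paper's full RSSM; the sizes are borrowed from the hyperparameters later in this note):
```
import torch
import torch.nn as nn

# A GRU cell rolls a deterministic latent state forward from actions alone,
# which is the core of "imagining" trajectories without calling the simulator.
belief_size, action_size, horizon = 200, 4, 15   # belief_size / planning_horizon as below
cell = nn.GRUCell(input_size=action_size, hidden_size=belief_size)

belief = torch.zeros(1, belief_size)        # current latent belief
for _ in range(horizon):
    action = torch.randn(1, action_size)    # placeholder for the actor's output
    belief = cell(action, belief)           # one imagined latent transition
print(belief.shape)                         # torch.Size([1, 200])
```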
## Build environment
### Simulator - [MuJoCo](http://www.mujoco.org/)
1) Apply for a MuJoCo license and receive an account number by email
2) Download the suitable [version](https://www.roboti.us/index.html)
3) Run "getid" to register your PC. After successful registration, an activation key bound to your PC will be issued.
:::warning
- Paths to the MuJoCo library and license key, per the official documentation (a quick import check follows this note)
```
export MJLIB_PATH=/home/{user_name}/.mujoco/mujoco200_linux/bin/libmujoco200.so
export MJKEY_PATH=/home/{user_name}/.mujoco/mujoco200_linux/bin/mjkey.txt
```
- Make the MuJoCo shared libraries visible to the dynamic linker
```
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/{user_name}/.mujoco/mujoco200_linux/bin
```
```
:::
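To check that the paths above are picked up, a minimal sketch (assuming `mujoco_py` is installed against MuJoCo 2.0):
```
import mujoco_py

# Load a trivial MJCF model and step it once; this fails early if the
# library path or the license key is not found.
xml = """
<mujoco>
  <worldbody>
    <body pos="0 0 1">
      <freejoint/>
      <geom type="box" size="0.1 0.1 0.1"/>
    </body>
  </worldbody>
</mujoco>
"""
model = mujoco_py.load_model_from_xml(xml)
sim = mujoco_py.MjSim(model)
sim.step()
print(sim.data.qpos)  # free-joint position/orientation of the falling box
```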
### [OpenAI Gym](https://github.com/openai/gym)
```
pip install gym
```
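A quick sketch to confirm FetchReach-v1 loads and to grab an image observation (assumes a gym version that still bundles the robotics environments and a working `mujoco_py`):
```
import gym

env = gym.make('FetchReach-v1')
obs = env.reset()

# Goal-based observation: a dict with 'observation', 'achieved_goal', 'desired_goal'.
print(obs['observation'].shape, obs['desired_goal'])

# Dreamer-style agents learn from pixels, so render an RGB frame as well.
frame = env.render(mode='rgb_array')
print(frame.shape)
env.close()
```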
### Python requirements
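The authoritative dependency list lives in the dreamer-pytorch repository; as a rough sketch, the typical packages are (versions omitted, treat these as assumptions):
```
# Assumed typical dependencies; check the repository's requirements file for exact pins.
pip install torch torchvision numpy opencv-python tqdm tensorboard
pip install gym mujoco-py
```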
## Training
Using [dreamer-pytorch](https://github.com/yusukeurakami/dreamer-pytorch), a PyTorch implementation of Dream to Control.
### Preprocessing
- Set the reward strategy
- Select the camera position in the simulator
- Set the actor (exploration) noise (see the sketch below)
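A rough sketch of these preprocessing choices (the image size and noise scale are illustrative assumptions; the repository's actual wrappers may differ):
```
import numpy as np
import gym

env = gym.make('FetchReach-v1')
env.reset()

# Camera / image observation: render a small RGB frame for the agent to learn from.
frame = env.render(mode='rgb_array', width=64, height=64)

# Actor noise: perturb the selected action with Gaussian noise for exploration.
action_noise = 0.15                       # same value as the action_noise setting below
action = env.action_space.sample()
noisy = np.clip(action + action_noise * np.random.randn(*action.shape),
                env.action_space.low, env.action_space.high)
obs, reward, done, info = env.step(noisy)
print(frame.shape, reward)
env.close()
```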
### Main loop (until convergence)
- Initialize hyperparameters
```
* Commonly adjusted:
- action_repeat : 1 (number of times the same action is repeated per environment step)
- action_noise : 0.15 (add noise to actions so the agent explores and learns from real interaction, not only from "imagined" images)
- batch_size : 50 (number of sequence chunks sampled per training update)
- chunk_size : 50 (length of each sequence chunk sliced from an episode)
- model_lr: 1e-3 (Model learning rate)
- actor_lr : 8e-5 (Actor model learning rate)
- value_lr : 8e-5 (Value model learning rate)
- adam_epsilon : 1e-7 (Adam optimizer epsilon value)
- grad_clip_norm : 100 (gradient-norm clipping threshold to avoid exploding gradients)
- planning_horizon : 15 (number of steps the RSSM imagines ahead)
- discount : 0.99 (reward discount factor)
- free_nats : 3 (Free nats are applied to the normalization and overshooting (but not global) KL losses before averaging over elements in the distribution)
- bit_depth : 5 (5-bit depth means only 32 intensity levels per colour channel; see the sketch after this block)
- seed_episodes : 5 (number of episodes collected with random actions before training)
* Other settings:
- max_episode_length : 1000 (maximum episode length; in the Fetch environments an episode lasts 50 steps)
- experience_size : 500000 (replay buffer capacity)
- cnn_activation_function : relu
- dense_activation_function : elu
- embedding_size : 1024 (dimension of the observation embedding fed to the RSSM)
- hidden_size : 200 (dimension of the dense hidden layers)
- belief_size : 200 (Belief/hidden size)
- state_size : 30 (State/latent size)
- collect_interval: 100 (Collect interval)
- overshooting_distance : 50 (Latent overshooting distance/latent overshooting weight for t = 1)
- overshooting_kl_beta : 0 (Latent overshooting KL weight for t > 1 (0 to disable))
- overshooting_reward_scale : 0 (Latent overshooting reward prediction weight for t > 1 (0 to disable))
- global_kl_beta : 0 (Global KL weight (0 to disable))
- disclam : 0.95 (λ used when computing λ-returns)
- optimisation_iter : 10 (Planning optimisation iterations)
- candidates : 1000 (Candidate samples per iteration)
- top_candidates : 100 (Number of top candidates to fit)
- worldmodel_LogProbLoss : default = True (use LogProb loss for observation_model and reward_model training)
```
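To make the `bit_depth` setting concrete, here is a sketch of the pixel preprocessing typically used in PlaNet/Dreamer-style code (quantise to 5 bits, centre to roughly [-0.5, 0.5), add dequantisation noise); the repository's exact implementation may differ slightly.
```
import numpy as np

def preprocess_observation(image, bit_depth=5):
    """Quantise a uint8 RGB image to `bit_depth` bits and scale it to roughly [-0.5, 0.5)."""
    image = np.floor(image / 2 ** (8 - bit_depth))                   # keep 2**bit_depth levels
    image = image / 2 ** bit_depth - 0.5                             # centre around zero
    image += np.random.rand(*image.shape) / 2 ** bit_depth           # dequantisation noise
    return image.astype(np.float32)

frame = (np.random.rand(64, 64, 3) * 255).astype(np.uint8)           # stand-in rendered frame
processed = preprocess_observation(frame)
print(processed.min(), processed.max())
```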
- Initialize the models (world model, actor, and value networks)
- Collect seed episodes with random actions
- Learn from the collected episodes (the λ-return used here is sketched below)
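One concrete piece of "learn from episodes" is the λ-return used to train the value model on imagined trajectories; a minimal numpy sketch using the `discount` and `disclam` values above (variable names are mine, not the repository's):
```
import numpy as np

def lambda_returns(rewards, values, bootstrap, discount=0.99, lam=0.95):
    """Compute lambda-returns over an imagined rollout.

    rewards, values: arrays of shape (horizon,); bootstrap: value estimate beyond the horizon.
    """
    next_values = np.append(values[1:], bootstrap)
    returns = np.zeros(len(rewards))
    last = bootstrap
    # Backwards recursion: R_t = r_t + gamma * ((1 - lambda) * V(s_{t+1}) + lambda * R_{t+1})
    for t in reversed(range(len(rewards))):
        last = rewards[t] + discount * ((1 - lam) * next_values[t] + lam * last)
        returns[t] = last
    return returns

# Toy rollout over planning_horizon = 15 imagined steps.
print(lambda_returns(np.ones(15), np.ones(15), bootstrap=1.0))
```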
## Testing
## Result
## Related work
- Concepts similar to Dreamer
- [Dyna, an Integrated Architecture for Learning, Planning, and Reacting](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.48.6005&rep=rep1&type=pdf)
- [MuJoCo](https://homes.cs.washington.edu/~todorov/papers/TodorovIROS12.pdf)
- Survey
- [A brief survey of deep reinforcement learning](https://arxiv.org/pdf/1708.05866.pdf)
- [Reinforcement Learning in Robotics: A Survey](https://www.ias.informatik.tu-darmstadt.de/uploads/Publications/Kober_IJRR_2013.pdf)
- [Survey of Robotic Manipulation Studies Intending Practical Applications in Real Environments: —Object Recognition, Soft Robot Hand, Challenge Program and Benchmarking](https://www.researchgate.net/publication/326401265_Survey_of_Robotic_Manipulation_Studies_Intending_Practical_Applications_in_Real_Environments_-Object_Recognition_Soft_Robot_Hand_Challenge_Program_and_Benchmarking-)
- Robotics
- [Reacher](https://gym.openai.com/envs/Reacher-v2/) (OpenAI Gym)
- [Fetch Robotics](https://fetchrobotics.com/)
- [Multi-Goal Reinforcement Learning: Challenging Robotics Environments and Request for Research](https://arxiv.org/pdf/1802.09464.pdf)
- [Asymmetric Actor Critic for Image-Based Robot Learning](https://arxiv.org/pdf/1710.06542.pdf)
- [Temporal Difference Models: Model-Free Deep RL for Model-Based Control](https://arxiv.org/pdf/1802.09081.pdf)
- [Pick and Place Without Geometric Object Models](https://arxiv.org/pdf/1707.05615.pdf)
- Deep reinforcement learning
1) Model-based DRL that learns from latent states (image-based)
- [PlaNet](https://arxiv.org/pdf/1811.04551.pdf)
- [I2A](https://arxiv.org/pdf/1707.06203.pdf)
2) Model-based DRL that learns with planning
- [Muzero](https://arxiv.org/pdf/1911.08265.pdf)
3) Other model-based DRL
- [BREMEN](https://arxiv.org/pdf/2006.03647.pdf)
4) About reward (dense and sparse)
- Sparse Reward:
1) [HER](https://arxiv.org/pdf/1707.01495.pdf)
2) [SHER](https://arxiv.org/pdf/2002.02089.pdf)
3) [PlanGAN](https://arxiv.org/pdf/2006.00900.pdf)
5) [KL divergence in DRL](https://arxiv.org/pdf/1905.01240.pdf)
- Dynamics learning
- [Improving PILCO with Bayesian Neural Network Dynamics Models](http://mlg.eng.cam.ac.uk/yarin/PDFs/DeepPILCO.pdf)
- [Deep Reinforcement Learning in a Handful of Trials using Probabilistic Dynamics Models](https://arxiv.org/pdf/1805.12114.pdf)
- Robotics with DRL
- Image-based model-based deep reinforcement learning
- [SOLAR](https://arxiv.org/pdf/1808.09105.pdf)
- [Deep Visual foresight for planning robot motion](https://arxiv.org/pdf/1610.00696.pdf)
- [Visual foresight: model-based deep reinforcement learning for vision-based robotic control](https://arxiv.org/pdf/1812.00568.pdf)
- [E2C](https://arxiv.org/pdf/1506.07365.pdf)
- [RCE](https://arxiv.org/pdf/1710.05373.pdf)
- Other
- [Model-Based Planning with discrete and continuous actions](https://arxiv.org/pdf/1705.07177.pdf)
- [Leveraging Deep Reinforcement Learning for Reaching Robotic Tasks](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8014805)
- [Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning](https://arxiv.org/pdf/1708.02596.pdf)
- [Deep Reinforcement Learning for Vision-Based Robotic Grasping: A Simulated Comparative Evaluation of Off-Policy Methods](https://arxiv.org/pdf/1802.10264.pdf)
- [Review of Deep Reinforcement Learning for Robot Manipulation](https://arxiv.org/pdf/1610.00633.pdf)
- [Keep Doing What Worked: Behavior Modelling Priors for Offline Reinforcement Learning](https://arxiv.org/pdf/2002.08396.pdf)