# Project: Model-based RL
Goal of the project: to implement and benchmark several models for model-based RL.
### Possible models
- GP regression (a GPflow sketch follows this list)
- [Variational GP-SSMs](http://papers.nips.cc/paper/5375-variational-gaussian-process-state-space-models) by Roger Frigola and others
- [NN Dynamics](https://arxiv.org/abs/1708.02596) by A. Nagabandi and others
- [Neural model in "Deep RL in a handful of trials"](https://arxiv.org/abs/1805.12114) by Kurtland Chua and others
- Recurrent NN model in [World Models](https://arxiv.org/abs/1803.10122) by David Ha & Jürgen Schmidhuber
- [Neural Ordinary Differential Equations](https://arxiv.org/abs/1806.07366) by Tian Qi Chen and others
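
As a concrete example of the first option, a one-step dynamics model can be fit with plain GP regression in GPflow. The sketch below is a minimal illustration, not a fixed design: the data files, the delta-state target, and the kernel choice are all assumptions, and it uses the GPflow 2.x API.

```python
import numpy as np
import gpflow

# Rollout data (file names are illustrative).
# Shapes: states (N+1, d_s), actions (N, d_a).
states = np.load("states.npy")
actions = np.load("actions.npy")

X = np.hstack([states[:-1], actions])  # inputs: (s_t, a_t)
Y = states[1:] - states[:-1]           # targets: state deltas, often easier to learn

model = gpflow.models.GPR(
    data=(X, Y),
    kernel=gpflow.kernels.SquaredExponential(lengthscales=np.ones(X.shape[1])),
)
gpflow.optimizers.Scipy().minimize(model.training_loss, model.trainable_variables)

# One-step prediction with predictive uncertainty.
mean, var = model.predict_f(X[:1])
```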
### Possible benchmarks
Even if the model is supposed to abstract away the action, in practice I've seen that evaluating it under a policy similar to the one it was trained with is useful. We should therefore try these metrics both with and without the policy that generated the training data (a sketch of the prediction-based metrics follows the list).
- Prediction-based metrics
    - Log-likelihood of the observed trajectory under the model
    - Root mean squared error (RMSE) of predicted future states
    - Log-likelihood of the observed reward under the model's prediction
    - RMSE of the predicted reward
- Performance-based metrics
- Total reward achieved (in combination with policy optimisation)
- "Data-efficiency", progression or number of episodes until reaching final reward
## Possible implementation starting point
My implementation of PILCO in Python, built on [OpenAI Gym](https://gym.openai.com/), [Bullet](https://docs.google.com/document/d/10sXEhzFRSnvFcl3XxNGhnD4N2SedqwdAvK3dsihxVUA/edit), [TensorFlow](https://tensorflow.org/), and [GPFlow](https://github.com/GPflow/GPflow). Its features:
- Wrapped Gym environments (with an alternative, Gaussian-integrable cost function; see the sketch after this list)
- Gaussian distribution class
- Facilities for moment-matching
- Decently tested
- Uses Bullet (free, though slightly lower-fidelity) instead of MuJoCo
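
The PILCO-style saturating cost is one example of a Gaussian-integrable cost: its expectation under a Gaussian state distribution has a closed form. A minimal sketch of such a wrapper, assuming the classic four-tuple Gym step API; `target` and `width` are illustrative parameters:

```python
import numpy as np
import gym

class SaturatingCostWrapper(gym.Wrapper):
    """Replaces the environment reward with a PILCO-style saturating cost,
    c(s) = 1 - exp(-||s - target||^2 / (2 * width^2)).
    `target` and `width` are hypothetical parameters, not part of Gym."""

    def __init__(self, env, target, width=1.0):
        super().__init__(env)
        self.target = np.asarray(target)
        self.width = width

    def step(self, action):
        obs, _, done, info = self.env.step(action)
        cost = 1.0 - np.exp(
            -np.sum((obs - self.target) ** 2) / (2 * self.width ** 2)
        )
        return obs, -cost, done, info  # negated: Gym conventions maximise reward
```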