These notes are written from an implementation point of view.
Main contribution:
Their main contribution is learning long-horizon behaviors by propagating analytic gradients of learned state values back through trajectories imagined in the latent space of a learned world model. They show empirically that this approach scales to complex visual control tasks.
- Learning long-horizon behaviors by latent imagination.
- Empirical performance on visual control tasks.
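The core idea, value gradients flowing back through an imagined rollout into the policy, can be illustrated with a toy sketch. Everything below is an assumption for illustration, not the paper's implementation: a scalar latent state, a fixed linear model standing in for the learned world model, a hypothetical quadratic reward and critic, a deterministic linear policy `u = theta * s`, and a simple bootstrapped discounted return rather than the paper's λ-return. Gradients come from forward-mode dual numbers so the sketch runs on the standard library; a real implementation would backpropagate (reverse-mode autodiff) through neural networks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """Forward-mode AD number: a value plus its derivative w.r.t. one parameter."""
    val: float
    grad: float = 0.0

    def __add__(self, other):
        o = _lift(other)
        return Dual(self.val + o.val, self.grad + o.grad)

    __radd__ = __add__

    def __mul__(self, other):
        o = _lift(other)
        # Product rule: (fg)' = f g' + f' g
        return Dual(self.val * o.val, self.val * o.grad + self.grad * o.val)

    __rmul__ = __mul__

    def __neg__(self):
        return Dual(-self.val, -self.grad)

    def __sub__(self, other):
        return self + (-_lift(other))

    def __rsub__(self, other):
        return _lift(other) - self


def _lift(x):
    """Treat plain numbers as constants (zero derivative)."""
    return x if isinstance(x, Dual) else Dual(float(x))


# Hypothetical stand-ins for the learned components (not from the paper):
A, B = 0.9, 0.5   # latent dynamics model: s' = A*s + B*u
C = 2.0           # critic: V(s) = -C * s^2
GAMMA, HORIZON = 0.99, 15


def dynamics(s, u):
    return A * s + B * u


def reward(s, u):
    return -(s * s) - 0.1 * (u * u)


def value(s):
    return -C * (s * s)


def imagined_return(theta, s0, horizon=HORIZON):
    """Imagine a rollout of the policy u = theta * s inside the model and return
    the discounted reward sum, bootstrapped with the critic at the horizon."""
    s, total, discount = s0, 0.0, 1.0
    for _ in range(horizon):
        u = theta * s
        total = total + discount * reward(s, u)
        s = dynamics(s, u)
        discount *= GAMMA
    return total + discount * value(s)


def return_and_grad(theta, s0):
    """Differentiate the imagined return w.r.t. theta by seeding d(theta) = 1."""
    ret = imagined_return(Dual(theta, 1.0), s0)
    return ret.val, ret.grad


def train(theta=0.0, s0=1.0, lr=0.01, steps=200):
    """Gradient ascent on the imagined return; the world model stays fixed."""
    for _ in range(steps):
        _, g = return_and_grad(theta, s0)
        theta += lr * g
    return theta
```

For example, `theta = train()` followed by `imagined_return(theta, 1.0)` compares the trained gain against the untrained one. The point of the design is that `grad` is the derivative of the whole imagined trajectory's return with respect to the policy parameter, obtained by differentiating through `dynamics`, `reward`, and `value`, rather than estimated from sampled returns as in score-function (REINFORCE) methods.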
Algorithm: