# RSS21 Spotlight Video Script

Hello, my name is Lucas Barcelos and I'm one of the authors of Dual Online Stein Variational Inference for Control and Dynamics.

###### tags: `motivation`

To operate autonomously, robots need to make complex decisions in varying environments. Often these decisions have no single clearly optimal choice, and multiple outcomes have to be taken into account.

Consider, for instance, a race-car driver during a race. When avoiding a collision, there are usually several ways to steer the car, and the driver has to quickly opt for one over the others. Furthermore, they have to adjust their driving to changing terrain and weather conditions, for example when it starts to rain.

To address these challenges, we propose a new Bayesian Model Predictive Control framework that performs online updates of both the control policy and our uncertainty over the model dynamics.

###### tags: `background`

The MPC task can be formulated as a variational inference problem. In this setting, we are given a family of policy distributions and a dynamical model. The policies are parameterised by a sequence of actions, and these policies, along with the system dynamics, induce a distribution over future actions and states. Our aim is to minimise the KL-divergence between this posterior distribution and a target distribution that minimises a task-dependent cost functional.

In Stein Variational MPC, this minimisation is performed through a modified Stein Variational Gradient Descent algorithm. In SVGD, the posterior distribution at a given time is approximated by a set of particles, and the optimisation is carried out by evolving these particles through a sequential flow. At each intermediate step, the velocity of the particles results from two kinds of forces: one that maximises the similarity between the target and proposed distributions, and one that repels particles that are close together.
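
To make the two forces in SVGD concrete, here is a minimal sketch of a plain SVGD step in NumPy. It is not the modified algorithm used in SV-MPC: the function names `rbf_kernel` and `svgd_step`, the fixed bandwidth `h`, the step size, and the standard normal target are all assumptions chosen for illustration.

```python
import numpy as np


def rbf_kernel(x, h):
    """RBF kernel matrix and its gradients for particles x of shape (n, d)."""
    diff = x[:, None, :] - x[None, :, :]                   # diff[j, i] = x_j - x_i
    k = np.exp(-np.sum(diff**2, axis=-1) / (2.0 * h**2))   # k[j, i] = k(x_j, x_i)
    grad_k = -diff / h**2 * k[:, :, None]                  # gradient of k(x_j, x_i) w.r.t. x_j
    return k, grad_k


def svgd_step(x, grad_log_p, step_size, h):
    """Move every particle one step along the SVGD velocity field."""
    n = x.shape[0]
    k, grad_k = rbf_kernel(x, h)
    # Attraction: kernel-weighted average of the target score, pulls towards the target.
    attraction = k.T @ grad_log_p(x)
    # Repulsion: summed kernel gradients, pushes nearby particles apart.
    repulsion = grad_k.sum(axis=0)
    return x + step_size * (attraction + repulsion) / n


# Illustrative run: particles start far from a standard normal target,
# whose score function is grad log p(x) = -x.
rng = np.random.default_rng(0)
particles = rng.normal(loc=5.0, scale=0.1, size=(50, 1))
for _ in range(1000):
    particles = svgd_step(particles, lambda x: -x, step_size=0.1, h=1.0)
```

In this toy run the attraction term drags the cluster towards the origin, while the repulsion term keeps the particles from collapsing onto a single point, so they spread out to cover the target.
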
In SV-MPC, each particle represents a *sequence* of actions, and the functional gradient is computed by generating sampled rollouts. The sequential nature of the problem is captured by introducing a shift operator that bootstraps the solution of the subsequent timestep from the current posterior distribution.

###### tags: `outline`

SV-MPC assumes that the dynamics model is known and stationary, and relies on the closed-loop robustness of the controller to compensate for any localisation errors due to model mismatch. Moreover, and perhaps more importantly, it treats the future state predictions given by the dynamics model as exact point estimates, which may undermine performance under more severe model mismatch.

In contrast, we propose a new method, DuSt-MPC, in which the dynamical model incorporates uncertainty through distributions over its parameters.

###### tags: `method`

This uncertainty needs to be propagated through the optimisation procedure, and it leads to probabilistic outcomes when sampling rollouts. We propose a modified policy update rule that takes into account both policy and model samples. The resulting risk minimisation can be seen as optimising an ensemble of outcomes per policy.

To close the loop, the distribution over the dynamics model parameters is updated at every timestep given the real data observed from the system. This posterior update can be performed using mapping particle filters or sequential SVGD. Since the dynamics of the plant are independent of the control actions, the two inference procedures can be factorised and solved separately.

###### tags: `experiments`

To demonstrate the importance of considering model uncertainty, we perform a set of simulated and robotic experiments.

:::info
For each task, the hyperparameters were optimised over several iterations using Bayesian Optimisation.
:::

The first task consists of navigating a planar maze. The goal is to reach the target on the opposite side while avoiding collisions.
Initially, the mass of the point mass is known, but after 100 steps it increases by 50%. Because DuSt-MPC considers a distribution over the mass and adjusts it with each new observation, it can quickly adapt and avoid collisions, whereas the model mismatch leads SV-MPC to crash in several instances.

The second task is trajectory tracking on a skid-steering autonomous ground vehicle (AGV). As the robot is velocity controlled, the uncertain parameter in this instance is the inertial center of rotation. Similarly to the previous task, midway through the experiment we change the dynamical model by adding extra load to the AGV. Again, DuSt-MPC adjusts to the new configuration and maintains a lower tracking error than the baseline.
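
To illustrate how a distribution over a dynamics parameter can follow a sudden change like the mass increase in the maze task, here is a toy sketch. It is not our experimental setup and it does not use mapping particle filters or sequential SVGD: it swaps in a plain reweight-and-resample particle update on a one-dimensional point mass with acceleration `a = F / m`. The function name `update_mass_particles`, the noise levels, and the random forces are all assumptions made for the example.

```python
import numpy as np


def update_mass_particles(masses, force, observed_acc, rng,
                          noise_std=0.05, jitter_std=0.02):
    """One reweight-and-resample update of the particles over the mass."""
    predicted_acc = force / masses                        # a = F / m for each particle
    log_w = -0.5 * ((observed_acc - predicted_acc) / noise_std) ** 2
    w = np.exp(log_w - log_w.max())                       # subtract max for numerical stability
    w /= w.sum()
    idx = rng.choice(masses.size, size=masses.size, p=w)  # resample by likelihood
    jittered = masses[idx] + rng.normal(0.0, jitter_std, size=masses.size)
    return np.clip(jittered, 0.1, None)                   # keep masses positive


rng = np.random.default_rng(0)
masses = rng.normal(1.0, 0.1, size=200)   # initial belief centred on the known mass
estimates = []
for t in range(200):
    true_mass = 1.0 if t < 100 else 1.5   # the mass increases by 50% after 100 steps
    force = rng.uniform(0.5, 1.5)
    observed_acc = force / true_mass + rng.normal(0.0, 0.05)
    masses = update_mass_particles(masses, force, observed_acc, rng)
    estimates.append(masses.mean())

est_before, est_after = estimates[99], estimates[199]
```

Here `est_before` is the mean mass estimate just before the change and `est_after` is the estimate at the end of the run. The small jitter added after resampling is what lets the particles drift towards the new mass once the observations stop matching the old one.
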