# An Introductory Analysis of Imitation Learning Algorithms and Learning Behaviors
As robots and other intelligent agents move from simple environments and problems to more complex, unstructured settings, manually programming their behavior has become increasingly challenging and expensive. Often, it is easier for a teacher to demonstrate a desired behavior rather than attempt to manually engineer it. This process of learning from demonstrations, and the study of algorithms to do so, is called imitation learning. This work provides an introduction to imitation learning. It covers the underlying assumptions, approaches, and how they relate; the rich set of algorithms developed to tackle the problem; and advice on effective tools and implementation.
## Abstract
Imitation learning is not a new methodology in robotics; it has been studied for some time. It allows us to teach robots to solve tasks using methods that humans have perfected over millennia. However, these approaches usually require substantial human effort to collect demonstration data in the real world. Because of this, there have been attempts at one-shot imitation learning from humans through meta-learning, in order to reduce the burden on the humans doing the teaching. On this website we present projects and papers on meta-learning and one-shot learning in robotics. With this we hope to give the reader a better understanding of the current state of imitation learning in intelligent robotics.
## An Introduction
Programming autonomous behavior in machines and robots traditionally requires a specific set of skills and knowledge. However, human experts often know how to demonstrate a desired task even if they do not know how to program the corresponding behavior into a machine or robot. The purpose of imitation learning is to efficiently learn a desired behavior by imitating an expert's behavior. The application of imitation learning is not limited to physical systems: it can be a powerful tool for designing autonomous behavior in systems such as websites, computer games, and mobile applications. Any system that requires autonomous behavior similar to that of human experts can benefit from imitation learning.
Imitation learning is especially important for robotics. It is now considered a key technology for applications such as manufacturing, elder care, and the service industry, where robots will be expected to work closely with humans in a dramatic shift from prior uses of robots. Powerful robotic manipulators are dangerous and have therefore been used mainly in constrained, predefined industrial applications; employees must undergo special training before working with them. This is changing due to recent advances in robotics, from improved computation to light, compliant, and safe robotic manipulators. Such manipulators are ideal for applications where robots work alongside people, such as collaborating with human operators or reducing the physical workload of caregivers. These applications require efficient, intuitive ways for domain experts, who may not possess special skills or knowledge about robotics, to teach robots the motions they need to perform.
## Imitation Learning Problem
The goal of imitation learning is to learn a policy that reproduces the behavior of experts who demonstrate how to perform the desired task. Suppose that the behavior of the expert demonstrator (or the learner itself) can be observed as a trajectory τ = [φ_0, ..., φ_T], which is a sequence of features φ. The features φ, which can be the state of the robotic system or any other measurements, can be chosen according to the given problem. Note that the features φ do not have to be manually specified; φ could be as general as the raw pixels of an image.
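The trajectory notation above can be made concrete with a minimal sketch. The `Trajectory` class below is a hypothetical illustration (not from any particular library): a trajectory τ is simply an ordered sequence of feature vectors φ_0, ..., φ_T, and the horizon T is the index of the final observation.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Trajectory:
    """A demonstration trajectory tau = [phi_0, ..., phi_T]."""
    # Each entry is one feature vector phi_t (e.g. joint angles, or pixels).
    features: List[List[float]] = field(default_factory=list)

    def horizon(self) -> int:
        """T, the index of the final feature vector (length minus one)."""
        return len(self.features) - 1

# Example: a trajectory with three observations phi_0..phi_2,
# where each phi_t is a 2-D state (x, y).
tau = Trajectory(features=[[0.0, 0.0], [0.5, 0.1], [1.0, 0.2]])
print(tau.horizon())  # T = 2
```

In a real system the features would typically be NumPy arrays or tensors, but the structure, a context-free ordered list of per-timestep feature vectors, is the same.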
Often, the demonstrations are recorded under different conditions, for example, grasping an object at different locations. We refer to these task conditions as the context vector s of the task, which is stored together with the feature trajectories. The context s can contain any information relevant to the task, e.g., the initial state of the robotic system or the positions of target objects. Note that, since the context describes the current task, it is typically fixed during task execution, and the only dynamic aspects of the problem are the state features φ_t. Optionally, a reward signal r that the expert is trying to optimize is also available in some problem settings [Ross and Bagnell, 2014].
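The pairing of a fixed context with a dynamic trajectory can also be sketched in code. The `Demonstration` class and the `nearest_demo` retrieval below are hypothetical illustrations under the assumption that contexts and features are plain lists of floats; the "policy" shown is only a toy nearest-context lookup, not one of the learning algorithms surveyed here.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Demonstration:
    """One recorded demonstration: a fixed context plus a feature trajectory."""
    context: List[float]            # s, fixed for the whole task execution
    trajectory: List[List[float]]   # [phi_0, ..., phi_T], the dynamic part
    reward: Optional[float] = None  # optional reward signal, when available

def nearest_demo(demos: List[Demonstration], s: List[float]) -> Demonstration:
    """Toy retrieval 'policy': return the demo whose context is closest to s."""
    def sq_dist(a: List[float], b: List[float]) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(demos, key=lambda d: sq_dist(d.context, s))

# Two demos of grasping an object at different locations (context = location).
demos = [
    Demonstration(context=[0.0], trajectory=[[0.0], [0.1]]),
    Demonstration(context=[1.0], trajectory=[[1.0], [0.9]]),
]
# A new task whose context is closest to the second demonstration.
print(nearest_demo(demos, [0.8]).context)  # -> [1.0]
```

This separation, a static s per demonstration and a time-indexed φ_t within it, is the data layout that most of the algorithms discussed later assume as input.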
## Sources
### Websites
https://arxiv.org/pdf/1811.06711.pdf
http://ai.stanford.edu/blog/ntp-ntg/
https://ai.googleblog.com/2020/09/imitation-learning-in-low-data-regime.html
https://ieeexplore.ieee.org/document/9013081
https://ieeexplore.ieee.org/document/8990011
https://ieeexplore.ieee.org/document/9020095
https://www.tandfonline.com/doi/full/10.1080/01691864.2019.1698461
### Videos
https://www.youtube.com/watch?v=k6DUkuT5SjY
https://www.youtube.com/watch?v=8uQkk-JFHtA
https://www.youtube.com/watch?v=6rZTaboSY4k
https://www.youtube.com/watch?v=V7CY68zH6ps