# Ph.D. Research Statement
## Introduction
Artificial intelligence is a rapidly growing field, and my goal is to develop learning algorithms that transfer across domains, generalize better, and reduce the need for massive manually labeled datasets. I believe that many of the open problems in autonomous agents can only be solved by combining deep reinforcement learning with continual learning.
## Background
### Undergraduate
I completed my bachelor's degree in electrical engineering at BITS Pilani (India), where I ranked among the top 10% of students. In parallel, I contributed to the research of Prof. Prasanta Kumar Ghosh on multimodal signal processing and representation, which resulted in eight publications.
### Postgraduate
Working alongside experts on different projects and internships gave me valuable experience across diverse areas of machine learning. To identify my precise area of interest and deepen my knowledge of artificial intelligence, I began an M.Sc. at UdeM (Mila), specializing in machine learning, where I hold a GPA of 4.0/4.3. There I had the opportunity to work on a variety of projects, listed below, that form the core of my future research.
1. **Learning Legged Locomotion (course project, Robot Learning)**: The goal of this project was to train a locomotion policy for a quadruped robot in simulation and deploy it on a real legged robot using transfer techniques. We set out to reproduce the results of *Learning to Walk in Minutes Using Massively Parallel Deep Reinforcement Learning* (Rudin et al., 2022). We also contributed Isaac Gym-compatible models of the Stanford Pupper and the Unitree Go1, along with an ablation study of reward terms and a study of different environment parameters to better understand the learning process.
2. **Imitation Learning for Autonomous Vehicles**: My interest in reinforcement learning applied to computer vision was first sparked by the Autonomous Vehicles course at UdeM, a collaborative robotics course with a close interplay between hardware and software concepts. My project focused on the autonomous navigation of a robot (Duckiebot) on a road network (Duckietown) using imitation learning. I took this as an opportunity to explore recent deep RL techniques applicable to the problem: ThriftyDAgger (on-policy), DART (off-policy), and end-to-end conditional imitation learning.
3. **Uncertainty Estimation in Datasets (internship project at Nuance)**: The goal was to identify erroneously tagged samples using uncertainty estimation techniques, building on prior work such as *Characterizing Structural Regularities of Labeled Data in Overparameterized Models* (Jiang et al., 2020) and the literature on aleatoric and epistemic uncertainty in machine learning. Since erroneous samples are those with high uncertainty, we framed the problem as the identification of inconsistent samples. Reducing epistemic uncertainty was difficult given the limited dataset, but reducing aleatoric uncertainty contributed significantly to the results: I estimated the confidence associated with each prediction, based on the proxy idea of Jiang et al. (2020) (a minimal sketch of this idea appears after this list). This model is currently deployed in the QA checker for flagging inconsistently tagged samples.
4. **Small-Memory-Footprint Transformers for ASR on Low-Resource Datasets**: As part of a project course, I worked on an open-source project in SpeechBrain. The project involved experimenting with different transformers, particularly their attention layers, to reduce the memory footprint. The idea is close to knowledge distillation, where knowledge is transferred from a cumbersome model to a small model that is better suited for deployment. These experiments were carried out on a low-resource speech recognition task (the TIMIT corpus). This work will enable efficient transformer models for embedded speech applications.
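As referenced in project 3 above, here is a minimal sketch of the confidence-proxy idea for flagging inconsistent labels. It is illustrative only: it averages the model's confidence in each sample's assigned label over training epochs, in the spirit of Jiang et al. (2020), and is not the deployed Nuance system; all names and the example data are placeholders.

```python
import numpy as np

def label_confidence_proxy(probs_per_epoch: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """Average confidence assigned to each sample's given label across training.

    probs_per_epoch: (num_epochs, num_samples, num_classes) softmax outputs
    labels: (num_samples,) integer labels as tagged in the dataset
    Low scores suggest the tag is inconsistent with what the model learns.
    """
    # Probability of the assigned label at each epoch, averaged over epochs
    # (a simple stand-in for the consistency score of Jiang et al., 2020).
    per_epoch_conf = probs_per_epoch[:, np.arange(labels.size), labels]
    return per_epoch_conf.mean(axis=0)

# Hypothetical usage: flag the lowest-confidence samples for manual review.
rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(5), size=(10, 100))  # 10 epochs, 100 samples, 5 classes
labels = rng.integers(0, 5, size=100)
scores = label_confidence_proxy(probs, labels)
suspect = np.argsort(scores)[:10]  # the 10 most likely mislabeled samples
print(suspect)
```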
### Ubisoft
I joined Ubisoft La Forge as an R&D developer under Dr. Joshua Romoff, working on Smart-Timeslice: learning human-like, controllable bots from traces using offline reinforcement learning. The goal of the project was to reduce the inference time of the new agent without compromising the performance of the expert policy. We used imitation learning with an additional Q-function loss to predict n steps of actions from a single observation, so the policy network needs to run less often; this offline RL approach yielded a 2-3x speedup. We are now focusing on dynamic feature selection and computation-cost constraints.
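Smart-Timeslice itself is proprietary, so the sketch below shows only the generic n-step action-prediction idea via plain behavior cloning; the additional Q-function loss is omitted, and all names and dimensions are hypothetical, not Ubisoft's implementation.

```python
import torch
import torch.nn as nn

class NStepPolicy(nn.Module):
    """Predicts the next n actions from a single observation, so the policy
    network only needs to run once every n frames at inference time."""
    def __init__(self, obs_dim: int, act_dim: int, n_steps: int, hidden: int = 256):
        super().__init__()
        self.n_steps, self.act_dim = n_steps, act_dim
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_steps * act_dim),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs).view(-1, self.n_steps, self.act_dim)

# Behavior-cloning loss on n-step action chunks taken from expert traces.
policy = NStepPolicy(obs_dim=32, act_dim=4, n_steps=3)
obs = torch.randn(64, 32)               # batch of observations
expert_actions = torch.randn(64, 3, 4)  # next 3 expert actions per observation
loss = nn.functional.mse_loss(policy(obs), expert_actions)
loss.backward()
```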
## Plan
### **Pre-PhD**
Before my Ph.D. starts, I plan to build up my knowledge of the topics below. Since I already have a foundation in linear algebra, machine learning, reinforcement learning, and deep learning, I will move on to advanced topics specific to my Ph.D. research goals:
* Offline Reinforcement Learning
* Continual Learning
This involves a literature survey of recent papers, going through their implementations, and ultimately writing a summary of each paper with its pros and cons. I believe a literature survey is a good source of ideas: it builds knowledge of active research and sharpens critical thinking.
This literature survey will help me build a clear plan for the first year of the Mitacs Accelerate Fellowship program. Planning before the start of the Ph.D. will help me expedite the pre-doctoral thesis phase.
During this period, I will also familiarize myself with skills and tools that can accelerate my research. For example, in offline reinforcement learning, validation is typically done on the D4RL benchmark datasets.
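For concreteness, this is the usual D4RL workflow, assuming the standard `d4rl` package API (dataset names and versions may differ by release):

```python
import gym
import d4rl  # noqa: F401 -- importing registers the offline RL environments with gym

# Make an environment, then pull the logged offline dataset associated with it.
env = gym.make("halfcheetah-medium-v2")
dataset = d4rl.qlearning_dataset(env)  # dict with observations, actions, rewards,
                                       # next_observations, terminals
print(dataset["observations"].shape)
```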
### Ph.D. Proposal
Programming good bots in video games remains an open challenge: it is time-consuming and limited to behaviors we can code by hand, resulting in unrealistic behaviors that decrease the quality of games. Reinforcement learning is a natural candidate for automatically discovering efficient bots; algorithms in this domain have recently achieved strong performance in games such as StarCraft, chess, and Go.
However, one of the biggest obstacles to the widespread adoption of reinforcement learning is that it is fundamentally an online learning paradigm. In many settings, online interaction is impractical because data collection is expensive (e.g., in robotics or healthcare) or dangerous (e.g., in autonomous driving). Even in domains where online interaction is feasible, we might still prefer to utilize previously collected data. In the game industry, for example, projects extend over several years (a typical AAA game takes 3 to 5 years) and the game keeps evolving, so efficient learning can exploit historical data rather than online interaction.
To ease the game development process, tools are built to help with, for instance, the artistic creation process or code integration. Such tools can be based on reinforcement learning (RL) models trained on historical data. While these models may work accurately right after training, their predictions can lose relevance over time, since the project's development objectives change and grow progressively different from those at the time of model training. This phenomenon of models and datasets aging is referred to as concept drift in the literature [1]. Shift may also happen when the task changes; in RL, for example, an agent may have to solve a new task. Understanding this evolving game environment would allow game tool creators to maximize the performance of their tools over long periods of time.
We propose to address this problem by exploiting game traces captured from experts. We plan to create a few hundred maps inside the SmartNav environment [2] in Unity [3] and Godot [4], designing a range of different levels and collecting expert trajectories, in order to study the evolving game environment.
Inspired by large-scale sequence models such as Gato and the Decision Transformer, we would like to apply a similar approach to developing a single agent that generalizes across diverse tasks, such as new maps and different game levels, without deviating from the expert policy. This amounts to imitation policies guided by the captured expert data. During my Ph.D., I would like to devise a generalist agent that is useful for video game development.
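To illustrate the return-conditioned sequence-modeling idea behind the Decision Transformer, here is a minimal sketch, not the published architecture: a causal transformer over interleaved (return-to-go, state, action) tokens, trained to predict expert actions. All dimensions and names are placeholders.

```python
import torch
import torch.nn as nn

class ReturnConditionedModel(nn.Module):
    """Decision Transformer-style sketch: model (return-to-go, state, action)
    sequences with a causal transformer and predict the next action."""
    def __init__(self, state_dim: int, act_dim: int, d_model: int = 128):
        super().__init__()
        self.embed_rtg = nn.Linear(1, d_model)
        self.embed_state = nn.Linear(state_dim, d_model)
        self.embed_action = nn.Linear(act_dim, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.predict_action = nn.Linear(d_model, act_dim)

    def forward(self, rtg, states, actions):
        # Interleave tokens per timestep as (R_1, s_1, a_1, R_2, s_2, a_2, ...).
        B, T, _ = states.shape
        tokens = torch.stack(
            (self.embed_rtg(rtg), self.embed_state(states), self.embed_action(actions)),
            dim=2,
        ).reshape(B, 3 * T, -1)
        # Causal mask so each token only attends to the past.
        mask = torch.triu(torch.full((3 * T, 3 * T), float("-inf")), diagonal=1)
        h = self.transformer(tokens, mask=mask)
        # Predict each action from the hidden state at its state token.
        return self.predict_action(h[:, 1::3])

# Hypothetical usage on a batch of expert trajectories.
model = ReturnConditionedModel(state_dim=16, act_dim=4)
rtg = torch.randn(8, 20, 1)     # return-to-go per timestep
states = torch.randn(8, 20, 16)
actions = torch.randn(8, 20, 4)
pred = model(rtg, states, actions)            # (8, 20, 4)
loss = nn.functional.mse_loss(pred, actions)  # imitation loss on expert actions
```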
### Application in Robotics
Robots must learn to adapt and interact with their environment from a continuous stream of observations. Robots deployed in real-world environments will face new tasks and challenges over time, requiring capabilities that cannot be fully anticipated at the outset. They therefore need to learn continually: they should acquire new capabilities without forgetting previously learned ones, and ideally without storing and retraining on the data of all previously learned skills. During my Ph.D., I would like to explore the parallel between continual learning in video games and in real robots, for instance via regularization-based methods such as the one sketched below.
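One family of methods satisfying this no-replay constraint is regularization-based continual learning, e.g., Elastic Weight Consolidation (EWC; Kirkpatrick et al., 2017). Below is a minimal, illustrative sketch of the EWC penalty, assuming the diagonal Fisher estimate and previous-task parameters are already available; the Fisher values here are fake and all names are placeholders.

```python
import torch
import torch.nn as nn

def ewc_penalty(model, fisher, old_params, lam=1.0):
    """EWC regularizer: penalize moving parameters that were important for
    previous tasks, so new skills are learned without replaying old data."""
    loss = torch.zeros(())
    for name, p in model.named_parameters():
        loss = loss + (fisher[name] * (p - old_params[name]) ** 2).sum()
    return lam * loss

# Hypothetical setup: after task A, snapshot the parameters and a (here fake)
# diagonal Fisher estimate; then regularize while training on task B.
model = nn.Linear(4, 2)
old_params = {n: p.detach().clone() for n, p in model.named_parameters()}
fisher = {n: torch.ones_like(p) for n, p in model.named_parameters()}

x, y = torch.randn(16, 4), torch.randn(16, 2)
task_loss = nn.functional.mse_loss(model(x), y)
total = task_loss + ewc_penalty(model, fisher, old_params, lam=0.1)
total.backward()
```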
## Conclusion
To this end, I strongly believe that a Ph.D. in computer science from this university would equip me with the tools necessary to become an independent researcher and to contribute to this highly promising research area.
## References
1. Lesort, Timothée, et al. "Continual Learning for Robotics: Definition, Framework, Learning Strategies, Opportunities and Challenges." Information Fusion 58 (2020).
2. Alonso, Eloi, et al. "Deep Reinforcement Learning for Navigation in AAA Video Games." arXiv preprint arXiv:2011.04764 (2020).
3. Haas, John K. "A History of the Unity Game Engine." Diss., Worcester Polytechnic Institute (2014).
4. Beeching, Edward, et al. "Godot Reinforcement Learning Agents." arXiv preprint arXiv:2112.03636 (2021).