## RL Mario Agent

> VLG Open Project Summers 2022

### Problem Statement

Playing and winning a computer game is a challenging and rewarding task. People have always tried to develop "smart players" that can learn a game and then ace it. This project aims to build a robust RL agent that can make it through the first level of Super Mario Bros.

### Motivation

When it comes to solving problems where you learn from the environment by taking actions and receiving rewards, reinforcement learning algorithms are an intuitive choice. Playing Mario is a multi-action task in which you have to keep avoiding obstacles and earning points to survive till the end.

### Solution

- To play and ace a game is to maximize your reward and stay alive till the end of the mission. This type of problem can be framed as a Q-function learning task
- A state-action value `Q(s, a)` is the quality of being in a particular state `s` and taking a specific action `a` from that state
- If an agent is in a state `s`, it takes the action `a` for which `Q(s, a)` has the highest value, i.e., it tries to move to the state from which it can expect the highest reward. To do this, we need to learn the Q-values of all the state-action pairs
- Use the Bellman update equation for Q-learning: `Q(s, a) ← Q(s, a) + α (r + γ max_a' Q(s', a') − Q(s, a))`
- There is one major issue with Q-learning that we need to deal with: over-estimation bias, which means that the learned Q-values are higher than they should be. To get more accurate Q-values, we use double Q-learning
- So we go for a double DQN. In double Q-learning, we maintain two Q-value estimators: one used for selecting actions, and another used specifically in the Q-update equation (see the sketches after the Resources section)

### Timeline

#### Week 1 : `May 25, 2022` - `May 31, 2022`

- Get comfortable with Google Colab and the PyTorch framework
- Understand basic concepts of RL
  - Basic jargon
  - Bellman equations
  - Q-learning
- Get familiar with the Super Mario gym environment

#### Week 2 : `Jun 1, 2022` - `Jun 7, 2022`

- Understand the utility functions to display frames and buffers while training
- Use the Q-learning approach to solve the problem
  - Write the `Q-update` rules
- Make a pipeline that fits all these elements together and run your first training loop
- Analyse the results via graphs

#### Week 3 : `Jun 8, 2022` - `Jun 14, 2022`

- Get familiar with double Q-learning
  - Maintaining two Q-value tables
- Write a DQN agent
  - Update the `Q-update` rules
  - Write a `DQN Solver` to make use of deep Q-learning
  - Write the `remember`, `recall`, and `experience_replay` methods
- Train the DQN model
- Run inference using the trained model
- Analyse with graphs and write a conclusion

### Resources

- [Great overview of RL](https://lilianweng.github.io/posts/2018-02-19-rl-overview/)
- [Bellman Equations](https://www.analyticsvidhya.com/blog/2021/02/understanding-the-bellman-optimality-equation-in-reinforcement-learning/)
- [Implementation of Reinforcement Learning Algorithms](https://github.com/dennybritz/reinforcement-learning)
- [Q-Learning](https://blog.floydhub.com/an-introduction-to-q-learning-reinforcement-learning/)
- [Double Q-Learning](https://towardsdatascience.com/deep-double-q-learning-7fca410b193a)
- [Double Q-Learning with TensorFlow](https://rubikscode.net/2021/07/20/introduction-to-double-q-learning/)
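
### Illustrative Sketches

As a rough illustration of the `Q-update` rule referenced in Week 2, here is a minimal tabular Q-learning sketch. The `q_update` and `epsilon_greedy` helpers, the discretised integer states, and the `alpha`/`gamma` defaults are illustrative assumptions, not part of the project code:

```python
import numpy as np


def q_update(Q, state, action, reward, next_state, done, alpha=0.1, gamma=0.9):
    """One Bellman update step: Q(s, a) += alpha * (TD target - Q(s, a))."""
    # Bootstrap from the best next action unless the episode has ended.
    target = reward if done else reward + gamma * np.max(Q[next_state])
    Q[state, action] += alpha * (target - Q[state, action])


def epsilon_greedy(Q, state, n_actions, epsilon=0.1):
    """Pick a random action with probability epsilon, else the greedy one."""
    if np.random.rand() < epsilon:
        return np.random.randint(n_actions)
    return int(np.argmax(Q[state]))
```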
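
The double-DQN idea from the Solution and Week 3 sections can be sketched in PyTorch as two Q-value estimators (an online network for selecting actions and a target network for the bootstrap value) together with `remember`, `recall`, and `experience_replay` methods. The network architecture, hyper-parameters, and the flattened-state interface below are illustrative assumptions; the actual Mario agent would use convolutional layers over stacked frames:

```python
import random
from collections import deque

import numpy as np
import torch
import torch.nn as nn


class DQNAgent:
    """Minimal double-DQN-style agent sketch (sizes and hyper-parameters are illustrative)."""

    def __init__(self, state_dim, n_actions, gamma=0.9, lr=2.5e-4, memory_size=30_000):
        self.n_actions = n_actions
        self.gamma = gamma
        # Online network selects actions; target network supplies the bootstrap value.
        self.online = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(),
                                    nn.Linear(128, n_actions))
        self.target = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(),
                                    nn.Linear(128, n_actions))
        self.target.load_state_dict(self.online.state_dict())
        self.optimizer = torch.optim.Adam(self.online.parameters(), lr=lr)
        self.loss_fn = nn.SmoothL1Loss()
        self.memory = deque(maxlen=memory_size)

    def act(self, state, epsilon=0.1):
        """Epsilon-greedy action selection using the online network."""
        if random.random() < epsilon:
            return random.randrange(self.n_actions)
        with torch.no_grad():
            return int(self.online(torch.as_tensor(state, dtype=torch.float32)).argmax())

    def remember(self, state, action, reward, next_state, done):
        """Store one transition in the replay buffer."""
        self.memory.append((state, action, reward, next_state, done))

    def recall(self, batch_size=32):
        """Sample a random batch of transitions as tensors."""
        batch = random.sample(self.memory, batch_size)
        states, actions, rewards, next_states, dones = map(
            lambda x: torch.as_tensor(np.array(x), dtype=torch.float32), zip(*batch))
        return states, actions.long(), rewards, next_states, dones

    def experience_replay(self, batch_size=32):
        """One gradient step on a sampled batch using the double-DQN target."""
        if len(self.memory) < batch_size:
            return
        states, actions, rewards, next_states, dones = self.recall(batch_size)
        with torch.no_grad():
            # Online net picks the next action, target net evaluates it.
            next_actions = self.online(next_states).argmax(dim=1, keepdim=True)
            next_q = self.target(next_states).gather(1, next_actions).squeeze(1)
            target = rewards + (1 - dones) * self.gamma * next_q
        q = self.online(states).gather(1, actions.unsqueeze(1)).squeeze(1)
        loss = self.loss_fn(q, target)
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()

    def sync_target(self):
        """Periodically copy online weights into the target network."""
        self.target.load_state_dict(self.online.state_dict())
```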