# GradSLAM + RL Assignment, Winter 2020
## Preliminaries
Please keep these instructions strictly confidential. In particular, do **not** share this assignment or discuss its details with anyone other than Krishna Murthy Jatavallabhula, Florian Golemo, or Liam Paull while solving the assignment and until three months after receiving it. This assignment is intended for individual evaluation, not as team work.
## Overview
In short, there are 3 parts and you have 6 full days to solve all of these:
1. You have to train up a policy on an RL environment
2. You have to write up a quick (1-2 pages max) summary on what you did and why
3. You have to make a super short presentation for a paper of our choice (15min, ideally with a few slides)
**Emergency Contact:** If you're not sure about anything or if you have any questions, feel free to reach out to Florian Golemo (fgolemo@gmail.com). Please preface your email subject line with the word `[ASSIGNMENT]`.
## 1) Training an RL Policy
We would like you to train a policy to solve a gym-like RL environment, namely SEVN (a Montreal sidewalk environment): https://github.com/mweiss17/SEVN
Specifically, we would like you to attempt to solve the `SEVN-Test-AllObs-Shaped-v1` environment.
Here's some crucial information:
- We assume that you know how to investigate a Gym-compatible RL environment (i.e. install and find out what the observations, actions, rewards are).
- We assume that you can get access to a GPU for training a policy somewhere.
- If either of these two assumptions does not hold, please reach out.
- You are free to use whichever RL training algorithm (DQN, PPO, A2C, etc.) you like.
- There are no immediate outputs that we expect from this task but you will need the results from this task in the next one.
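
As a starting point for investigating the environment, here is a minimal sketch of how one might inspect the spaces and run a random-action episode on any Gym-style environment. The helper names `describe_env` and `random_rollout` and the `StubEnv` class are hypothetical and exist only for illustration; `StubEnv` is a stand-in so the snippet runs without SEVN installed, and it does not reflect SEVN's actual observations, actions, or rewards.

```python
import random


def describe_env(env):
    """Summarise the observation and action spaces of a Gym-style env."""
    return {
        "observation_space": repr(env.observation_space),
        "action_space": repr(env.action_space),
    }


def random_rollout(env, max_steps=100):
    """Run one episode with random actions; return (steps, total_reward)."""
    env.reset()
    steps, total_reward = 0, 0.0
    for _ in range(max_steps):
        result = env.step(env.action_space.sample())
        # Classic Gym returns (obs, reward, done, info); newer releases
        # return (obs, reward, terminated, truncated, info).
        if len(result) == 5:
            _, reward, terminated, truncated, _ = result
            done = terminated or truncated
        else:
            _, reward, done, _ = result
        steps += 1
        total_reward += reward
        if done:
            break
    return steps, total_reward


class _StubSpace:
    """Hypothetical discrete action space with `n` actions."""

    def __init__(self, n):
        self.n = n

    def sample(self):
        return random.randrange(self.n)

    def __repr__(self):
        return f"Discrete({self.n})"


class StubEnv:
    """Hypothetical stand-in env: pays reward 1.0 once, at the final step."""

    observation_space = "Box(placeholder)"
    action_space = _StubSpace(4)

    def __init__(self, horizon=10):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return 0

    def step(self, action):
        self.t += 1
        done = self.t >= self.horizon
        return self.t, (1.0 if done else 0.0), done, {}


if __name__ == "__main__":
    # Assumption: with SEVN installed and registered as its README describes,
    # the stub would be replaced by something like
    #   env = gym.make("SEVN-Test-AllObs-Shaped-v1")
    env = StubEnv(horizon=5)
    print(describe_env(env))
    print(random_rollout(env))
```

A random-action rollout like this is mostly useful as a sanity check and as a lower bound to compare a trained policy against.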
## 2) Report
We expect you to send us (Krishna and me) a 1-2 page PDF with a single-column layout that reports which decisions you made and why you made them.
Here are some things to keep in mind:
- The report should contain the quantitative (and optionally qualitative) results of task 1.
- Keep in mind that the reported baseline performance on the `SEVN-Test-AllObs-Shaped-v1` environment is ~75% success rate.
- Please also answer these questions:
  - If your policy was evaluated on the `SEVN-Train-AllObs-Shaped-v1` (note the `Train` instead of `Test`), what kind of performance would you expect and why?
  - If your policy was evaluated on the `SEVN-Test-ImgOnly-Shaped-v1`, what kind of performance would you expect and why?
  - What could you change in your policy training to increase performance in either case?
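
When reporting a success rate against the ~75% baseline, it can help to state how many evaluation episodes it is based on and how uncertain the estimate is. The following is a minimal sketch, assuming each evaluation episode yields a boolean success flag (how "success" is determined is up to the environment and is not specified here). The function names are hypothetical, and the Wilson score interval is just one common choice for a binomial proportion, not a required method.

```python
import math


def success_rate(outcomes):
    """Fraction of episodes flagged as successful."""
    outcomes = list(outcomes)
    if not outcomes:
        raise ValueError("need at least one episode outcome")
    return sum(bool(o) for o in outcomes) / len(outcomes)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 ~ 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


if __name__ == "__main__":
    # Hypothetical evaluation log: 75 successes out of 100 episodes.
    outcomes = [True] * 75 + [False] * 25
    rate = success_rate(outcomes)
    low, high = wilson_interval(sum(outcomes), len(outcomes))
    print(f"success rate: {rate:.2%} (95% CI: {low:.2%} to {high:.2%})")
```

With only a few dozen evaluation episodes the interval is wide, so a small gap to the baseline may not be meaningful.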
## 3) Paper Review
We would like you to read a paper and present it to us over a Zoom call.
We will send you the paper as soon as you send us the PDF with the report from task 2.
Notes:
- Please prepare a 15min summary of the paper in the way you would communicate this to a reading group of fellow students who are familiar with ML/RL/SLAM in general but not this paper in particular.
- The purpose is for your peers to understand the main contents and method of the paper, not to reproduce it in great detail.
- We recommend you prepare a few slides illustrating your points. Use whatever slide design you like.
- If you use slides, please keep in mind that slides are there to **aid** your argument, not contain your argument word-for-word.
- Keep the presentation to 15min max.
- You do not have to send us the slides in advance.
- Please reach out to both Krishna and me to schedule the Zoom call.