# CS410 Homework 9: Fitted Q-Learning

**Due Date: TBD**

**Need help?** Remember to check out [Edstem](https://edstem.org/us/courses/74300/discussion) and our website for TA assistance.

## Downloads

As in some previous homework assignments, this one takes place in a Python notebook. Please click [here](https://classroom.github.com/a/IB9XIvio) to download the assignment code.

## Handin

Your handin should contain:

- all modified files, including comments describing the logic of your algorithmic modifications
- a README containing a brief overview of your implementation and answers to the required conceptual questions

### Gradescope

Submit your assignment via Gradescope. To submit through GitHub, run the following commands:

1. `git add -A`
2. `git commit -m "commit message"`
3. `git push`

Now you are ready to upload your repo to Gradescope.

*Tip*: If you are having difficulty submitting through GitHub, you may submit by zipping up your hw folder.

## Rubric

| Component | Points | Notes |
|-------------------|------|--------------------------------|
| Discretize and Normalize State | 15 | Points awarded for correctly implementing `discretize`. |
| Epsilon-Greedy | 5 | Points awarded for correctly implementing `epsilon_greedy`. |
| Complete Tabular Q-Learning | 5 | Points awarded for correctly calling `discretize` and `epsilon_greedy` to train Q-learning on CartPole. |
| Fitted Q-Learning | 20 | Points awarded for correctly implementing fitted Q-learning. |
| L1 Regularization | 10 | Points awarded for correctly implementing L1 regularization and using the correct gradient for the weight-update step of `fitted_q_learning`. |
| Picks New Features | 15 | Points awarded for selecting features based on the results of L1 regularization. These features should lead to a learned model that performs almost perfectly on CartPole (total reward approaches the total number of steps taken). |
| Conceptual Questions | 30 | Points awarded for thoughtfully and accurately answering both conceptual questions. |

Illustrative sketches of a few of these components appear in the reference section at the end of this handout.

:::success
Congrats on submitting your homework! You're almost to the final stretch!
![Screenshot 2025-04-02 at 9.11.36 PM](https://hackmd.io/_uploads/SkIIxvopJl.png)
:::
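## Reference Sketches

The sketches below are orientation aids only. Function names, signatures, bin counts, and hyperparameters are our assumptions, not the stencil's API; defer to the notebook wherever they disagree.

For `discretize`, one common approach is to bin each dimension of CartPole's continuous observation and return a hashable tuple of bin indices. A minimal sketch, assuming hand-picked value ranges:

```python
import numpy as np

# Illustrative bin edges for CartPole's 4-D observation:
# (cart position, cart velocity, pole angle, pole angular velocity).
# The ranges and bin counts here are guesses, not the stencil's values.
BINS = [np.linspace(lo, hi, 9) for lo, hi in
        [(-2.4, 2.4), (-3.0, 3.0), (-0.21, 0.21), (-3.0, 3.0)]]

def discretize(state):
    """Map a continuous state to a tuple of bin indices usable as a dict key."""
    return tuple(int(np.digitize(x, edges)) for x, edges in zip(state, BINS))
```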
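For `epsilon_greedy`, the idea is to explore uniformly at random with probability epsilon and act greedily otherwise. A sketch, assuming you already have the vector of action values for the current state:

```python
import numpy as np

rng = np.random.default_rng()

def epsilon_greedy(q_values, epsilon):
    """Return a uniformly random action index with probability epsilon,
    otherwise the greedy (argmax) action. `q_values` is assumed to be a
    1-D array of action values for one state."""
    if rng.random() < epsilon:
        return int(rng.integers(len(q_values)))  # explore
    return int(np.argmax(q_values))              # exploit
```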
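Fitted Q-learning replaces the table with a function approximator trained by regression: each transition contributes a target y = r + gamma * max over a' of Q(s', a'). A sketch of the target computation, assuming a linear weight vector `w` and a feature map `phi`, both placeholders for whatever the notebook defines:

```python
import numpy as np

def fitted_q_targets(transitions, w, phi, gamma, n_actions):
    """Compute regression targets y = r + gamma * max_a' Q(s', a'),
    where Q(s, a) = w . phi(s, a); terminal transitions use y = r."""
    targets = []
    for s, a, r, s_next, done in transitions:
        if done:
            targets.append(r)
        else:
            targets.append(r + gamma * max(w @ phi(s_next, a2)
                                           for a2 in range(n_actions)))
    return np.array(targets)
```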
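For the L1-regularized weight update, one standard form, assuming linear function approximation $Q_w(s,a) = w^\top \phi(s,a)$, a squared-error loss against the fixed target $y = r + \gamma \max_{a'} Q_w(s',a')$, and regularization strength $\lambda$, is

$$
w \leftarrow w + \alpha \big( y - w^\top \phi(s,a) \big)\, \phi(s,a) - \alpha \lambda \operatorname{sign}(w),
$$

where $\operatorname{sign}$ is applied elementwise (a subgradient of $\|w\|_1$). Check this against the loss the notebook actually specifies before relying on it.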