# CS410 Homework 8: Value Iteration and Q-Learning

**Due Date: 11/8/2024**

**Need help?** Remember to check out [Edstem](https://edstem.org/us/courses/61309) and our website for TA assistance.

## Downloads

Like in some previous homeworks, this assignment will take place in a Python notebook file. Please click [here](https://classroom.github.com/a/XduhONIs) to download the assignment code.

## Handin

Your handin should contain:

- all modified files, including comments describing the logic of your algorithmic modifications
- a README, containing a brief overview of your implementation

### Gradescope

Submit your assignment via Gradescope.

To submit through GitHub, follow these commands:

1. `git add -A`
2. `git commit -m "commit message"`
3. `git push`

Now, you are ready to upload your repo to Gradescope.

*Tip*: If you are having difficulties submitting through GitHub, you may submit by zipping up your hw folder.

## Rubric

| Component | Points | Notes |
|-------------------|------|--------------------------------|
| Win-rate calculations (by hand) | 10 | Points awarded for correct answers and explanations of how you calculated your values. |
| Monte Carlo Simulation | 10 | Points awarded for correct return values. For a provided policy, answers should match expected outcomes to within some threshold. |
| Value Iteration | 30 | Points awarded for computation of value function. Since there is no randomness in VI, your answer should match the expected result to within floating point precision. |
| Q-Learning | 30 | Points awarded for correct implementation of Q-Learning. Since there is randomness in Q-Learning, your policy should score within a provided range of our solution policy. |
| Blackjack full game MDP Formulation | 20 | Points awarded for formulations of Blackjack with the full rule set and blackjack with card counting as MDPs. |

:::success
Congrats on submitting your homework; Steve is proud of you!!

![image](https://hackmd.io/_uploads/B1mC8oy2A.png)
![image](https://hackmd.io/_uploads/S1OQ2aCwA.png)
:::