# CS410 Homework 9: Value Iteration and Q-Learning
==**Due Date: 04/13/2026 at 11:59pm**==
**Need help?** Remember to check out [Edstem](https://edstem.org/us/courses/93617) and our website for TA assistance.
:::danger
**⚠️ Battle Alert 🏆 ⚠️** You must activate the CS410 virtual environment every time you work on a CS410 assignment! You can find the activation instructions [here](https://hackmd.io/@cs410/BJvhqHXuR). If you forget to activate this virtual environment, you will almost certainly encounter import errors, and your code will not run.
:::

:::danger
**Please make sure to run the following command in your environment before starting the assignment:**
`pip install gymnasium==1.1.1`
:::
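
If you'd like to confirm that the install worked inside your activated environment, a quick check along these lines may help. This is only a sketch: `check_package` and `EXPECTED_VERSION` are hypothetical names made up for this example, not part of the assignment code. It uses only the standard-library `importlib.metadata` module.

```python
from importlib.metadata import PackageNotFoundError, version

# Version pinned by the install command above.
EXPECTED_VERSION = "1.1.1"


def check_package(name: str, expected: str) -> str:
    """Return a short status message for a package in the current environment.

    Hypothetical helper for illustration only; not part of the assignment.
    """
    try:
        installed = version(name)
    except PackageNotFoundError:
        return f"{name} is not installed in this environment"
    if installed != expected:
        return f"{name} {installed} is installed, but {expected} is expected"
    return f"{name} {expected} is installed"


print(check_package("gymnasium", EXPECTED_VERSION))
```

If the message says the package is missing or the version differs, re-run the install command with the CS410 virtual environment activated.
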
## Downloads
As in some previous homeworks, you will complete this assignment in a Jupyter notebook.

Please click [here](https://classroom.github.com/a/0yDWGBXz) to download the assignment code.
## Handin
Your handin should contain:
- all modified files, including comments describing the logic of your algorithmic modifications
    - remember to convert the Jupyter notebook into a .py file! Directions are at the very bottom of the notebook.
- a README containing:
    - your responses to any conceptual questions
    - known problems in your code
    - anyone you worked with
    - any outside resources used (e.g., Stack Overflow, ChatGPT)
### Gradescope
Submit your assignment via Gradescope.

To submit through GitHub, first push your work with these commands:
1. `git add -A`
2. `git commit -m "commit message"`
3. `git push`

Now, you are ready to upload your repo to Gradescope.

*Tip*: If you are having difficulties submitting through GitHub, you may instead zip up your hw folder and upload the zip file.
## Rubric
| Component | Points | Notes |
|-------------------|------|--------------------------------|
| Win-rate calculations (by hand) | 10 | Points awarded for correct answers and explanations of how you calculated your values. |
| Monte Carlo Simulation | 10 | Points awarded for correct return values. For a provided policy, answers should match expected outcomes to within some threshold. |
| Value Iteration | 30 | Points awarded for computation of the value function. Since there is no randomness in VI, your answer should match the expected result to within floating-point precision. |
| Q-Learning | 30 | Points awarded for correct implementation of Q-Learning. Since there is randomness in Q-Learning, your policy's score should fall within a provided range of our solution policy's score. |
| Blackjack full game MDP Formulation | 20 | Points awarded for formulating two games as MDPs: Blackjack with the full rule set, and Blackjack with card counting. |
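
To illustrate what matching "to within floating-point precision" versus "to within some threshold" can mean in practice, here is a minimal sketch using the standard-library `math.isclose`. Everything in it is an assumption made for illustration: the helper name `values_match`, the dictionary-of-state-values representation, and the tolerance numbers are hypothetical and are not the autograder's actual checks or thresholds.

```python
import math


def values_match(actual, expected, rel_tol=1e-9, abs_tol=1e-9):
    """Return True if both mappings cover the same states and every value
    agrees within the given tolerances.

    Hypothetical helper; the tolerances are illustrative, not the
    autograder's.
    """
    if actual.keys() != expected.keys():
        return False
    return all(
        math.isclose(actual[s], expected[s], rel_tol=rel_tol, abs_tol=abs_tol)
        for s in expected
    )


expected = {"s0": 0.3, "s1": -1.0}

# A deterministic computation can still differ in the last few bits.
actual = {"s0": 0.1 + 0.2, "s1": -1.0}
print(actual == expected)              # False: exact equality is too strict
print(values_match(actual, expected))  # True: equal up to rounding error

# A sampled estimate (e.g., from simulation) needs a much looser threshold.
estimate = {"s0": 0.31, "s1": -1.0}
print(values_match(estimate, expected))                # False
print(values_match(estimate, expected, abs_tol=0.05))  # True
```

The takeaway is that a deterministic result like Value Iteration can be compared with a very tight tolerance, while results that depend on random sampling are compared against a wider range.
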
:::success
Congrats on submitting your homework. We are proud of you!!
<p style="text-align: center;">
<img src="https://hackmd.io/_uploads/BJVM4CLr-g.jpg" alt="melonfruit" />
</p>
:::