---
# System prepended metadata

title: 'CSCI 0410/1411 Spring 2026 Final Project Part 3: Go To Town with your Go Agent!'

---

# CSCI 0410/1411 Spring 2026 Final Project Part 3: Go To Town with your Go Agent!

==**Due Date: 05/04/2026 at 11:59pm**==

**Need help?** Feel free to post on [Edstem](https://edstem.org/us/courses/93617) for TA assistance.

:::danger
**⚠️ Battle Alert 🏆 ⚠️** You must activate the CS410 virtual environment every time you work on a CS410 assignment! You can find the activation instructions [here](https://hackmd.io/@cs410/BJvhqHXuR). If you forget to activate this virtual environment, you will almost certainly encounter import errors, and your code will not run.
:::

![Arena24](https://hackmd.io/_uploads/Bk5iMzuS-e.jpg)



So far, we have provided guidance on the design and development of your agents. Your agents are competent, but they can be significantly improved. Your final task is open-ended: your goal is to create your own agent to compete against some TA-programmed bots. You have complete freedom within the guidelines provided. You can use MCTS or not. You can use deep learning or reinforcement learning to learn policies, or construct your own hand-crafted heuristic function. The choice is yours. We provide some places to start and other potentially useful resources, but none of these are required of your agent.


## Submitting your Agent

To construct your final agent, we will call `get_final_agent_5x5()`, which should return an agent object. In the Stencil, this method currently returns an `MCTSAgent`, but you can and should change this to return whatever your final agent is.

You can name your agent by implementing the `__str__` method, although agents will also be identifiable by your Gradescope ID. The easiest way to ensure your agent behaves as expected is to put all required methods and classes directly in the `agents.py` file.
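
As a purely illustrative sketch, the final-agent hook might look like the following. `MyAgent`, its method bodies, and the name string are placeholders, not part of the stencil; the `get_move(state, time_limit)` signature is assumed from the description later in this handout:

```python
class MyAgent:
    """Placeholder agent; replace with your real implementation."""

    def get_move(self, state, time_limit):
        # Your search / policy logic goes here.
        raise NotImplementedError

    def __str__(self):
        # The name shown for your agent in match logs.
        return "TeamNameBot"


def get_final_agent_5x5():
    # The object returned here is the agent that will play your matches.
    return MyAgent()
```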

:::danger
**⚠️WARNING⚠️** Do not modify `GoSearchProblem` or any other files, as we will provide staff implementations of these to both agents playing a match. (Our implementation does randomize legal actions, as was suggested in part 1 of the project)

This restriction does not apply to the heuristics you added in Part 1; those are yours to keep and modify.
:::


You will submit your agent to Gradescope just like any other assignment. It will then play against a series of staff agents. 


## Rules

![Screenshot 2026-01-16 at 4.27.43 PM Medium](https://hackmd.io/_uploads/Sk70AXdBWl.jpg)


* Your agent must make moves within the specified time limit. If you run a loop with `while curr_time - start_time < time_limit:`, your agent will always run out of time! If you run out of time on a single move, your agent forfeits the game, which is an automatic loss. We encourage you to build in a bit of leeway to ensure your agent does not lose on time.
* No *pondering*: Pondering refers to the practice of thinking during your opponent's turn. While it is generally interesting to think about how you might do this, we want to avoid any nefarious, sabotaging behavior (e.g., spooling up empty work during your opponent's turn to starve them of computational resources). Your agent will persist between moves (i.e., you can save information between moves), but it should not continue processing after `get_move` returns.
* Do not catch `support.TimeoutException` (or generic exceptions). If your agent takes longer than the time cutoff, we will terminate it and your agent will forfeit the game.
* Your agent must be written in Python. Search algorithms work better the more states they can explore, and Python is relatively slow compared to compiled languages like C and C++. Yes, your agent would be more efficient if written in a faster language, but that is not in the spirit of the class: the focus should be on the AI techniques used, not the programming language.
* Your agent will be provided with 8 CPU cores and 16 GB of RAM to use. We will not provide GPUs to agents to use. If you’d like to use deep learning methods, you can still train your models on GPUs (through colab or other means), but at execution time we will run every agent on the same hardware configuration.
* Your agents will run in docker containers with the same setup as the local environment that you’ve installed (if you followed the directions). If the autograder runs, you're good.
* Your submission to Gradescope should not exceed 100 MB. If you use additional files for neural network parameters or an opening book, they must fall within this 100 MB limit. We do not expect anyone to come close to this limit.
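
One way to respect the time limit described above is to check a deadline, with a safety margin, inside your search loop. The sketch below is illustrative only: `iteration_fn` stands in for one unit of search work (e.g., a single MCTS simulation or one iterative-deepening depth), and the margin value is something you should tune.

```python
import time

def iterate_with_budget(time_limit, iteration_fn, safety_margin=0.1):
    """Run search iterations until the time budget is nearly spent.

    Reserving a small safety margin (in seconds) helps avoid forfeiting
    on time when the final iteration runs slightly long.
    """
    deadline = time.monotonic() + time_limit - safety_margin
    iterations = 0
    while time.monotonic() < deadline:
        iteration_fn()   # one unit of search work
        iterations += 1
    return iterations
```

Note that this only guards against *starting* an iteration too late; if a single iteration can itself be slow, you may need checks inside the iteration as well.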


## Possible Extensions

* Better Heuristic for Iterative Deepening Search: What is currently missing from our non-learned heuristic is a notion of which player controls the most [territory](https://www.pandanet.co.jp/English/learning_go/learning_go_4.html); the simple heuristic only tracks the number of pieces captured. You can produce a better heuristic by finding regions of empty cells that are completely surrounded by a single color (or the edge of the board).
* Opening Book: In one sense, the first move of the game is the hardest for your search algorithms: the number of available actions, and therefore the branching factor of the search tree, is highest on the first move. In another sense, the first move should be the easiest move of the game. Your agent should not need to spend any computation time on it; you can simply hard-code the best first action. But why stop there? There are well-known good openings for 5x5 Go. These can be memorized and stored in what is called an opening book. To implement this, your agent checks whether a state is in the opening book and, if so, plays the book's best action. Consider what data structures are required for fast lookups of a given state. Your agent can load other files into memory; we just ask that you limit your total memory usage to a reasonable amount (see the rules).
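
The territory idea above can be implemented with a flood fill over empty regions. This is a sketch, not the assignment API: it assumes the board is a list of lists containing `'B'`, `'W'`, or `None` for empty, which may differ from the stencil's representation.

```python
from collections import deque

def territory_score(board):
    """Return (black_territory, white_territory) for a square board.

    An empty region counts as territory for a color if every stone
    bordering the region is that color (board edges are neutral).
    """
    n = len(board)
    seen = set()
    black, white = 0, 0
    for r in range(n):
        for c in range(n):
            if board[r][c] is not None or (r, c) in seen:
                continue
            # Flood-fill this empty region, recording bordering stone colors.
            region, borders, queue = [], set(), deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < n and 0 <= nx < n:
                        if board[ny][nx] is None:
                            if (ny, nx) not in seen:
                                seen.add((ny, nx))
                                queue.append((ny, nx))
                        else:
                            borders.add(board[ny][nx])
            if borders == {'B'}:
                black += len(region)
            elif borders == {'W'}:
                white += len(region)
    return black, white
```

The difference `black - white` (from the moving player's perspective) could then be blended into your existing heuristic.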

|![multiple ways to reach state](https://hackmd.io/_uploads/HkKeF-drZg.png)|
|:--:|
| Figure 1: A board state that can be reached in multiple ways. |

* Transposition Tables: A transposition in Go (and other board games) is a state that can be reached through multiple move orders. For example, the game state shown in Figure 1 may have been reached in multiple ways: Black could have played the stone at (6, 2) first and the stone at (8, 3) second, or vice versa. In your existing search algorithms, different move orders result in different search nodes. However, you can reduce the size of your search tree significantly if you merge nodes that correspond to the same game state. A transposition table saves states and their current evaluations so you can check whether a state has already been evaluated and avoid expensive re-computations.
* Saving Information Between Moves: your agent's `get_move` method will always be called after a single move by your opponent. You may be able to reuse parts of your old search tree or other data structures between moves. Since your agent is an object, you can use instance variables (`self.whatever`) to save information between calls to `get_move`.
* Flexible Time Management: When there is obviously only one good move (e.g., you can immediately capture a large number of pieces), your agent shouldn’t waste time building a large search tree. Most of the time spent thinking by top-level professional Go and Chess players is in the middle game. Perhaps these professionals have the right idea. You may want “flexible” time management that doesn’t take a constant amount of time for every move. Remember, the `time_limit` passed to `get_move` is not how much time your agent has to take, it is how much time it *can* take.
* Better selection/playout policies in MCTS: As discussed previously, playouts in MCTS may be more informative if a simple heuristic is used to guide actions rather than uniform random simulations. Likewise, other selection methods (or parameters) may be used to improve MCTS performance as well.
* More Supervised Learning: We provided you with an existing dataset for 5x5 games that we collected using MCTS. You can further improve these methods from part 1 by using better neural network architectures or by gathering more data to train on. To collect more data, you can run your own agents against each other and collect information on states, actions taken, and the results of the games.
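
A transposition table can be as simple as a dictionary keyed by a hashable encoding of the board. The sketch below is illustrative (all names are ours, and the board encoding assumes a list-of-lists representation); a Zobrist hash would make keys cheaper to update incrementally, but a flattened tuple is enough to start.

```python
class TranspositionTable:
    """Minimal transposition-table sketch.

    Values are cached evaluations tagged with the search depth that
    produced them, so shallow entries can be replaced by deeper
    (more reliable) ones.
    """

    def __init__(self):
        self._table = {}

    @staticmethod
    def key(board):
        # Flatten a 2D board into a hashable tuple.
        return tuple(tuple(row) for row in board)

    def lookup(self, board, depth):
        entry = self._table.get(self.key(board))
        if entry is not None and entry[0] >= depth:
            return entry[1]   # cached value from an equal-or-deeper search
        return None

    def store(self, board, depth, value):
        k = self.key(board)
        entry = self._table.get(k)
        if entry is None or depth >= entry[0]:
            self._table[k] = (depth, value)
```

In alpha-beta search you would typically also record whether the stored value is exact or a bound, but the depth-tagged version above already avoids re-evaluating transpositions.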
* Reinforcement Learning: Similar to supervised learning, the goal would be to learn a heuristic function or policy for a given state. The advantage of reinforcement learning is that you can potentially collect more data (through many simulations) than you can with supervised learning alone. Open-spiel provides a gym-like environment (much like the environments we used for blackjack and cartpole), available by calling:
```python
from open_spiel.python.rl_environment import Environment

env = Environment("go", board_size=5)
```
We do not have an equivalent for PyGo; however, open_spiel can also be installed on Google Colab or GitHub Codespaces.

If you wish to try Deep-Q learning, there are many [tricks](https://abhishm.github.io/DQN/) that are often used to improve the stability of the training procedure. Even though neural networks are more expressive than the linear models we used in our previous fitted-q learning assignment, they can be much harder to train. The library [Stable Baselines](https://stable-baselines3.readthedocs.io/en/master/guide/rl_tips.html#reinforcement-learning-tips-and-tricks) provides a helpful list of tips and tricks to get started on selecting an RL algorithm for your problem.

AlphaGo and AlphaZero both used self-play to train their agents. In self-play, your reinforcement learning agent plays against itself, slowly improving over time (hopefully). We recommend you start by playing your reinforcement learning agent against a simple foe, like `RandomAgent` or `GreedyAgent`, and incrementally increase the difficulty as training progresses (an approach called curriculum learning).
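
A curriculum schedule of the kind described above can be sketched as follows. All names and thresholds here are illustrative; opponents are assumed to be ordered from weakest to strongest, and the learner graduates once its recent win rate clears a threshold.

```python
class Curriculum:
    """Simple curriculum-learning schedule (not a required API)."""

    def __init__(self, opponents, threshold=0.6, window=100):
        self.opponents = opponents    # ordered weakest to strongest
        self.threshold = threshold    # win rate needed to graduate
        self.window = window          # number of recent games to track
        self.level = 0
        self.recent = []              # 1 for a win, 0 for a loss

    def current_opponent(self):
        return self.opponents[self.level]

    def record_result(self, won):
        self.recent.append(1 if won else 0)
        if len(self.recent) > self.window:
            self.recent.pop(0)
        win_rate = sum(self.recent) / len(self.recent)
        if (len(self.recent) == self.window
                and win_rate > self.threshold
                and self.level < len(self.opponents) - 1):
            self.level += 1
            self.recent = []   # reset stats against the new opponent
```

Your training loop would then call `current_opponent()` each game and `record_result(...)` afterwards.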


* Hybrid Agent: Each of the agents you developed in Parts 1 and 2 of the final project has its strengths and weaknesses. Maybe your learned agents from Part 2 play the opening and middle phases of the game well but fail toward the end of the game (which is what ours tend to do). Alpha-beta search is guaranteed to find the best moves available, but in general runs slowly when there are many available actions. You can combine the strengths of these two agents by using a learned policy at the beginning of the game and switching to alpha-beta search with a non-learned heuristic at the end of the game. You may be able to combine IDS and MCTS in similar ways.
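
A hybrid agent of this kind can be a thin wrapper that delegates based on the move number. This sketch is illustrative: `policy_agent` and `search_agent` are assumed to expose the same `get_move(state, time_limit)` interface, and `switch_move` is a parameter you would tune (switching on a board-fullness measure instead of a move count is another option).

```python
class HybridAgent:
    """Delegate to a learned policy early and exact search late (a sketch)."""

    def __init__(self, policy_agent, search_agent, switch_move=15):
        self.policy_agent = policy_agent
        self.search_agent = search_agent
        self.switch_move = switch_move
        self.moves_seen = 0

    def get_move(self, state, time_limit):
        self.moves_seen += 1
        if self.moves_seen <= self.switch_move:
            return self.policy_agent.get_move(state, time_limit)
        return self.search_agent.get_move(state, time_limit)
```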


## Additional Resources

* [The ChessProgrammingWiki](https://www.chessprogramming.org/Main_Page) is a community focused on the development of chess engines. Many of the techniques used in chess engines are applicable to Go as well, and the wiki has a page dedicated to [Go](https://www.chessprogramming.org/Go). It also has pages on opening books and transposition tables that you may find helpful if you choose to implement those features.
* A thesis by Petr Baudis on using MCTS to play Go, with practical advice on handling transpositions, selection policies, and playout policies: https://pasky.or.cz/go/prace.pdf
* [Alpha Go](https://www.nature.com/articles/nature24270): provides a description of the methods used to achieve superhuman Go play for the first time.

## Tips and Hints

<!-- * The AI-enhanced bracket will use 9x9 boards by default. If you are going to enter that bracket, you can run games on a 9x9 board by sending the **--size** argument to game runner. Or, you can save time and potential mistakes by directly editing the default game size in game_runner.py (i.e., change `parser.add_argument(’--size’, type=int, default=5, help=’Board size’)` to have a default size of 9. -->

* “Premature optimization is the root of all evil” – [Donald Knuth](https://en.wikipedia.org/wiki/Donald_Knuth). Faster implementations will explore more states and generally perform better. However, beware of trying to optimize your code too soon: it is better to have a slow, bug-free implementation than a fast, buggy one. If you cannot understand your own code, you will never be able to get it to run without bugs.

* Version Control (git) is Your Friend: You may find yourself implementing many different versions of your agent and comparing their performance. Write descriptive commit messages and commit whenever you have a notable agent worth saving; you will thank yourself if you ever have to revert to a previous version of your code. Now is the time to learn how to use git to help development, not just to make final submissions.

* Running experiments and creating figures is not just helpful for understanding results, it can help you debug your code faster. It can be hard to debug the algorithms you are developing with break points or print statements alone (as you are probably aware). **Making a figure, plot, or other visual representation can help you debug your implementation** and understand whether or not your algorithm is working as intended.

### Grading

We will be tracking your performance against staff agents. However, we value creativity and ambition above pure performance and will take a holistic approach to grading. We encourage you to try something that may not work (for example, reinforcement learning) over something that is guaranteed to work (e.g., MCTS with better time management and other incremental improvements). You will be evaluated along the following four axes:

* **Approach and Creativity**: Is your approach incremental or did you try something new and experimental? Is it clear you put thought and effort into your agent?
* **README**: Does your README clearly explain your approach and how you evaluated it? Does it contain figures that help explain how your algorithm is working?
* **Code Quality**: Is your code well written and easy to follow? Is it well documented? Is it your own original code?
* **Performance**: Does your agent perform well against other agents (staff and peers)?

You need not check off each of these items to achieve a good score on the final project (the README is always required). For instance, if your README is thorough and helps explain your unique approach, you need not have a high-performing agent. If your agent performs well and is unique, we will forgive poorly written code.

Each of the categories described above will contribute equally to your final score. Your final score will be based on a total of 100, even though 120 points are possible. This means you can lose points in some categories and still achieve a good overall score.

| Task                  | Points | Details |
|-----------------------|--------|---------|
| Approach and Creativity | 30 | Full points awarded for attempting something novel and ambitious (i.e., not just small things from the list of suggestions provided). Reusing agents from previous parts of the final project without any modifications will not receive any points in this category. |
| README                | 30 | Full points awarded for a thorough explanation (including experiments and figures) of how your agent behaves. Submissions with no experiments or figures can still achieve high marks in this category, but the full 30 points will be reserved for exceptional READMEs. |
| Code Quality          | 30 | This category will also rely, in part, on the description of your approach in your README. We will look at your code and make sure you are following good style and that the code was written by you, or has citations to outside sources where appropriate. We expect most people will get full marks in this category. |
| Performance           | 30 | You will receive points for beating staff agents. Beating more sophisticated staff agents will award more points. The breakdown is: RandomAgent (1 point), GreedyAgent (2 points), AlphaBeta (3 points), MCTS (4 points), SuperAgent1 (10 points), SuperAgent2 (10 points). For full credit, your agent will need to have a winning record against the simpler agents, and close to a winning record against the more advanced agents. Students who do not submit a working agent (i.e., there are runtime exceptions or syntax errors) will receive 0 for this category. This score will be visible to you on the Gradescope autograder. |


### Final Report

You **must** include a README with your final submission that describes your agent and what you have done in this final phase of the final project. If you have known bugs, you should include a description of them here. If you tried other approaches that are not a part of your final submission, you should include them here and a brief statement on why you decided not to use them (e.g., they were too slow, buggy, etc.).

Congratulations on completing your CS0410 journey and earning your place in the Hall of Eternal Glory—where only the bravest coders who survived the grind, the bugs, and the occasional rage moment are crowned. You’ve smashed towers, optimized your strategy, and collected enough brainpower elixir to officially earn your place in history 👑🏆

![hall of eternal glory](https://hackmd.io/_uploads/rJemdXOS-g.jpg)