# CS410 Homework 2: Adversarial Search
**Due Date: 9/25/2024 at 12pm**
**Need help?** Feel free to post on [Edstem](https://edstem.org/us/courses/61309) for TA assistance.
## Assignment Overview
Humans have dreamed of a chess-playing machine since at least 1770, with the advent of the [Mechanical Turk](https://en.wikipedia.org/wiki/Mechanical_Turk). More recently, [Alan Turing](https://en.wikipedia.org/wiki/Alan_Turing), the father of computer science, designed a [chess-playing program with two steps of lookahead](https://www.history.com/news/in-1950-alan-turing-created-a-chess-computer-program-that-prefigured-a-i). Claude Shannon built upon this work in his groundbreaking 1950 paper, in which he outlined the minimax algorithm.
All of the aforementioned work was theoretical, as machines were not yet capable of playing games. By 1959, however, Arthur Samuel had designed and built a checkers-playing machine. Samuel's algorithm employed $\alpha\beta$-pruning. Moreover, his machine learned by playing games against itself--and the term **machine learning** was coined!
Today's game-playing programs, which also learn in self-play, have surpassed human performance in substantially harder games, including Backgammon, Chess, Poker, and Go. All of these games are adversarial: they are played by two players, with one winning and the other losing (unless there is a draw).
In this assignment, you will implement *minimax* and *$\alpha\beta$-pruning* on two adversarial games, namely Tic-Tac-Toe and Connect Four. These algorithms can also be applied to variants of these games on larger-than-usual grids. Doing so, however, requires heuristics to guide the search, just as heuristics are used to guide informed search algorithms.
<!-- Although classic, these adversarial search techniques have been surpassed by improved search techniques. Indeed, you will be implementing one such improvement (*Monte Carlo Tree Search*) in the CS410 final project, when you develop an agent to play Go (on small boards). -->
### Learning Objectives
What you will know:
- two adversarial search algorithms, *minimax* and *$\alpha\beta$-pruning*
- when these algorithms are provably correct and when they are not
What you will be able to do:
- implement depth-limited adversarial search
- develop heuristics for adversarial games
- reason about the tradeoff between an algorithm's optimality and its computational efficiency
## Data Structures
Perhaps you have played [Tic-Tac-Toe](https://en.wikipedia.org/wiki/Tic-tac-toe) (also known as Noughts and Crosses). If not, you can try it out [here](https://playtictactoe.org/). Either way, you may not have played on a 4×4 or a 5×5 board. In this assignment, you will have a chance to play this game on boards of various sizes against a game-playing AI agent that you develop.
[Connect Four](https://en.wikipedia.org/wiki/Connect_Four) is another popular children's game. (Actually, it is also popular with [dogs](https://www.youtube.com/watch?v=kFvBtSpZ3DY).) Connect Four is a solved game, meaning a winning strategy for the first player is known--as are the opening moves with which the first player forfeits that win. The game was solved by man, not machine, but machines later verified this manually derived solution using enhanced versions of minimax search and $\alpha\beta$-pruning [(Allis, 1988)](https://www.semanticscholar.org/paper/A-Knowledge-Based-Approach-of-Connect-Four-Allis/f3e05731f4f4315159f2d3ef7cb5d94f1f9b5c14).
<!--@article{Allis1988AKA,
title={A Knowledge-Based Approach of Connect-Four},
author={Victor Allis},
journal={J. Int. Comput. Games Assoc.},
year={1988},
volume={11},
pages={165},
url={https://api.semanticscholar.org/CorpusID:24540039}
}-->
These two games are examples of deterministic, two-player, alternating move, zero-sum games of perfect information. *Deterministic* means no randomness, unlike, say, backgammon; *zero-sum* means one player wins and one loses, unless it is a draw; *perfect information* means the game state is fully observable, as opposed to, say, poker, where one player's cards are hidden from the other players.
We have built an abstract class `AdversarialSearchProblem` to represent such games, which we then instantiate to implement Tic-Tac-Toe and Connect Four.
The `AdversarialSearchProblem` class--interface, really--requires:
* `get_start_state() -> State`: returns the start state for the search problem.
<!-- * `set_start_state(State)`: sets the start state for the search problem. You should not need to use this method. The `game_runner` class uses it to run games.
:::warning
**Note:** This start state refers to the search problem's start state, not the start state of the game. Players begin their adversarial search algorithms at the current board state.
::: -->
* `get_available_actions(State) -> [Action]`: returns the actions available at the given state.
* `transition(State, Action) -> State`: a transition function that determines the next state, given a state and an action.
* `is_terminal_state(State) -> bool`: a function that identifies terminal states (end states where no actions are available).
* `get_reward(State) -> int`: a reward function that labels terminal states as a win, loss, or draw. In Tic-Tac-Toe and Connect Four, there are three possible return values: $\infty$ if the first player to move wins, $0$ if the game is a draw, and $-\infty$ if the second player to move wins.
Just as `SearchProblem` was parameterized by `State` in Homeworks 0 and 1, `AdversarialSearchProblem` is parameterized by `GameState`, another abstract class intended to represent a game's current configuration (or state). Its only (required) method is `player_to_move`, which returns the index of the player whose turn it is to play (0 or 1 in a two-player game).
:::warning
**Important:** `AdversarialSearchProblem` is also parameterized by `Action`. In `SearchProblem`, transitions were *correspondences*, i.e., functions from a state to a set of states, where each state in the ensuing set was the result of taking an (implicit) action. Now, actions are explicit, so transitions are *functions* that map a state and an action to *the* next state.
:::
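To make this interface concrete, here is a minimal sketch of how a tiny game might implement it. The toy `CountdownGame` and `CountdownState` classes below are illustrations only, not part of the stencil; only the method names come from the interface described above.
```python
from math import inf

class CountdownState:
    """A toy GameState: how many tokens remain and whose turn it is."""
    def __init__(self, tokens, player):
        self.tokens = tokens
        self.player = player

    def player_to_move(self):
        return self.player  # 0 or 1

class CountdownGame:
    """Toy game: players alternately remove 1 or 2 tokens; taking the last token wins."""
    def get_start_state(self):
        return CountdownState(tokens=5, player=0)

    def get_available_actions(self, state):
        return [a for a in (1, 2) if a <= state.tokens]

    def transition(self, state, action):
        return CountdownState(state.tokens - action, 1 - state.player)

    def is_terminal_state(self, state):
        return state.tokens == 0

    def get_reward(self, state):
        # The player who just moved took the last token and therefore wins.
        winner = 1 - state.player
        return inf if winner == 0 else -inf
```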
## Algorithms
Your task in this assignment is to implement two adversarial search algorithms, minimax and $\alpha\beta$-pruning. You will run these algorithms on Tic-Tac-Toe and Connect Four--but note that they can be used to solve any deterministic, two-player, alternating-move, zero-sum, perfect-information game. You can find pseudocode for these algorithms in the textbook.
:::spoiler **Video**
You might find [this video](https://www.youtube.com/watch?v=xBXHtz4Gbdo&ab_channel=JohnLevine) helpful in understanding $\alpha\beta$-pruning.
:::
In principle, both of these algorithms can solve any such game (i.e., they will eventually discover a winning strategy for the first player, if one exists). In practice, however, a game's search space may contain more states than there are atoms in the universe (somewhere between $10^{78}$ and $10^{82}$, apparently). For example, the game tree of Chess is estimated to contain more than $10^{111}$ nodes.
Consequently, it is necessary to limit the size of the search tree, either by limiting the branching factor, or by limiting the search depth. (The depth of an adversarial search tree is often called its **ply**.) In this assignment, you will implement *depth-limited* minimax and $\alpha\beta$-pruning, which, as the name suggests, precludes expanding the search tree beyond some pre-specified depth $d$.
:::warning
**Note:** The start state in a game tree is at depth 0. Depth 1 refers to all states reachable from the start state using the actions available at the start, depth 2 refers to all states reachable from states at depth 1 using the actions available at those states, and so on.
:::
When a depth-limited search algorithm reaches its maximum depth, it applies a `heuristic` function to evaluate states. We have thus built `HeuristicAdversarialSearchProblem`, an extension of the `AdversarialSearchProblem` class that adds an evaluation function, `heuristic`.
Depth-limited search algorithms always terminate with a solution. However, if $d < \infty$, they are no longer provably correct.
:::spoiler **Why not?**
Because they depend on a heuristic function (i.e., an estimate) to evaluate states.
:::
You will find that minimax and $\alpha\beta$-pruning can solve very small versions of Tic-Tac-Toe: i.e., they complete their searches within a reasonable amount of time even when $d=\infty$. As you scale up (e.g., Tic-Tac-Toe on 4×4 or 5×5 boards), however, they can no longer complete their searches in a reasonable amount of time unless $d < \infty$, in which case the optimality of their output cannot be guaranteed.
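Structurally, the depth limit amounts to one extra base case, checked before any recursion. Below is a sketch of just that check; the function and variable names are illustrative, not stencil code, and the recursive part is yours to write in the tasks that follow.
```python
def cutoff_value(asp, state, depth, cutoff_depth):
    """Return a value for `state` if the search should stop here, else None."""
    if asp.is_terminal_state(state):
        return asp.get_reward(state)   # exact value at a true leaf
    if depth >= cutoff_depth:
        return asp.heuristic(state)    # estimated value at the depth cutoff
    return None                        # otherwise, keep searching below this state
```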
<!--
We have implemented a sample evaluation function for you, within the Connect Four game implementation. Here is how it works:
Define a **slice** on a Connect Four board as four connected slots that would result in a win if occupied by one player. Now, the heuristic function totals up the score of all slices at the current state, for the current player, where each slice is scored as follows:
- 100 points if all 4 slots are occupied by the given player,
- 5 points if 3 slots are occupied by the given player and 1 is empty,
- 2 points if 2 slots are occupied by the given player and 2 are empty,
- -4 points if 3 slots are occupied by the opponent and 1 is occupied by the given player, and
- 0 points otherwise.
As you can see, heuristic functions are domain-specific functions developed using domain specific knowledge.-->
## Tasks
Your first task concerns minimax and Tic-Tac-Toe.
### Part 1: Minimax
:::info
**Task 1a**
Implement the minimax algorithm in the `minimax` function in `adversarial_search.py`, using the default value for the search depth, namely `cutoff_depth=float('inf')`.
:::
:::success
**Signature**
The `minimax` function takes as input a `HeuristicAdversarialSearchProblem` and a maximum search depth, which by default is set to `inf`, and outputs an optimal action for the player whose turn it is together with a dictionary of statistics, as in HWs 0 and 1. In this case, the only statistic you need to track is the number of states expanded.
:::
:::warning
Note that the signature of this `minimax` function is different than the one presented in lecture, where only the value of the game was returned. Here, we are asking you to return an optimal action instead: i.e., an action that obtains the minimax value.
:::
:::spoiler **Why keep track of statistics?**
We ask you to calculate statistics as your search algorithms run so that you can better understand their performance. In this assignment, we chose to track the number of states expanded. Why not just track the time it takes for an algorithm to run? Because many things external to an algorithm can affect its run time, such as what else is running in the background.
In this assignment, a state is expanded when `get_available_actions` is called on it.
:::
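Since a state counts as expanded exactly when `get_available_actions` is called on it, it may help to funnel every such call through one small helper so the counter cannot drift. The sketch below is only a suggestion; the helper name and the `"states_expanded"` key are assumptions, not stencil requirements.
```python
def expand(asp, state, stats):
    """Return the actions available at `state`, recording the call as one expansion."""
    stats["states_expanded"] = stats.get("states_expanded", 0) + 1
    return asp.get_available_actions(state)

# minimax(asp, cutoff_depth) should then return a pair: (an optimal action, stats).
```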
Once your implementation is working, you can try playing Tic-Tac-Toe against it. To do so, invoke `run_game` from the command line by running `python game_runner.py --game=ttt --player2=minimax`. (Player 1 is you!)
<!-- That was fun! How about a game of Tic-Tac-Toe on a 4x4 board? -->
That was fun! How about a game of Connect Four? Run `python game_runner.py --game=connect4 --player2=minimax` from the command line. A GUI with a Connect Four board should appear. You can use the arrow keys to select a column and Enter to drop your piece.
You are welcome to wait for a bit, but the program is unlikely to terminate any time soon, unless you intervene (press `CTRL-C` in the console).
Hmmm...that was frustrating. But as noted above, there is a fix. We can limit the depth of our search to some finite $d < \infty$.
:::info
**Task 1b**
Generalize your implementation of minimax to depth-limited minimax, given some $d < \infty$.
:::
To run depth-limited minimax, you need a heuristic by which to estimate the value of an arbitrary game state. We have developed such a heuristic for you for Connect Four. After you have modified your minimax implementation for depth-limited search, try it out with a depth of 2 by running `python game_runner.py --game=connect4 --player2=minimax --cutoff 2` from the command line.
How did this AI agent do? Was it fast enough? Hopefully, but how well did it play? You can try it out with a depth of 3 (or more) to improve its performance, but doing so will eventually try your patience.
### Part 2: A Tic-Tac-Toe Heuristic
Your next task is to implement a heuristic function for Tic-Tac-Toe, so that you can play against your minimax agent on larger boards.
Your heuristic should evaluate boards from the perspective of the player who starts the game. The heuristic should be positive when player 1 (the maximizer) is winning and negative when player 2 (the minimizer) is winning. It is up to you to define what "winning" actually means in this context.
:::info
**Task 2**
Implement the `heuristic` function within `heuristic_ttt_problem.py`.
:::
:::success
**Signature**
The `heuristic` function takes as input a `TTTState`, and returns a value for that state.
:::
:::spoiler **Note**
Be sure your heuristic is applicable to a Tic-Tac-Toe board of arbitrary size, not only to boards of size 3x3. Tic-Tac-Toe on a 4x4 board requires 4 pieces in a row to win, on a 5x5 board 5 pieces in a row, etc.
:::
You can now play Tic-Tac-Toe against your AI on a 4x4 board: `python game_runner.py --game=ttt --dimension=4 --player2=minimax --cutoff=2`. Depending on your patience level, you can try it with depth 2, depth 3, etc. Reflect on how well your minimax agent plays as a function of its depth, and how long it takes.
### Part 3: $\alpha\beta$-Pruning
The $\alpha\beta$-pruning algorithm is a variant of minimax that often yields significant run-time savings.
:::info
**Task 3**
Implement $\alpha\beta$-pruning in the `alpha_beta` function in `adversarial_search.py`.
:::
:::success
**Signature**
Like `minimax`, the `alpha_beta` function takes as input a `HeuristicAdversarialSearchProblem` and a maximum depth, with default value `inf`, and outputs an optimal action for the player whose turn it is together with a dictionary of statistics, as in HWs 0 and 1. Once again, the only statistic you need to track is the number of states expanded.
:::
After implementing `alpha_beta`, you can play Tic-Tac-Toe against the default version (i.e., $d = \infty$) by invoking `run_game` as follows: `python game_runner.py --game=ttt --player2=ab`. Reflect on how the speed of your $\alpha\beta$-agent compares to that of your minimax agent.
Next, try playing Tic-Tac-Toe against a depth-limited version of your $\alpha\beta$-agent: e.g., `python game_runner.py --game=ttt --player2=ab --cutoff=3`. Once again, reflect on how the speeds of your depth-limited $\alpha\beta$ and minimax agents compare.
### Part 4: Discussion
Steve and Alex are engaged in a lively debate about the efficacy of minimax vs. $\alpha\beta$-pruning. Steve hypothesizes that $\alpha\beta$-pruning is better than minimax search, but Alex is unconvinced. So Steve decides to run a quick experiment to "prove" his claim.
Steve's experiment comprises a Connect Four tournament between minimax and $\alpha\beta$ agents, varying the cutoff depth between 1 and 4. Each minimax agent (e.g., minimax with cutoff 1, minimax with cutoff 2, etc.) plays two games against each $\alpha\beta$-agent, one where the minimax-agent goes first and another where the $\alpha\beta$-agent goes first. The results are as follows:
*Figure 1: Relative performance of $\alpha\beta$ vs. minimax. A higher performance score (red) indicates that the $\alpha\beta$-agent with that cutoff depth won more often than the minimax agents with the other given cutoff depths.*
:::warning
You can rerun this experiment by running the `main` method in `compare_performance.py`. You can alter the command-line arguments to adjust the maximum depth or the number of games played: e.g., `python compare_performance.py --max_depth=3 --n_trials=1`. In this experiment, minimax won four more games than $\alpha\beta$-pruning.
:::
What do you think? Are you convinced by Steve's experiment? Why or why not?
If you are not convinced by these results, you are not alone. Alex also remains unconvinced. He argues that Steve needs to run more than two games per matchup to prove his claim. But Steve points out that the game and both algorithms are *deterministic*, and that his results are therefore definitive. Can you find a flaw in Steve's reasoning?
:::spoiler **Ties**
Even if our heuristic is not precise enough to distinguish among actions at early stages of the game, there may be substantial differences somewhere down the line. Thus, if multiple actions yield the same value, it is a good idea to try them all--or at least more than one! One simple way to accomplish this is to introduce randomness--that is, to choose randomly among all the best (or near-best) actions, as judged by the heuristic.
:::
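One way to implement such tie-breaking, sketched under the assumption that you have already computed a backed-up value for every available action (shown from the maximizer's perspective; the minimizer would use `min` instead):
```python
import random

def pick_among_best(action_values):
    """action_values: dict mapping each available action to its backed-up value.
    Chooses uniformly at random among the actions tied for the best value."""
    best_value = max(action_values.values())
    best_actions = [a for a, v in action_values.items() if v == best_value]
    return random.choice(best_actions)
```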
Alex runs a second experiment using Steve's setup, but he runs 100 tournaments, hence 200 games per matchup. Figure 2 tallies $\alpha\beta$'s wins and losses. The reported *performance scores* are the differences between the total number of wins and the total number of losses, from $\alpha\beta$'s perspective. In other words, a score of $200/200$ (resp. $-200/200$) indicates that $\alpha\beta$ won (resp. lost) all 200 games.
*Figure 2: Relative performance of $\alpha\beta$ vs. minimax. A higher performance score (red) indicates that the $\alpha\beta$-agent with that cutoff depth won more often than the minimax agent with the other given cutoff depth.*
Based on his experiment, Alex concludes--as he knew all along--that the performance of the two algorithms is indistinguishable.
Steve is struggling. Alex's experiment seems to suggest that $\alpha\beta$-pruning and minimax are comparable. Both are indeed optimal algorithms. But Steve remembers very clearly from his AI Foundations class that because $\alpha\beta$-pruning prunes subtrees, it can search deeper in a game tree than minimax can in the same amount of time, which suggests that it should outperform minimax. So he goes for a long walk to design a new experiment that might better establish the dominance of $\alpha\beta$-pruning over minimax.
Steve eventually has his Eureka moment! He returns to the lab to rerun Alex's experiment, but this time, he plots the total performance score per agent (i.e., the sums of the rows and columns in the above matrix) vs. the total number of states expanded (Figure 3).
*Figure 3: Agent performance vs. number of states expanded per move.*
:::info
**Task 4**
Now are you convinced? Explain why or why not in a brief paragraph in your README.
:::
<!--
The performance scores are comparable in totality, just as they were in their disaggregated form, in the matrix.
But here we can see that $\alpha\beta$ achieves comparable scores to minimax after visiting far fewer nodes.
Moreover, the deeper the search, the greater the savings.
-->
## Downloads
Please click [here](https://classroom.github.com/a/6e56oMF4) to access the assignment code.
### Support Code
- `game_runner.py`: The `main` function in this file runs an instance of Tic-Tac-Toe, where the players are by default you and a minimax agent. This default behavior is equivalent to running:
```shell
python game_runner.py --player1=self --player2=minimax --game=ttt
```
To see a summary of all available command-line options, type:
```shell
python game_runner.py --help
```
- `adversarial_search_problem.py`: Contains two abstract classes relevant to adversarial search algorithms: `AdversarialSearchProblem` and `GameState`.
- `asps/ttt_problem.py`: Our implementation of Tic-Tac-Toe.
- `asps/connect_four_problem.py`: Our implementation of Connect Four.
:::warning
**Note:** Our game implementations are equipped with a `GameUI` for visualization via `game_runner.py`.
:::
- `heuristic_connect_four.py`: Contains our Connect Four heuristic evaluation function.
### Stencil Code
- `adversarial_search.py`: This is where you will code the two adversarial search algorithms, `minimax` and `alpha_beta`.
- `heuristic_ttt_problem.py`: This is where you will code your heuristic evaluation function for Tic-Tac-Toe.
### Testing Code
- `unit_tests.py`: A file in which to write unit tests for your code. This file includes basic test cases for adversarial search algorithms run on a game tree represented as a `GameDAG`. These tests are designed to ensure that the algorithms return a *valid* action. You should add tests to check that your algorithms return an *optimal* action.
<!-- The following basic tests to ensure that your implementations return reasonable outputs on small, simple game trees.
- **test_minimax**: Ensures that the result of `minimax` is a legal action.
- **test_alpha_beta**: Ensures that the result of `alpha_beta` is a legal action. -->
- `asps/game_dag.py`: You can use the `GameDAG` (directed acyclic graph) class to create simple, abstract adversarial search problems for testing purposes. Read the docstrings for the `GameDAG` class to learn how to create a `GameDAG`.
:::spoiler **Hint: GameDAG Data Structure**
In the `GameDAG` class, graphs are represented as adjacency matrices. Each matrix $M$ has $n$ rows and $n$ columns, with each index $i \in \{ 1, \ldots, n \}$ representing a vertex, such that $M_{ij} = 1$ if there is an edge from $i$ to $j$, and $M_{ij} = 0$ otherwise. In this representation, we can ensure the graph is acyclic by insisting that $M_{ij} = 0$ whenever $i \ge j$.
The `terminal_evaluations` argument provides the values at the terminal states.
:::
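For instance, an adjacency matrix for a small three-level DAG might look like the sketch below. How the matrix and terminal evaluations are passed to the `GameDAG` constructor is an assumption here; consult the class's docstrings for the actual arguments.
```python
# A 7-vertex DAG: vertex 0 is the root, vertices 1-2 are its children,
# and vertices 3-6 are terminal. M[i][j] == 1 iff there is an edge i -> j;
# every edge goes from a lower index to a higher one, so the graph is acyclic.
M = [
    [0, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
]

# Values at the terminal vertices, from the maximizer's perspective.
terminal_evaluations = {3: -1, 4: 2, 5: 0, 6: 4}
```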
<!--
## Just for Fun!
We have implemented Tic-Tac-Toe and Connect Four as instances of `AdversarialSearchProblem` for you, so that you can play these games and see your AI agents come to life! -->
## Submission
### Handin
Your handin should contain the following:
- all modified files, including comments describing the logic of your implementations, and your tests
<!-- - the graphs you generated (removed 1/16/25)-->
- as always, a README containing:
- a brief paragraph summarizing your implementation and highlighting the outcomes of all tests
- your solutions to any conceptual questions
- known problems in the code
- anyone you worked with
- any outside resources used (e.g., Stack Overflow, ChatGPT)
### Gradescope
Submit your assignment via Gradescope.
To submit through GitHub, follow this sequence of commands:
1. `git add -A`
2. `git commit -m "commit message"`
3. `git push`
Now, you are ready to upload your repo to Gradescope.
:::danger
**⚠️WARNING⚠️** Make sure you have the correct assignment repo and branch selected before submitting.
:::
*Tip*: If you are having difficulties submitting through GitHub, you may submit by zipping up your hw folder.
### Rubric
| Component | Points | Notes |
|-------------------|----|--------------------------------|
| Minimax (Task 1) | 25 | Points awarded for correct Minimax implementation. Implementations will be evaluated based on the value of the action returned. |
| Tic-Tac-Toe Heuristic (Task 2) | 10 | Points awarded for implementing a *reasonable* Tic-Tac-Toe heuristic. While returning some constant (e.g., 0) is a *valid* heuristic, it is not a *reasonable* heuristic because it does not use any information about the game state. Your heuristic must work on Tic-Tac-Toe boards of arbitrary size, not just on 3x3 boards.|
| $\alpha\beta$-pruning (Task 3) | 25 | Points awarded for correct $\alpha\beta$ implementation. Implementations will be evaluated based on the value of the action returned. We will also check that the correct actions are pruned.|
| Minimax and $\alpha\beta$-pruning comparison (Task 4) | 20 | Points awarded for thoughtful comparison of Minimax and $\alpha\beta$-pruning, using evidence from the graphs provided.|
| Tests | 10 | Points awarded for adding additional tests for minimax and $\alpha\beta$-pruning that test the correctness of your implementations. |
| README | 10 | Points awarded for including all required components. |
- To evaluate `minimax` and `alpha_beta`, the autograder tracks the sequence of `(GameState, Action)` pairs on which `transition` is called. For `minimax`, the autograder checks that your implementation expands all nodes up to a certain depth, and then returns an action with the optimal value. For `alpha_beta`, the autograder also checks that your implementation properly prunes subtrees that need not be explored any further. To determine whether a subtree should be pruned, the autograder maintains its own values of $\alpha$ and $\beta$ based on the order in which your code visits states. If your implementation is correct, it should pass our autograder even if you introduce randomness.
- Please avoid superfluous calls to `transition`, as they will throw off the autograder: these calls are how we track the order in which states are visited.
:::success
Congrats on submitting your homework!


:::