F24: CS410 Homework 3: SAT Solving

# CS410 Homework 3: SAT Solving **Due Date: 10/2/2024 at 12pm** **Need help?** Feel free post on [Edstem](https://edstem.org/us/courses/61309) for TA assistance. ## Assignment Overview The most famous problem in computer science is Satisfiability (*a.k.a.* SAT). This problem asks whether there exists a satisfying assignment for a Boolean formula, i.e., a formula of propositional logic. It was made famous by Stephen Cook in 1971, when he established it as the first NP-complete problem. A problem is in the class of NP problems if it can be solved in *nondeterministic* polynomial time, which means that an answer can be guessed and checked in polynomial time. (The guessing is what motivates nondeterministism in the name.) Not only is SAT in this class, it is "complete" for this class, which means that every problem in this class can be reduced to SAT in polynomial time. And what this means is that if we have a SAT solver, we can use it to solve *any* NP-complete problem. Your task in this assignment is to build a SAT solver! Arguably the most important question in all of computer science is whether P = NP. That is, are problems in the class NP solvable in polynomial time? Common wisdom suggests that the answer to this conundrum is no, they are not. But as of yet, there is no proof of this claim one way or the other. Why is this theoretical puzzle of interest to us in an AI foundations class? Because it suggests that building a SAT solver may not be as easy as it sounds. To date, no polynomial-time SAT solving algorithm is known. Moreover, the space of possible satisfying assignments is exponential in size: assuming a Boolean formula with $n$ variables, there are $2^n$ satisfying assignments to consider. Consequently, you will apply **local search**, a tried-and-true AI technique, to solve SAT in this homework. Local search algorithms are not complete. They may *not* find a solution---in our case, a satisfying assignment---even when one exists. Worse, they may search forever (or until you run out of patience), when a solution does not exist. After building your SAT solvers, you will use them to solve two puzzle games. The first is the $n$-queens puzzle, which we reduced to SAT for you during lecture. The second is Sudoku, which you will reduce to SAT yourself. Finally, you will experiment with your solver to investigate the tradeoffs between exploration and exploitation within local search.  ### Learning Objectives What you will know: - what the most famous problem in computer science--Boolean satisfiability--is - what constraint satisfaction problems are - what local search algorithms are, and how they differ from non-local (global) search algorithms, both in terms of performance and guarantees - two local search algorithms for solving satisfiability, GSAT and WalkSAT What you will be able to do: - translate constraint satisfaction problems into Boolean satisifiability formulae - analyze the effectiveness of GSAT and WalkSAT at solving two puzzles: $n$-queens and Sudoku ## SAT & CNF SAT is a decision problem. Given a Boolean formula $\phi$, it asks whether $\phi$ has a satisfying assignment. That is, can the variables in $\phi$ be assigned values (either `True` or `False`), such that $\phi$ evaluates to `True`. For example, $A \wedge \neg B$ is satisfiable. It can be satisfied by setting $A$ to `True` and $B$ to `False`. The formula $A \vee \neg B$ is also satisfiable; it can be satisfied by *either* setting $A$ to `True` or by setting $B$ to `False`. On the other hand, $A \wedge \neg A$ is unsatisfiable: it has no satisfying assignment. Like lists and trees, Boolean formulae are recursive data structures. The simplest Boolean formulae are propositional variables, like $P$ and $Q$, and the constants $\top$ and $\bot$, which evaluate to `True` and `False`, respectively. Given one or more Boolean formulae, more complex Boolean formulae are built up using the logical operators $\wedge$, $\vee$, and $\neg$, and the parentheses syntax. For example, given $A$, $\neg A$ is also a Boolean formula, as is $(A)$. Similarly, given $A$ and $B$, $A \wedge B$ and $A \vee B$ are also Boolean formulae. You will not build a SAT solver to solve an arbitrary Boolean formula. On the contrary, we will restrict our attention to formulae in **conjuctive normal form** (CNF). Such formulae are *conjunctions* of *disjunctions*. For example, $$(x_1 \lor \lnot x_2 \lor x_3) \land (\lnot x_1 \lor x_2 \lor x_4) \land (x_2 \lor \lnot x_3 \lor \lnot x_4) \land (x_1)$$ $$(A \lor \lnot B \lor C) \land (\lnot A \lor B) \land (B \lor \lnot C) \land (A)$$ is a Boolean formula in CNF. Note that you will not be restricting the power of your SAT solvers by building them to solve SAT for CNF formulas only, because any Boolean formula can be equivalently expressed in CNF via a [mechanical process](https://en.wikipedia.org/wiki/Tseytin_transformation). :::warning **Historic Note:** Following Stephen Cook's seminal work on SAT, Richard Karp (1972) reduced SAT for formulae in *conjunctive normal form* to integer programming, clique, and other problems to establish that solving these problems is at least as hard as solving SAT. Since these problems are also in the class NP, they are NP-complete. With each new problem shown to be NP-complete, new opportunities for reductions arise: e.g., by reducing clique to vertex cover, the latter was also shown (by Karp) to be at least as hard as solving SAT (and in the class NP; hence, NP-complete). ::: CNF is very convenient to represent in a program, e.g., as a list of lists. The outer list comprises **clauses**, which are the disjunctions in the formula, while the inner list comprises **literals**. Literals are variables or their negation. In our example formula, there are five literals, namely $A$, $\neg A$, $B$, $C$, and $\neg C$, and four clauses. In this assignment, you will exploit two additional features of CNF formulae: 1. If any one clause in a CNF formula is unsatisfiable, the formula is unsatisfiable. :::spoiler Why? Because a conjunction of clauses is only `True` when all the clauses are `True`. ::: 2. For a clause to evaluate to `True`, only one literal need evaluate to `True`. Therefore, assigning a single variable a value can satisfy a clause independently of the values of the other variables in that clause. The search for a satisfying assignment for a Boolean formulae is a search over **models**. A model is assignment of variables to values. The formula $P \to Q$ is satisfiable. Indeed, it evaluates to `True` in all models, except the model in which $P$ is assigned the value `True` and $Q$ is assigned the value `False`. | $P$ | $Q$ | $P \to Q$ | |-----|-----|-----------| | T | T | T | | T | F | F | | F | T | T | | F | F | T | ## Data Structures In `sat_problem.py`, we represent `Variable`, `Literal`, and `Clause` as classes. A `Variable` has a name (string). Two Variables with the same name are assumed to be the same variable, while two variables with different names are not the same variable. The `Literal` class is constructed from a `Variable` and an `is_positive` property. Finally, a `Clause` is simply a set of type `Literal`. These classes also provide some convenient methods for printing variables, literals, and clauses so that formulae are human readable. :::spoiler Why do clauses comprise only literals? What happened to the logical operators? Since our formulae are assumed to be in CNF, we know that literals are "OR"-ed together and clauses are "AND"-ed together. Thus, we need only store the literals for each clause! ::: In `sat_problem.py`, we also provide a `CNF` class, which represents Boolean formulae in CNF. A `CNF` object has two attributes: `clauses`, a list of type `Clause`, and `variables`, a dictionary that maps each variable in a `CNF` to the clauses that the variable appears in. The `CNF` class also has an `add_clause(Clause)` method that adds a clause to the CNF formula, i.e., it appends the clause to `clauses` and updates `variables`. Finally, in `sat_problem.py`, we also provide a `Model` class, which stores an assignment (i.e., a dictionary) of variables to values for a `CNF`. A variable can either be `True` (represented by a 1), `False` (represented by a 0), or unassigned, which is represented by -1. In addition to assignments, the `Model` class also provides other useful methods, including `is_satisfied`, which indicates whether the `Model` satisfies its `CNF`. ## SAT Solving Algorithms Although SAT is a decision problem, in this assignment, we formulate SAT--CNF, specifically--as an optimization problem, and aim to solve it using local search. We do this by defining as our objective, "the number of unsatisfied clauses". Our goal is to minimize this objective, and if ever we succeed at reaching the lower bound, namely 0, we have found a satisfying assignment, as no unsatisfied clauses remain. You will be implementing two local search algorithms in this assignment to solve SAT: GSAT (Greedy-SAT) and WalkSAT. These algorithms share many similarities. They both initialize a model at random, meaning they assign values at random to all the variables. They then select variables one at a time, and flip those variables' values. The algorithms differ only in how they select the next variable to flip: 1. GSAT selects the variable which, if flipped, would satisfy the most clauses. 1. WalkSAT first selects a random unsatisfied clause, and then selects the variable within that clause which, if flipped, would satisfy the most clauses. At each iteration of GSAT/WalkSAT, a random variable may be selected, rather than selecting the best variable. The probability of selecting a random variable is a parameter of the algorithms, namely `p`. ## The Puzzles Recall the $n$-queens puzzle presented in class. The challenge is to place $n$ queens on an $n \times n$ chessboard so that no two queens threaten each other. For all $n > 3$, this problem has a solution, and in some cases, there are multiple solutions. But these solutions are not always easy to find. In the file `n_queens.py`, and the class `NQueens`, we provide an implementation of the $n$-queens puzzle. The `NQueens` class inherits from CNF, because it encodes the $n$-queens constraints as a CNF formula. You can use our implementation of `NQueens` to test your implementations of GSAT and WalkSAT. **Sudoku** is another classic puzzle. In Sudoku, the goal is to fill in a 9x9 grid with digits so that each column, each row, and each of the nine 3x3 subgrids all contain all of the digits from 1 to 9. A starting configuration is usually given, with a few of the grid cells already filled in. This initial configuration cannot be altered. In the figure below, a starting configuration appears on the left, and a solution, on the right.  ![Screen Shot 2024-08-24 at 10.22.09 AM](https://hackmd.io/_uploads/BkDWguwoR.png)  Humans have developed a number of [techniques and strategies](https://www.conceptispuzzles.com/index.aspx?uri=puzzle/sudoku/techniques) that they apply to solve Sudoku puzzles. We could implement some of these strategies, and solve Sudoku with search, but doing so would rely on domain specific knowledge. Instead, you solve Sudoku in two steps: first, you will translate the puzzle's constraints into a CNF formula, and then you will solve this CNF using a SAT solver. ## Tasks :::info **Task 1a** Implement WalkSAT and GSAT in `sat_solvers.py`. ::: :::success **Signatures** The `gsat` and `walksat` functions both have the same signature. They take as input a `cnf_formula` for which they attempt to construct a `Model`. They return said `Model` if one is found, and `None`, otherwise. They also return a dictionary of statistics `stats`. This dictionary contains `num_iterations`, `num_unsatisfied_clauses` (per iteration), and `run_time`. The algorithms also take as input `max_iterations`, which indicates the maximum number of values to flip before giving up, and `p`, which indicates the probability of choosing a random variable to flip during each iteration. Additionally, the algorithms take a parameter `num_randomized_restarts`, which you can ignore until Task 3 of this assignment. Both algorithms can take in an `existing_model`. If an existing `Model` is provided (i.e., assignments for variables), the algorithms should start from this model rather than a random assignment (we have done this for you). See `unit_tests.py` for an example test using an existing model. ::: :::spoiler **Hint: Helper Functions** The `Model` class has many useful helper functions, including: * `get_unsatisfied_clauses()`, which returns all current unsatisfied clauses * `change_in_num_clauses_sat_if_flipped(var)`, which returns the change in number of clauses that are satisfied if `var` is flipped ::: :::info **Task 1b** Run `n_queens.py` by calling, for example, `python n_queens.py -n 8 -p 0.4` in your terminal. The parameter $n$ is the size of the board, and $p$, the exploration probability. ::: These local search algorithms should now be able to solve the $8$-Queens puzzle! Your next task is to translate the Sudoku rules into a CNF formula, and then to code up your formula. Since Sudoku is a constraint satisfaction problem, you should start by choosing variables and their domains. Perhaps the most natural choice is to create a variable for every grid cell, each with domain $\{ 1, \ldots, 9 \}$. For example, setting the origin as the top left cell in the example above, and indexing by row first and then column, $x_{1,1}=4, x_{1,2}=5, x_{1,3}=3$, and so on. The trouble with this representation is that our aim is to represent Sudoku using a Boolean formula, and a Boolean formula is comprised of *Boolean* variables, not integer variables. However, because these integer variables are bounded (here, they lie in the range 1 to 9), we can convert them to boolean variables by creating a boolean variable for each value that each integer variable can be assigned. Our boolean variables are thus $x_{i,j,v}$, where $i$ and $j$ range over rows and columns in the grid, as above, and $v$ ranges over $\{ 1, \ldots, 9 \}$. Boolean variable $x_{i, j, v}$ is `True` if and only if the value at cell $(i, j)$ is $v$. :::info **Task 2a** Implement the `compile` method in `sudoku.py`, which automates the construction of a CNF formula, given a Sudoku problem. How many Boolean variables are required, assuming an empty Sudoku board initial configuration? Why? Give your answer in your README, in terms of the size of the grid, $n$, and the number of digits, $m = 9$. ::: :::warning **Hint:** You may find it helpful to write out the mathematical formulation on paper before translating it into code. ::: As usual, your Sudoku implementation must work on grids of varying sizes (e.g., 4x4, 9x9, 16x16, etc.). As in the case of Tile Games and Tic-Tac-Toe, you will have a *much* easier time testing your Sudoku solver on a 4x4 grid than on a 9x9 grid. :::warning **Hint:** The Sudoku constructor sets `self.grid_size` and `self.subgrid_size` for you. You should use these variables in your formulation. ::: :::info **Task 2a** Run `python sudoku.py --difficulty easy`. This command will invoke GSAT and WalkSAT on your Sudoku formulation, on an "easy" Sudoku problem (a 4x4 grid, in this case). Try out your algorithm with `medium` and `hard` problems as well. When attempting harder problems, you may need to adjust your the number of iterations your algorithm runs for: `python sudoku.py --difficulty medium --max_iterations 100000` ::: :::warning **Important Note:** If GSAT and/or WalkSAT finds a solution for your formulation, that does not mean that your formulation is correct! It just means that a satisfying assignment of variables was possible for your formulation, whatever it may be. So, be sure to add tests to `unit_tests.py` to check whether your solutions actually satisfy the Sudoku constraints! ::: Your solvers may occassionally succeed at solving 9x9 Sudoku problems, but more often than not, they will fail. Local search algorithms like GSAT and WalkSAT are not **complete**: they are not guaranteed to find a solution, even when one exists. Recall that we posed CNF as optimization problem with an objective function corresponding to the number of unsatisfied clauses. The value at a **global minimum** for a *satisfiable* formula is 0 unsatisfied clauses, but we don't know at the onset if a formula is satisfiable or not. Local search algorithms can get trapped in **local minima**. In CNF, a local minimum is an assignment of variables such that flipping the value of any single variable would yield more unsatisfied clauses than the current assignment. GSAT and WalkSAT try to escape local minima by choosing a variable at random, and flipping the value of that variable with probability $p$. While helpful, this form of exploration may not be enough. Another way to escape local minima is simply to start over! That is, to add **randomized restarts** to a local search algorithm. :::info **Task 3** Add randomized restarts to WalkSAT and GSAT. That is, wrap your implementations inside a loop that starts the algorithm anew `num_randomized_restarts` times. Each time the algorithm restarts, the assignment should be initialized at random. In this case, `max_iterations` is the number of iterations that each algorithm should perform after each restart. For example, if `max_iterations`$=1000$ and `num_random_restarts`$=50$, the algorithms would actually run for $1000*50$ iterations in total. ::: :::warning **Note:** Adding randomized restarts may help your algorithms solve harder puzzles. Try running `python sudoku.py --difficulty hard --solver walksat --randomized_restarts 100 --max_iterations 5000` to attempt a hard Sudoku puzzle with 100 randomized restarts. ::: :::spoiler **Hint: Statistics** Be sure that the `num_iterations` statistic is being updated correctly. It should count the *total* number of iterations, not just the number of iterations after a restart. ::: :::info **Task 4** Compare the performance of WalkSAT and GSAT on $n$-Queens and Sudoku. Run `python compare_performance.py --puzzle queens -n 8 --num_p 11` to run GSAT and WalkSAT on an 8-queens problem with 11 different values for $p$ (i.e. $p=0.0, 0.1, 0.2\ldots, 1.0$). By default, this runs one trial, but you can run additional trials using the `--num_trials` command line argument. Compare your results for 8-queens with our results for Sudoku presented below and **answer the conceptual questions listed beneath our results in your README.** ::: Our results on a simple 4x4 Sudoku board are shown below, for GSAT and WalkSAT in turn, and various values of $p$. We calculate the average success rate of the algorithms over 20 trials, as well run time and total iteration statistics. ### GSAT Results | $p$ | Run Time (Average) | Run Time (Std Dev.) | Iterations (Average) | Iterations (Std Dev.) | Success Rate | | --- | --- | --- | --- | --- | --- | | 0.00 | 8.34 | 0.23 | 50000 | 0 | 0.00 | | 0.10 | 6.53 | 1.81 | 44126 | 12224 | 0.20 | | 0.20 | 4.49 | 2.44 | 33663 | 18403 | 0.50 | | 0.30 | 1.14 | 1.44 | 9393 | 11943 | 0.95 | | 0.40 | 0.49 | 0.48 | 4419 | 4260 | 1.00 | | 0.50 | 0.25 | 0.27 | 2618 | 2897 | 1.00 | | 0.60 | 0.95 | 0.92 | 11496 | 11288 | 1.00 | | 0.70 | 3.20 | 0.66 | 47002 | 9339 | 0.10 | | 0.80 | 2.61 | 0.06 | 50000 | 0 | 0.00 | | 0.90 | 1.81 | 0.06 | 50000 | 0 | 0.00 | | 1.00 | 0.83 | 0.03 | 50000 | 0 | 0.00 | ### WalkSAT Results | $p$ | Run Time (Average) | Run Time (Std Dev.) | Iterations (Average) | Iterations (Std Dev.) | Success Rate | | --- | --- | --- | --- | --- | --- | | 0.00 | 0.60 | 0.49 | 30025 | 24464 | 0.40 | | 0.10 | 0.34 | 0.31 | 17060 | 16038 | 0.85 | | 0.20 | 0.07 | 0.08 | 3580 | 4160 | 1.00 | | 0.30 | 0.04 | 0.05 | 1899 | 2347 | 1.00 | | 0.40 | 0.02 | 0.02 | 1177 | 1070 | 1.00 | | 0.50 | 0.01 | 0.01 | 480 | 487 | 1.00 | | 0.60 | 0.01 | 0.00 | 397 | 251 | 1.00 | | 0.70 | 0.01 | 0.01 | 491 | 439 | 1.00 | | 0.80 | 0.01 | 0.02 | 720 | 903 | 1.00 | | 0.90 | 0.02 | 0.01 | 1447 | 790 | 1.00 | | 1.00 | 0.09 | 0.09 | 5552 | 5817 | 1.00 | :::info ### Conceptual Questions 1. It is surprising that WalkSAT can solve instances of Sudoku with $p=1.0$! GSAT and WalkSAT are both local search algorithms, but differ in how they define **neighborhood**. What is the neighborhood that each algorithm uses and why does WalkSAT's neighborhood choice lead to success even when $p=1.0$ (i.e., just flipping variables randomly). Add your answer to this question to your README. 2. Compare your results tables for the $n$-queens problem to our Sudoku results above. Is the same parameter setting (i.e., value of $p$) fastest for both problems? In general, what problem characteristics might make some settings of $p$ better than others? Can you think of a case where $p=0$ would be the best parameter setting? Add your answers to these questions to your README. 3. Your implementation of these SAT algorithms can solve 8-Queens quickly (hopefully). How long would you expect it to take to solve an instance of 20-Queens? What about 100-Queens? Design and then run a series of experiments to estimate how long a 100-Queens problem might take to solve using WalkSAT. Running experiments for every $p$ value is time consuming, so think about ways to go about picking a single $p$ value to run experiments with. Describe your experimental design, any results you collect, and your how you arrived at your estimate in your README. ::: ::: warning **Note**: We care more about your experimental design and how you reach your prediction rather than your actual prediction (which is machine dependent). ::: :::danger **Important**: Your experiments shouldn't take more than 10 minutes to run -- for your own sake! :::  ## Downloads Please click [here](https://classroom.github.com/a/ZqmrUNvW) to access the assignment code. ### Support Code - `sat_problem.py`: Provides classes required for building CNF formula, as well as a `Model` class for representing potential satisfying assignments for a `CNF`. - `vizualize.py`: Contains visualization functions for the Sudoku and $n$-queens puzzles. These visualizations should help you better understand the solutions provided by your SAT solvers. - `n_queens.py`: Contains our formulation of `NQueens`, which extends `CNF` with the relevant clauses. You can use this file to test your SAT solvers, and as a model for how to implement Sudoku as a CNF. ### Stencil Code - `sat_solvers.py`: This is where you will implement the GSAT and WalkSAT algorithms. - `sudoku.py`: Contains the `sudoku` class, which extends `CNF`. You should implement the `compile` method in this class, which adds the appropriate clauses to the CNF formula it represents. :::warning **Reminder:** Your solution should modify the stencil code *only*. You will be penalized for modifying the support code, especially if it interferes with the autograder. ::: ### Testing Code - `unit_tests.py`: A file in which to write unit tests for your code. We have created a small CNF and provided two simple tests: the first tests whether WalkSAT picks the best variable to flip (by creating a senario which WalkSAT *should* pick the variable B), and the second tests whether GSAT finds a correct assignment. You should run our tests, but you should also create your own CNFs to test on by instantiating a `CNF` and using `add_clause()` to populate your CNF object. - `examples.py`: A file with some Sudoku puzzles you can use to test your algorithms. You should add more tests to further verify the correctness of your code. Does it succeed at finding solutions when they exist? And does it terminate gracefully when they do not? ## Submission ### Handin Your handin should contain the following: - all modified files, including comments describing the logic of your implementations, and your tests - the graphs you generated - and, as always, a README containing: - a brief paragraph summarizing your implementation and highlighting the outcomes of all tests - your solutions to any conceptual questions - known problems in the code - anyone you worked with - any outside resources used (eg. Stack Overflow, ChatGPT) ### Gradescope Submit your assignment via Gradescope. To submit through GitHub, use this sequence of commands: 1. `git add -A` 2. `git commit -m "commit message"` 3. `git push` Now, you are ready to upload your repo to Gradescope. :::danger **⚠️WARNING⚠️** Make sure you have the correct assignment repo and branch selected before submitting. ::: *Tip*: If you are having difficulties submitting through GitHub, you may submit by zipping up your hw folder. ### Rubric | Component | Points | Notes | |-------------------|----|--------------------------------| | GSAT/WalkSAT implementations | 20 | Points awarded for correct implementations of WalkSAT and GSAT. | | Sudoku Formulation | 20 | Points awarded for correct implementation of Sudoku as a CNF formula and for correctly calculating the number of variables used as a function of grid size.| | Randomized Restarts | 10 | Points awarded for correct implementation of randomized restarts.| | Question 4.1 | 5 | Points awarded for identifying difference in neighborhood between WalkSAT and GSAT, and the effect of neighborhood choice on performance. | | Question 4.2 | 5 | Points awarded for comparison of 8-queens to Sudoku and answering the relevant questions. | | Question 4.3 | 20 | Points awarded for a thought out and well-explained experimental design, as well as including the data from your experiments and predicted time to run 100-queens. | | Tests | 10 | Points awarded for additional unit tests in `unit_tests.py`, including a test to ensure that Sudoku solutions discovered by your SAT solvers are actual Sudoku solutions. | | README | 10 | Points awarded for an easy to read and well-organized README file. | :::success Congrats on submitting your homework; Steve is proud of you!! ![image](https://hackmd.io/_uploads/Hyjj8jk3A.png) ![image](https://hackmd.io/_uploads/S1OQ2aCwA.png) :::