Assignment 8: SAT Solver

Assignment 8: SAT Solver ------ # Learning outcomes :::info In this assignment, you will: - Implement your own simple SAT solver! - Solve an NP-complete problem with a backtracking search algorithm. - Meet the specification for an external standard: the SAT competition standard, which is based a simplification of the DIMACS CNF format. - Practice choosing your own data structures to solve a programming problem, including reasoning about mutation. ::: This is a **language-agnostic programming assignment**. :::warning Note: this is a **solo**, not pair, assignment. ::: # Setup ## Programming language You may use any programming language to complete this assignment, with instructor permission. Permission is granted by default for two programming languages: Python and Rust. Because Rust is a bit more of a challenge, completing the assignment in Rust will add up to +10 extra credit points on the assignment. You can run this assignment locally, on your own machine, but it will be tested in a UNIX-based environment by the Gradescope autograder. Make sure to try uploading your implemention to the autograder earler, to make sure you are able to read/write data in the correct format (more on this in the input/output instructions). ## Minimal stencil files The autograder runs your solver by running a `run.sh` script (so that your solver can be implemented in either Python or Rust). The `run.sh` script assumes a certain filename format. :::success When you are ready to start your implementation, start your solver with the `run.sh` script and minimal stencil files for the language of your choice found [here](https://drive.google.com/drive/folders/1NSRZcclyXxZfXaypBbt52BlqzxkgENMO?usp=sharing). ::: # The SAT problem As we have discussed in class, Alloy (and many other tools) use underlying *solvers* to analyze models of useful systems. We'll explore this in much more depth next week. One of the most foundational types of solver is a boolean satisfiability solver, also known as a SAT solver. The [boolean satisfiability problem](https://en.wikipedia.org/wiki/Boolean_satisfiability_problem) was the first problem shown to be NP-complete, and it has enormous significance across both the theoretical computer science and practical applications. In short, the boolean satisfiability problem is this: :::info **The SAT problem**: For a given boolean formula, does there exist an assignment of true and false to the variables such that the formula evaluates to true? ::: Boolean *formulas* are composed of a finite number of named variables combined with `and`, `or`, and `not`. (More complicated logical operators, such as `implies`, can be reduced to these 3 core operators, as we saw earlier in CS340). # Conjunctive Normal Form (CNF) For both algorithmic reasons and to facilitate reuse across solvers, most SAT solver take in boolean formulas in the specific format we covered in class: conjunctive normal form, or CNF. In CNF, a formula is made up of a list of *clauses* of *literals*. 1. A **literal** is a boolean variable (e.g., `x` or it's negation `not x`). 2. A **clause** is a list of literals combined with `or` (e.g., `(x or y or not z)`). Mathematically, this is called a disjunction. 3. The top-level **formula** in CNF is a list of clauses combined with `and` (e.g., `(x) and (x or y or not z)`). Mathematically, this is called a conjunction. # SAT competition format There is an annual competition to compare SAT solvers, which specifies an even more specific CNF to make it easier to parse inputs and compare outputs. ## Input In addition to rules 1-3 above, the [SAT competition format (sections 4.1 and 5.2)](https://jix.github.io/varisat/manual/0.2.0/formats/dimacs.html#:~:text=The%20DIMACS%20CNF%20format%20is,a%20negation%20of%20a%20variable.) also specifies how a formula is encoded as lines of text in a file. Specifically, the format your SAT solver should expect for its input is as follows. Your solver should take in a single filename as its argument. The autograder will only test against example files that meet this format; you do not need to do extensive error checking and can assume the following: 4. Variables are integers. A positive number $i$ indicates the literal is the positive form of the variable, a negative number $-i$ indicates the literal is negated (`not i` ). For example, a `7` in a clause refers to the 7th variable. A `-9` in a clause refers to the negation of the 9th variable. 5. The file starts with 0 or more comment lines that begin with `c` followed by a space. Your solver should ignore these lines. 6. The first non-comment line should have the form `p cnf <numvar> <numclauses>`, where `<numvar>` is the number of variables and `<numclauses>` is the number of clauses. 7. After the first line, each subsequent line represents a clause. Each clause is a sequence of distinct non-null numbers. Positive numbers indicate the corresponding variable. Negative numbers indicates the negation of the corresponding variable. A clause will not contain the opposite literals `i` and `−i` for a given variable. Each literal is separates by a space, and the clause ends with a `0` to indicates the end of the clause. The `0` technically allows clauses to span multiple lines, but the autograder will only test with each clause on exactly one line. ### Input CNF example ``` p cnf 4 3 -1 0 1 3 -2 0 -1 4 0 ``` `p cnf 4 3` indicates that there are 4 variables and 3 clauses. We can read this as describing the formula: - the negation of the first variable; `and` - the first variable `or` the third variable `or` the negation of the second variable; `and` - the negation of the first variable `or` the fourth variable. Or, equivalently using variables `a`, `b`, `c`, `d`: `(!a) and (a or c or !b) and (!a or d)` Or: $$ (\neg a) \wedge (a \vee c \vee \neg b) \wedge (\neg a \vee d) $$ You can use this [online CNF SAT solver](https://www.comp.nus.edu.sg/~gregory/sat/) to quickly check if an input is correctly formatted. ## Output The SAT competition format specifies the following for the output (written to standard out): 1. The first line is the solution line, starting with an `s`. It is either: - `s SATISFIABLE` - This indicates that the formula is satisfiable. - `s UNSATISFIABLE` - This indicates that the formula is unsatisfiable, e.g., no assignment that satisfies the formula exists. 2. If the formula is satisfiable, the following lines indicates the value mapping. Each starts with a `v`, then a space, then a space-separated list of literals, with positive numbers indicating a true assignment and negative numbers indicating a false assignment. The final value line is terminated with a `0`. ### Output example A correct solution could output the following for the input example. Note the exact literals chosen may differ, since there can be multiple valid assignments for a given problem. ``` s SATISFIABLE v -1 -2 3 -4 0 ``` :::success When you submit to Gradescope, the autograder will report whether your submission passes two basic tests. You can do this early on to check that you are reading the input and printing the output correctly (for example, if your code parses the input and always says `UNSAT`, you should expect one test to pass and one test to fail). ::: # Your task: Solver Your task for this assignment is to implement a simplified version of the classic [DPLL algorithm]() for SAT solving. :::info The two cores idea you will implement are: 1. Unit propagation (finding literals on their own and using them to simplify other clauses). 2. Backtracking search (choosing a variable and guessing true or false; trying the other option if the first fails). ::: ## Unit propagation As we discussed in class, unit propagation captures the same idea we as humans likely use when trying to find a solution to a SAT problem. If some clause has only a single literal, then we know that the final assignment must assign that variable the value of the variable. The intuition is that if we have `x and <anything>`, we know `x` must be assigned $true$; if we have `!x and <anything>`, we know `x` must be assigned $false$. The *propagation* component is that we can use this knowledge of a single variable's value to simplify other clauses. As we saw in class, there are two cases: 1. **Case 1**: Any clause that contains the literal, with **the same sign**, is now also satisfied (because we have committed to that literal in our mapping). We can remove the entire clause (or otherwise mark it as satisfied, depending on how you structure your solution). 2. **Case 2**: Any cause that contains the literal, with **the opposite sign**, will not have this literal be the reason it is satisfied (because we have committed to the *opposite* assignment of the literal in our mapping). We can remove the negated literal from the clause (potentially creating an empty clause, depending on how you structure your solution). :::info It is up to you how exact to model changes to the boolean formula. You can either construct a new boolean formula as the result of unit propagation, or mutate the existing formula, as long as you are using a data structure design that enables the previous formula to still be recoverable during the backtracking search. ::: Some possible pseudocode: ```java! unit_propagate(formula, mapping): for each unit clause {+/-x} in formula: add {+/-x} to mapping // Case 1 // Same sign only! remove all clauses containing +/-x // Case 2 // Flipped sign only! remove all instances of -/+x in every clause return formula ``` Note that this step may remove variables that are not needed to make the entire formula true. It's up to you whether your solution tracks all variables at the start, or tracks the potentially-removed variables here. ## Backtracking search The DPLL implements backtracking search via a recursive algorithm. In each recursive call, DPLL first propagates all unit clauses. After that, one of the following must be true: 1. No clauses remain. Because DPLL removes satisfied clauses, this means the search has succeeded with a successful SAT result. 2. The empty clause exists in the list of clauses. Because DPLL removes unhelpful literals to a clause, this means it derived a contradiction, which means this path is UNSAT. 3. Neither of the above cases hold, which means DPLL must make a guess; then potentially *backtrack* if the guess was not successful. ## Full algorithm Possible pseudocode for the recursive backtracking is included below. You do not need to follow this exact pseudocode, but your submission should include both unit propagation and backtracking search. :::warning You are free to look up other pseudocode for DPLL online. However, for academic integrity, please **do not** reference other *full implementations* of DPLL. You are free to use libraries to aid in your implementation, but please do not use any libraries designed for SAT solving (since that is the point of this assignment!). ::: ```java! // Inputs: formula (list of clauses), assignments (mapping) dpll(formula, assignments): while formula contains unit clause: // Unit propagate potentially updates assignments and the formula formula = unit_propagate(formula, assignments) if no clauses remain: // Since we remove satisfied clauses, this means we are done and have a SAT // result! :) return SAT, assignments if formula has empty clause: // Since we remove unhelpful literals to a clause, this means we must have // hit a contradiction, so this path in the search is UNSAT :( return UNSAT // You can use any strategy you'd like to pick a variable to split on x = pick_variable(formula) // Guess that the variable is true // Note: use a copy of assignments because we may need to backtrack and // forget the assignments used during this solve attempt asgn_copy = assignments + {x : true} true_result = dpll(formula + [x], asgn_copy) // If our guess worked out, then we are done! :) if true_result is SAT return true_result // If our guess did not work out (true_result is UNSAT), try making the // opposite guess for the same variable // Guess that the variable is false asgn_copy = assignments + {x : false} false_result = dpll(formula + [!x], asgn_copy) // There is no other possible value for x, so return whatever we have, // regardless of whether it is SAT or UNSAT return false_result } // Be sure to also pick arbritrary values for variables not needed in the final // assignments. ``` ## Optimization: pure elimination [Independent] The full DPLL algorithm implements an additional strategy, *pure elimination*. A literal is "pure" if, whenever it appears in a clause, it always has the same sign. In that case, DPLL commits to that sign in the mapping then simplifies the formula based on that decision. Implementing pure elimination is worth 5 assignment points. You should only add pure elimination once your basic algorithm works as expected. # Examples (:star: workshop) Here are a few examples that walk through the simplified DPLL algorithm. We'll walk through these in groups together in workshop. ## Example 1 If we had the following input: ``` p cnf 5 4 -1 0 1 3 -2 0 1 5 0 4 -5 0 ``` :::success First, draw a representation of the list of clauses and the initial mapping for this formula. Once your group has your answer, check it below. ::: <details> <summary>Expand once you have your own version of `formula` and `assignments` for this example </summary> One possible way to draw the formula (you will likely have a more complicated data structure for this in your actual implementation!). We start with an empty assignment mapping. ``` Formula: [ [-1], [1, 3, -2] [1, 5] [4, -5] ] Assignment: {} ``` </details> :::success Next, draw new version of the formula and the mapping (don't erase your old version) after each round of unit propagation and each recursive call to the algorithm. Check your group's answers with Alexa, then expand to see one possible solution below. ::: <details> <summary>Expand once you've checked your version with Alexa.</summary> The first clause is a unit clause, so we commit to 1 being $false$ and propagates that knowledge. ``` Formula: [ [3, -2] [5] [4, -5] ] Assignment: { 1 : false, } ``` This created another unit clause for 5, so we propagate again: ``` Formula: [ [3, -2] [4] ] Assignment: { 1 : false, 5: true, } ``` This created another unit clause for 5, so we propagate again: ``` Formula: [ [3, -2] ] Assignment: { 1 : false, 5: true, 4: true, } ``` Now, there are no more unit literals, but we have neither an empty list of clauses nor have we reached the empty clause (a contradiction). So, we make a guess: let's say 3 is $true$. Here is one way to model that: ``` Formula: [ [3] [3, -2] ] Assignment: { 1 : false, 5: true, 4: true, 3: true, } ``` Which gets simplified in the next call to: ``` Formula: [ ] Assignment: { 1 : false, 5: true, 4: true, 3: true, } ``` :::info We now hit the success case of an empty list of clauses! Since 2 has no assignment, we can pick any assignment for it, and return SAT with our assignment. ::: We should print: ``` s SATISFIABLE v -1 2 3 4 5 0 ``` We can sanity check that this makes sense by mentally checking that for each clause line in the input, at least one of its literals has the same sign in the solution as it does in that clause. </details> ## Example 2 Repeat the above exercise with the following input: ``` p cnf 2 4 1 2 0 -1 2 0 1 -2 0 -1 -2 0 ``` <details> <summary>Expand once you've checked your version with Alexa.</summary> We would start with: ``` Formula: [ [1, 2], [-1, 2] [1, -2] [-1, -2] ] Assignment: {} ``` We have no unit clauses, so we might guess 1 to be `true`: ``` Formula: [ [1] [1, 2], [-1, 2] [1, -2] [-1, -2] ] Assignment: { 1 : true, } ``` Which after unit prop results in first: ``` Formula: [ [2] [-2] ] Assignment: { 1 : true, } ``` Then this intermediate step: ``` Formula: [ [2] [-2] ] Assignment: { 1 : true, 2 : true, } ``` Which results in: ``` Formula: [ [] ] Assignment: { 1 : true, 2 : true, } ``` :::danger We have an empty clause! Note that this is different from having no clauses; the empty clause is unsatisfiable. Hence our guess was wrong. We now need to backtrack and assign 1 to be false. ::: However, we also hit the empty clause in the false guess, which means the entire formula is truly unsatisfiable. We should then print: ``` s UNSATISFIABLE ``` </details> # Design :::success In your README, document the following: **Written Q1:** Describe the purpose and design of your solver, in your own words, as if this were a publicly accessible project. (You do not need to document the input/output format beyond stating that it is the SAT competition format). **Written Q2:** What data structures did you use to represent the formula and the assignment mapping? Did you mutate each, or return new copies for unit propagation and each recursive call? **Written Q3:** What strategy does your solver take for choosing which variable and assignment to guess? ::: # Testing :::success Create at least 5 new expected SAT formulas and 5 new expected UNSAT formulas to test your solver. These should be *in addition* to the examples given here. ::: You can add tests either directly as your own formula representation, as strings in the specified format, or as new files in the specified format. You can use Alloy to help you generate some (but not all) of your tests (search the documentation for exporting to CNF). Implement automatic testing such that your tests are all run with a single command line call (this can be accomplished by either adding a new executable file that calls functions from your main file, or by using a unit testing library, or by adding a script that passes each file into your solver). :::success Answer the following in your README. **Written Q4:** Give one command line call to execute that runs your tests. Describe a few examples of what different tests are checking for. ::: :::warning Note that you will likely not receive extra credit if your baseline solver does not pass the correctness tests. ::: # Extra credit: property based testing For up to 10 points of extra credit, implement a randomized test generator for your solver. You can either use your own generator, or use a PBT library such as [Hypothesis](https://hypothesis.readthedocs.io/en/latest/). # Extra credit: parrallel SAT solver For up to 10 points of extra credit, implement a concurrent version of your SAT solver. Add a new script, `run_par.sh`, that takes in an additional argument `--n` for the number of parallel threads to run. How exactly to parallelize your implementation is up to you. # Grading :::success In your README, also report: **Written Q5:** Collaborators you discussed or debugged with? **Written Q6:** Roughly how long did this assignment take you (in hours)? **(Optional) Written Q7:** If you chose to implement any extra credit option, describe your additions here. ::: This assignment is worth 100 points, broken down by: :::info 60 points: Solver correctness - Autograded. - Correct SAT/UNSAT result and a valid satisfying assignments if SAT. 15 points: Design of your solver 5 points: Pure elimination 10 points: Solver test suite - At least 5 new SAT and 5 new UNSAT tests. 10 points: `README` ::: # Submission Submit the following files on Gradescope: `run.sh`, `README.{txt, md}`, and the file(s) that implement your solution. # Attribution This assignment is based on a previous version of the [SAT][] assignment from [Tim Nelson's][tim] [Logic for Systems][lfs] at Brown University. [SAT]: https://docs.google.com/document/u/3/d/e/2PACX-1vTnmEUqfIVgoNYMqxtSmCrG_9p7G5WPS4RhOnO3xFe5J9ZFXHHX_0mZnGiPGeSZDLJasDE20HDjRZYL/pub [tim]: https://cs.brown.edu/~tbn/ [lfs]: https://csci1710.github.io/2023/