13: SAT/Solving S25

:::warning Monday note: - The first half of the lecture notes can be found at the end of the "Concurrency" notes. - Reminder to complete [this reading](http://www.cs.toronto.edu/~chechik/courses18/csc410/p76-malik.pdf) for Thursday. ::: In today's class, we'll zoom out and think about some bigger questions around what Alloy can do. # Alloy internals We've alluded to that Alloy uses an underlying "SAT solver" to find satisfying instances (or tell you no instances can be found). # Decidability If you have taken CS235, you've heard about _decidability_. A problem is **decidable** if there exists an algorithm to solve it that is always returns the correct answer and always terminates (that is, does not infinite loop). We'll briefly relate this to what we've been doing so far in modeling with Alloy. Let's start with this question: :::success For a given Alloy model, will a `run` or `check` statement always provide an answer, assuming enough computing resources and time? ::: <details> <summary>Think, then click! </summary> Yes, **because of the finite scope**! With a finite scope, check and run statements are _decidable_. Because the scope is finite, even a very inefficient implementation of the Alloy Analyzer could just enumerate all possible solutions to check if there is or is not a satisying instance to the model in question. We'll come to see how Alloy's actual implementation is more efficient than this, but the "brute force" argument can be used to informally show decideability. Even `run`/`check` statements for temporal models in Alloy, which reason over infinite traces, are deciable because Alloy is restricted to lasso traces--which have a finite number of states to analyze. However, *without* the finite scope, Alloy models are not necessarily decidable! It is possible to formulate models in Alloy such that an analysis might potentially infinite loop in searching for a satisyfing instance, which makes the problem of model-finding *undecidable*. </details> ## The halting problem The cannonical example of an undecidable problem is *the halting problem*. The formulation of the problem is this: can you write an algorithm that determines, for the source code of a program and an input to the program, whether the program will terminate (e.g., return) or run forever? There is a classic proof (that we won't go over in CS340) that shows that the halting problem is undecideable, that is: no such algorithm exists. Many modeling tools that are not bounded run into there types of decidability issues. # From relations to SAT solving We've alluded to that Alloy uses an underlying "SAT solver" to find satisfying instances (or tell you no instances can be found). The problem of finding an instance is in terms of atoms, sets, and relations; as shown in the analyzer when we have an instance. In contrast, the problem of boolean satisfiability is this: given a formula of _boolean_ inputs (`a`, `b`, `c`, etc) that are combined with `and`, `or`, and `not`, is there some assignment of `true`/`false` to each input such that the formula is true? Somehow, Alloy needs to bridge this gap between sets and relations and pure booleans. --- Consider this simple model in Alloy: ```java abstract sig Node { } one sig A, B, C extends Node {} run {} for exactly 3 Node ``` :::success How many instances do you expect? ::: <details> <summary>Think, then click! </summary> One, since there is one of each node (this model has no actual options). When we run the analyzer, we see: ```0 vars. 0 primary vars. 9ms.`` The part we will focus on today is the `0 primary vars`. This corresponds to the idea that there are 0 options, or choices, the analyzer needs to consider in this model. </details> ` What if we add some more optionality to the model? ```java abstract sig Node { edges: set Node } run {some edges} for exactly 3 Node ``` :::success Now, how many instances do you expect? ::: <details> <summary>Think, then click! </summary> There are 9 potential pairs of nodes in `edges` (3 possible outgoing edges for 3 total nodes). 9 choices of things that may, or may not, be the case in the world. Nothing else. Each edge choice is a boolean, true or false. Like in CS240, we can reason over strings of booleans. Thus, there are $2^9 = 512$ potential instances. (Note: if we did not have named nodes, Alloy would do some *symmetry breaking* to show fewer instances, we'll discuss this later in the course). </details> If we run Alloy on this model, we'll see: ``` 13 vars. 9 primary vars. 2 clauses. 18ms. ``` Again, the primary variables correspond to these atomic truths or choices in the model. In this case is, these are just which edges exist in our fixed, 3-node world. We can chance the model to increase the number of variables, for example, by allowing a potential 4th node. For example: ```java sig Node { edges: disj set Node } one sig A, B, C extends Node {} run {some edges} for 4 Node ``` :::success Now, how many primary variables do we expect? ::: <details> <summary>Think, then click! </summary> There is another potential Node in the world; the world may be either size 3 or 4. Whether or not this fourth Node exists is 1 new Boolean variable. And since there are 4 potential nodes in the world, there are now 4^2 = 16 potential edges. Note we need variables for all 16 possible edges. The total number of primary variables is thus 1+16 = 17. </details> --- # From Alloy models to Boolean formulas Once we know the set of Boolean variables we'll use, we can translate Alloy constraints to purely Boolean ones via substitution. The actual Alloy Analyzer is more complex than this, but here's an example of how a basic one might work. Suppose we have the constraint: ``` all n: Node | A in n.edges ``` :::success There are no `all` quantifiers in Boolean logic. How can we get rid of it? ::: <details> <summary>Think, then click! </summary> An `all` is just a big `and` over the nodes in the instance. ``` A in A.edges A in B.edges A in C.edges (Node4 in Node) implies A in Node4.edges // Implies can be translated to and, or, not // (p => q) is the same as (not p or q) ``` </details> There are similar rules for other operators. # Solving SAT :::success Is the SAT problem decidable? How do you know? ::: <details> <summary>Think, then click! </summary> Yes, it is decidable because we have a finite number of variables. We can describe a brute force algorithm to solve the problem, which always terminates because it loops over $2^n$ for a finite number of $n$ variables. </details> ## Strategy: brute force with truth tables We could have a brute force solution that conceptually builds the full truth table for the formula. If we have $n$ variables, the truth table has $2^n$ rows. We know the answer is $SAT$ if there is some row with output $true$ (with the assignment given by that row); otherwise the answer is $UNSAT$. However, this brute force strategy is always exponential in terms of both memory an runtime. Can we do better? To be continued! :::warning Thursday class notes continue below. ::: # Reading discussion In class, we discussed [Boolean Satisfiability: from Theoretical Hardness to Practical Success](http://www.cs.toronto.edu/~chechik/courses18/csc410/p76-malik.pdf) ``` 1. The article makes a distinction between the general SAT problem and a more theoretical variant. Which of these do you think Alloy’s use case requires? Alloy needs the general one in order to show instances. Theoretical variant: easier to prove things about. Some application don't need (e.g., a security system that allows/denies access). 2. What is CNF? Why is it useful to SAT solvers? Conjunctive normal form: clauses made up of ors (of variables or negations), and-ed together. (..or.) and (..or.) and (..or.) "each/every clause must be satisfied" "within a clause, we have different options (one or more) for satisfying it" Easier to check if a clause is satisfied, algorithms based on this. If any single clause becomes a contradiction/always false, then we're done. 3. Why is the search space for SAT tree-shaped? Each variable can be true or false, so it lends itself to a binary-tree-shaped decision tree. 4. What are some industrial impacts of SAT solvers? Resolving software package dependencies, mathematical verification tools, AI planning, microprocessor verification, equivalence checkers. 5. Give two examples of situations where you could model a real world concept with mappings for both (1) variables and (2) constraints. One can directly from the paper, one should be new. - seating chart for preschoolers, who hates who, who sticks gum places - constraints: who can't sit by eachother - variables: if the student is in the seat, true - student passing/failing a certain class. Grades, attendance - CS111: pass a bunch of thresholds - variables: did you pass the threshold for the category - constraints: and - sequential decision making game - variables: each decision - constraints: sequential, and who would win ``` ## Recall: brute force with truth tables The brute force solution conceptually builds the full truth table for the formula. If we have $n$ variables, the truth table has $2^n$ rows. This brute force strategy is always exponential in terms of both memory and runtime. Can we do better? ## A better strategy: intuition Suppose I asked you to solve a boolean constraint problem. Maybe it comes from Alloy or maybe it doesn't. Here's an example: `x1 and (x1 implies x2)` :::success Is there a satisfying instance for this constraint? - If yes, what is it, and how did you obtain it? - If no, how did you reach that conclusion? ::: <details> <summary>Think, then click! </summary> Yes, there is, `x1 = true` and `x2 = true`. The intuition is that because `x1` is on its own in a clause, we know it must be true (since `and` is strict). Once we know `x1` is true, we can use that knowledge to simplify the second clause, showing us `x2` must be true. </details> # More complex strategy If you've taken CS235, you know that SAT is *NP-complete*: it cannot be solved for arbitrary inputs without taking time exponential in the size of the input. However, that is true for arbitrary inputs, not necessarily common inputs! Plenty of real-world problems are worst case exponential (or even more difficult) and computer systems solve them all the time. We'll now consider an algorithm that can often do better than exponential on real-world problems. The "code" in today's notes is pseudocode–it shouldn't be viewed as complete–rather, the goal is to motivate the core ideas you'll implement in the next assignment. ## Unit propagation The idea we as humans used in the last example was to find a variable on its own in a clause, commit to that assignment of true or false, then propagate that knowledge. This is called *unit propagation*. Our example can be put into CNF as `x1 and (!x1 or x2)` by expanding the definition of `implies`. We can now more easily spot opportunities to propagate knowledge quite quickly, just by searching for clauses of only one element! A unit clause is a 1-element clause: a variable or its negation. The empty clause is a 0-element clause, and is equivalent to false. (A clause is a big "or", and an "or" gives a set of ways the formula could evaluate to True. But an empty clause gives no such options!) We can check for unit clauses in time linear in the number of clauses. Suppose we've identified a unit clause, in this case `x1`. Then, for every other clause in the set, we can check: 1. **Case 1**: Does the clause contain the same literal as the unit clause? If so, then that clause is subsumed by the unit clause: it must be satisfied! Delete the entire clause. 2. **Case 2**: Does the clause contain the opposite literal as in the unit clause? If so, then the clause cannot possibly be made true by that opposite literal. That is, our variable is unhelpful in making the clause true, so we delete that literal from the clause. # Combined algorithm: guessing + unit propagation Unit propagation can be very helpful, but it often only gets us for far. We might have formulas with no unit clauses. In that case, we have to fall back on the previous strategy: to brute-force by guessing! We need to be sure we still have a complete solver though, so if our first guess fails, we'll need to try again with the opposite guess. This makes our algorithm a *backtracking* search (because we backtrack on a wrong guess). :::success With the combination of unit propagation and backtracking search, is our best-case time still exponential? ::: <details> <summary>Think, then click! </summary> No, now, the best-case time is linear! We could guess correctly on every branch. The solver could be lucky: consider a formula like `x1 and x2 and x3 and x4 and x5`. If we always guess true, we only needs 5 recursive steps to return true; the brute force solution would need 2^5. </details> :star: Note this initial pseudocode is simplified somewhat from class. See the assignment for complete pseudocode. ```python def solve(formula: List<Clause>) -> bool: # are there any unit clauses? if so, propagate them # keep doing so until there are no more changes # Beware: mutation can be a source of bugs here... while formula changes: formula = unit_propagate(formula) # Did we produce the empty clause? if empty clause in formula: # This indicates a contradiction return False # Did we find a valid solution? else if formula is all consistent unit clauses: return True # Otherwise, we need to guess else: branch = remaining[0] # Guess that the variable is true true_result = solve(formula + {branch}) # Our guess worked! if true_result: return True # Our guess failed; BACKTRACK and try again else: # Guess that the variable is false false_result = solve(formula + {!branch}) return false_result ``` This idea–a recursive, backtracking search paired with unit propagation–is the foundation of one of the most famous boolean solver algorithms: [DPLL (named after the authors: Davis, Putnam, Logemann, and Loveland)](https://en.wikipedia.org/wiki/DPLL_algorithm). --- ## Improvements to SAT solving implementation Modern SAT solvers use many improvements beyond the core DPLL algorithm we have covered so far. ## Easy: Pure elimination :::success Do you spot any way we as humans could simplify this formula? ``` 1, -2, 4, 5, -6 1, 3, -4, 8 1, -7 -8 ``` ::: <details> <summary>Think, then click! </summary> Since the literal `1` has the same sign (positive) in every clause, we know it would not be helpful to set `1` for false, and we actually can set `1` to true then use that simplify the rest of the formula (in this case, that assignment alone is sufficient). </details> The core DPLL algorithm captures this idea as well, with a subroutine called pure elimination. A variable is **pure** if for every clause it appears in within the formula, the literal has the same sign (all positive or all negative). ``` pure_elim(F): for each variable x if +/-x is pure in F add a unit clause {+/-x} remove all clauses containing +/-x ``` Implementing pure elimination is worth 5 points on the SAT assignment, you should make sure your base algorithm works first. ### Harder: conflict-driven clause learning Modern, state-of-the-art SAT solvers use an additional strategy called [*conflict-driven clause learning*, or CDCL](https://en.wikipedia.org/wiki/Conflict-driven_clause_learning). We won't go into the details of CDCL or implement it, but the key idea is this: a solver can "learn" from conflicts in parts of the search space in order to avoid running into the same conflict multiple times. In CDCL, the core idea is to: 1. Unit propagate and pure eliminate as before. 2. Check for SAT/UNSAT as before. 3. Guess when needed. As you do so, build an implication graph about what is implied by the choice. 4. If you reach a conflict: - Find the point in the implication graph that led to the conflict (algorithmically, the "cut" in the graph). - Derive a new clause which is the negation of the assignments that led to the conflict. - Backtrack to that decision level (potentially, backtracking more aggressively than the basic DPLL algorithm. ) ### Harder: concurrent/distributed SAT solving :::success If you had a very large boolean formula to solve, how could you use multiple servers to help? ::: <details> <summary>Think, then click! </summary> Here's an over-simplified idea: divide and conquer based on some subset of the variables in the formula. If you have $k$ servers, pick $log_2\, k$ variables, and assign each combination of values (guess) for those variables to each server. This is a basic divide-an-conquer strategy, where we divide the search space up and assign each subspace to one server. </details> :::success With this strategy, how do you know the formula is SAT? ::: <details> <summary>Think, then click! </summary> As soon as any server returns SAT with an assignment, the entire formula must be SAT, so you can early return. </details> :::success With this strategy, how do you know the formula is UNSAT? ::: <details> <summary>Think, then click! </summary> You have to wait for every server to return UNSAT (since a valid assignment could be found with any initial guess). </details> Real systems for concurrent/distributed SAT solving go beyond basic divide-and-conquer to try to actually share information across the different workers. We won't go into this further in lecture, but this could make an interesting final project. # What the algorithm returns Returning just a boolean is not very useful for the caller. For example, if Alloy were using this solver, it would want the instance itself, not just that the instance exists. Every time we make a recursive call, there's an implicit set of assumptions involved. That is, there's always a partial instance of previously-selected guesses in effect at any point. We can make this explicit by adding a parameter to the function, and returning the guesses that produce a success. If our goal is to produce a total solver (and it usually is), we'll need to post-process the result and pick arbitrary values for variables that have no value in the assumption set. The pseudocode provided in the assignment will include an argument for the mapping itself. ## Getting to CNF form So far, we've been assuming our SAT solvers can always take in boolean formulas in CNF form. Is this a fair assumption? :::success Given an arbitrary formula, how do we convert it to CNF? ::: <details> <summary>Think, then click! </summary> A brute force strategy is to apply boolean algebraic laws, as seen in CS240: for example, the distributive law, DeMorgan's law, and the double negation law. This strategy works, but is worst case exponential in the size of the output formula (the repeated application of the distributive law can blow up the size of the formula.) </details> Real systems use versions of the [Tseitin Transformation](https://en.wikipedia.org/wiki/Tseytin_transformation) instead, which is linear in the size of the output formula. We won't go into the details of this transformation, but the intuition is that it adds new variables for subformulas to avoid an exponential blowup in the size of the output. # Additional details + applications ## Symmetry breaking in Alloy Going back to our initial Alloy example: ```java abstract sig Node { edges: one Node } one sig A, B, C extends Node {} run {} for exactly 3 Node ``` Consider what happens if we change the model such that the nodes `A`, `B`, and `C` are no longer named/individual sigs. The model then becomes: ```java sig Node { edges: one Node } run {} for exactly 3 Node ``` When we run this version, we only get 9 instances, instead of $2^9$ instances! The reason for this is that Alloy tries to avoid showing you the "same" instance multiple times. If atoms are unnamed and the constraints can never distinguish between different atoms them, instances will be considered "the same" if you can arrive at one by renaming elements of the other. Since the individual `Node` atoms are now anonymous/unnamed, we have far more equivalent instances. We call these instances *isomorphic* to each other, or symmetries of each other. Alloy will only show one instance per set of isomorphic instances (in most cases). Formally, we say that Alloy finds every instance "up to isomorphism". This is useful for: - increasing the quality of information you get from paging through instances; and - improving the runtime on unsatisfiable problems. Filtering out instances via symmetry breaking is an area of [active research](https://kaiyuanw.github.io/papers/paper22-tacas20.pdf). In talking to researchers who use Alloy in the field, symmetry breaking is a core concern for applying Alloy in practice. --- ## Handling integers and counting in boolean SAT :::success How does allow handle the `+` operator, for example, from our counter model: ``` always Counter.counter' = add[Counter.counter, 1] ``` if the solver only reasons over booleans? ::: <details> <summary>Think, then click! </summary> Like we saw in CS240, you can build a single-bit adder from `and`, `or`, and `not` gates. Alloy builds integer arithmetic out of basic boolean functions using this same type of logic. There is a deep connection between systems implementations and boolean logic! </details> ## More applications of SAT solving SAT solving is important from a theoretical perspective (complexity theory) as well as a practical one (applications spanning many subfields of computer science). Some primary applications include: 1. Software verification - Design model checking/finding, like Alloy. - Some forms of static analysis. - Bounded model checking on actual implementations (more next week). - Unbounded verification (Dafny, F*, etc). 2. Hardware design + verification - Verification and equivalence checking. - Example: layout of physicals compute elements within a chip design. 3. Planning/scheduling 8. E.g., air-traffic control, telegraph routing, arranging tournament or visit schedules. See [Practical application of SAT solving](https://www.carstensinz.de/talks/RISC-2005.pdf) for more. ## Bounded model checking Alloy is one example of model checking *designs*, but there are also approaches tied more closely to software *implementations*. That is, other forms of model checking run on actual code, rather than requiring a separate design specification like in Alloy. :::info Classic bounded model checking asks the question: is there a path of length `k` from an initial state to a state where a desired property does not hold? ::: Other model checkers analyze concrete implementation code. For example, the C bounded model checker (used by AWS as well as other companies). Example of CBMC in use in industry [AWS C Common Library](https://github.com/awslabs/aws-c-common/blob/main/verification/cbmc/proofs/aws_array_eq/aws_array_eq_harness.c). Next week, we'll cover more on how SAT solvers are used within other, more complex solvers called SMT solvers.